Highly Parallel Programming with Apache Spark

Tutorials – Apache Spark

Article from Issue 202/2017

Author(s): Ben Everard

Churn through lots of data with cluster computing on Apache's Spark platform.

As a society, we're creating more data than ever before. We're monitoring everything from the planet's weather to the performance of our computers, and we're storing all of this information. But how do you process it all? On a single machine, you can get a few terabytes of disk space and a few hundred gigabytes of memory (at least, you can if your pockets are deep enough), but how do you churn through a petabyte of raw ones and zeros? Basically, you're going to need more than one computer and a way of running your programs on many machines at the same time. That's where Apache Spark [1] comes in.

Before you run off and buy a rack of servers, slow down! We're going to start by introducing Spark on a single machine. Once you've mastered the basics, you can scale up.

Spark is a data processing engine that is often used with Hadoop to handle large amounts of data in a highly distributed manner. If you move forward with Spark, you'll probably end up with a complete Hadoop setup; however, that's getting ahead of ourselves. For now, we can start Spark as a standalone service on a single computer.

[...]
