A Spark in the Cloud
Complete large processing tasks by harnessing Amazon Web Services EC2, Apache Spark, and the Apache Zeppelin data exploration tool.
Last month I looked at how to use Apache Spark to run compute jobs on clusters of machines [1]. This month, I'm going to take that a step further by looking at how to parallelize the jobs easily and cheaply in the cloud and how to make sense of the data they produce.
These two tasks are interrelated: if you're going to run your software in the cloud, it's helpful to have a good front end to control it, and that front end should also provide a good way of analyzing the data.
Big Data is big business at the moment, and you have lots of options for controlling Spark running in the cloud. However, many of these choices are closed source and could lead to vendor lock-in if you start developing your code in them. To be sure you're not tied down to any one cloud provider and can always run your code on whatever hardware you like, I recommend Apache Zeppelin as a front end. Zeppelin is open source, and it's supported by Amazon's Elastic MapReduce (EMR), which means it's quick and easy to get started.
[...]

