A Spark in the Cloud
A Spark in the Cloud
Complete large processing tasks by harnessing Amazon Web Services EC2, Apache Spark, and the Apache Zeppelin data exploration tool.
Last month I looked at how to use Apache Spark to run compute jobs on clusters of machines [1]. This month, I'm going to take that a step further by looking at how to parallelize the jobs easily and cheaply in the cloud and how to make sense of the data it produces.
Both of these tasks are somewhat interrelated, because if you're going to run your software in the cloud, it's helpful to have a good front end to control it, and this front end should provide a good way of analyzing the data.
Big Data is big business at the moment, and you have lots of options for controlling Spark running in the cloud. However, many of these choices are closed source and could lead to vendor lock-in if you start developing your code in them. To be sure you're not tied down to any one cloud provider and can always run your code on whatever hardware you like. I recommend Apache Zeppelin as a front end. Zepplin is open source, and it's supported by Amazon's Elastic Map Reduce (EMR), which means it's quick and easy to get started.
[...]
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
News
-
Is AI Coming to Your Ubuntu Desktop?
According to the VP of Engineering at Canonical, AI could soon be added to the Ubuntu desktop distribution.
-
Framework Laptop 13 Pro Competes with the Best
Framework has released what might be considered the MacBook of Linux devices.
-
The Latest CachyOS Features Supercharged Kernel
The latest release of CachyOS brings with it an enhanced version of the latest Linux kernel.
-
Kernel 7.0 Is a Bit More Rusty
Linux kernel 7.0 has been released for general availability, with Rust finally getting its due.
-
France Says "Au Revoir" to Microsoft
In a move that should surprise no one, France announced plans to reduce its reliance on US technology, and Microsoft Windows is the first to get the boot.
-
CIQ Releases Compatibility Catalog for Rocky Linux
The company behind Rocky Linux is making an open catalog available to developers, hobbyists, and other contributors, so they can verify and publish compatibility with the CIQ lineup.
-
KDE Gets Some Resuscitation
KDE is bringing back two themes that vanished a few years ago, putting a bit more air under its wings.
-
Ubuntu 26.04 Beta Arrives with Some Surprises
Ubuntu 26.04 is almost here, but the beta version has been released, and it might surprise some people.
-
Ubuntu MATE Dev Leaving After 12 years
Martin Wimpress, the maintainer of Ubuntu MATE, is now searching for his successor. Are you the next in line?
-
Kali Linux Waxes Nostalgic with BackTrack Mode
For those who've used Kali Linux since its inception, the changes with the new release are sure to put a smile on your face.
