HPC in the cloud

Cloud Play

Article from Issue 154/2013

Author(s): Jeff Layton

Cloud computing is most definitely here, but does it have a role in HPC? We discuss changes in HPC that could be solved effectively by cloud computing.

I'm not a big Bob Dylan fan, but when it comes to HPC, "The Times They Are a-Changin'." Some of this change is going to the cloud, not because it would be really cool, but rather because HPC has existing workloads that fit well into cloud computing and can save money over traditional solutions. Perhaps more importantly, I think HPC has evolved to include non-traditional workloads and is adapting to meet those workloads – in many cases using cloud computing to do so. I will explain by giving two examples.

1. Massively Concurrent Runs

At one HPC center that I'm familiar with, users periodically submit 25,000 to 30,000 jobs as part of a parameter sweep; that is, they run the same application with 25,000 to 30,000 different data sets. Many times the application is a Matlab script, a Python script, an R script, a Perl script, or something particularly serial (i.e., it runs well on a single core). The same script is run but with thousands of different input files, resulting in the need to run thousands of jobs at the same time. Many times these applications run fairly quickly – perhaps in a couple of minutes – and many times they do not produce a great deal of data.

A closely related set of researchers study operating systems and security, during which they run different simulations with different inputs. For example, they might run 20,000 instances of an OS, primarily a kernel, and explore exploits against that OS. As with the previous set of researchers, the goal is to run a huge number of simulations as quickly as possible to find new ideas about how to protect an OS and kernel. The run times are not very long, but they must run the tests against a single OS. Consequently, they run thousands of jobs at the same time and then look through the results and continue with their research.

What is important to both sets of researchers is to have all of the jobs run at nearly the same time so they can examine the results and either focus on a small subset of the data and run more granular input data sets or try yet more data sets (broaden the search space). Additionally, these users want to broaden the search space so they can get either more detail or examine more options. The result is the need for more cores. This same HPC center has users asking for 50,000 and 100,000 cores to run their applications. The coin of the realm for these researchers is core count and not per-core performance.

Another interesting aspect of these researchers is that they don't run these massive job sets all of the time. They create the input data sets and create the job arrays and then run the job array. Once the jobs are done, however, it takes time to process the output to understand it and to determine the next step. What is important to these researchers is to have all of the results before doing this post-processing. If this doesn't happen, the researcher has to wait days for the jobs to finish before post-processing can take place.

Getting more efficiency from the hardware is not the problem because faster hardware will only improve the research time a little bit. Reducing the run time from 120 seconds to 100 seconds wouldn't really improve their research productivity. What improves their productivity is to have all of the jobs run at the same time.

I originally thought this scenario was confined to my experience with a particular HPC center, but I was wrong. I've spoken to several people, and they all have similar workload characteristics with varying sizes (several hundred to 50,000 cores). Although this might not describe your particular workload, a number of centers fit this scenario, and this number is growing rapidly.

2. Web Services

Another popular scenario in HPC centers that I've seen is the increasing need for hosting servers for classes or training, for websites (internal and external), and for other general research-related computing in which the applications are not parallel or might not even be "scientific." I heard one person refer to this as "Ash and Trash computing," probably because it refers to running non-traditional HPC workloads; however, it's becoming fairly common.

Consider an HPC center with training courses or classes that need access to a number of systems. A simple example is a class in parallel computing with 30 students. The students might need many cores per person for their course, and they wouldn't be pushing the performance of the systems; however the data center will need a number of systems for the class. If they need 20 cores per student, that's 600 cores just for a single course.

The need for dedicated web servers for research is also increasing. The websites they host go beyond the classic personal websites. Researchers want, and need, to put their research on a website that allows them to share results, interact with other researchers, and show their research. An increasing number of web-based research tools are available, such as nanoHUB [1] and Galaxy [2]. I know of one HPC center that has close to 20 Galaxy servers, each tuned to a specific research project. HPC centers are discovering that it makes much more sense to handle these non-traditional workloads themselves. The reasons are varied, but in general, HPC centers understand research better than the departments that worry about mail servers, databases, and ERP applications. These enterprise computing functions are critical to the overall center, but research and HPC require a different kind of service. Moreover, HPC centers can react much more rapidly to requests than the enterprise IT department.

Time for a Change

HPC is being asked to adapt to new roles when it comes to the needs of researchers. These needs include applications that require a tremendous number of cores but not a great deal of performance, as well as applications such as web servers, classroom and training support, and web-based applications and tools that are not traditional HPC applications. These workloads fit into the HPC world much better than they fit into the enterprise world.

These changes are everywhere. They may not be a large force, and they might not be as pervasive in your particular HPC center, but they are happening and they are growing more rapidly than traditional workloads. Consequently, I've started to refer to this new generation of computing as Research Computing. If you like, research computing is a superset of traditional HPC, or traditional HPC is a subset of research computing. I also like to think of research computing as adding components, techniques, and technology for solving problems that traditional HPC cannot or might not solve. One of these technologies is cloud computing.

1 2 Next »

Buy this article as PDF

Express-Checkout as PDF

Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES

Print Issues

Digital Issues

SUBSCRIPTIONS

Print Subs

Digisubs

TABLET & SMARTPHONE APPS

US / Canada

UK / Australia

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

News

Canonical Releases Ubuntu 24.04

Gnome , Linux , open source , Ubuntu

After a brief pause because of the XZ vulnerability, Ubuntu 24.04 is now available for install.
Linux Servers Targeted by Akira Ransomware

Enterprise Linux , Linux , ransomware , Security

A group of bad actors who have already extorted $42 million have their sights set on the Linux platform.
TUXEDO Computers Unveils Linux Laptop Featuring AMD Ryzen CPU

Games , Hardware , laptop , Linux

This latest release is the first laptop to include the new CPU from Ryzen and Linux preinstalled.
XZ Gets the All-Clear

Arch Linux , Fedora , Linux , open source , Security , Ubuntu

The back door xz vulnerability has been officially reverted for Fedora 40 and versions 38 and 39 were never affected.
Canonical Collaborates with Qualcomm on New Venture

Artificial Inte... , Linux , open source , Security , Ubuntu

This new joint effort is geared toward bringing Ubuntu and Ubuntu Core to Qualcomm-powered devices.
Kodi 21.0 Open-Source Entertainment Hub Released

audio , Multimedia , Music , open source , streaming video , Video

After a year of development, the award-winning Kodi cross-platform, media center software is now available with many new additions and improvements.
Linux Usage Increases in Two Key Areas

Games , Linux , open source , Steam

If market share is your thing, you'll be happy to know that Linux is on the rise in two areas that, if they keep climbing, could have serious meaning for Linux's future.
Vulnerability Discovered in xz Libraries

Fedora , Linux , malware , Security

An urgent alert for Fedora 40 has been posted and users should pay attention.
Canonical Bumps LTS Support to 12 years

Linux , open source , Operating Systems , Ubuntu

If you're worried that your Ubuntu LTS release won't be supported long enough to last, Canonical has a surprise for you in the form of 12 years of security coverage.
Fedora 40 Beta Released Soon

Fedora , Gnome , open source , Plasma , Wayland

With the official release of Fedora 40 coming in April, it's almost time to download the beta and see what's new.

HPC in the cloud