HDF5 for efficient I/O

Fast Containers

© Lead Image © Kirsty Pargeter, 123RF.com

© Lead Image © Kirsty Pargeter, 123RF.com

Article from Issue 205/2017

HDF5 is a flexible, self-describing, and portable hierarchical filesystem supported by a number of languages and tools, with the ability to run processes in parallel.

Input/output operations are a very important part of many applications, sometimes involving a huge amount of data and a large number of reads and writes. Therefore, applications can use a very significant portion of their total run time to perform I/O, which becomes critical in Big Data, machine learning, and high-performance computing (HPC).

In a previous article [1], I discussed options for improving I/O performance, focusing on parallel I/O. One of the options mentioned was to use a high-level library to perform the I/O. A great example of such a library is the Hierarchical Data Format (HDF) [2], a standard library used primarily for scientific computing.

In this article, I introduce HDF5 and focus on the concepts and its strengths in performing I/O; then, I look at some simple Python and Fortran code examples, before ending with an example of parallel I/O with HDF5 and Fortran.


Use Express-Checkout link below to read the full article (PDF).

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • TruPax 9

    The TruPax tool specializes in encrypting small datasets to safeguard your data from prying eyes.

  • FAQ – Common crawl project

    Download the entire web to kick-start a data science empire.

  • Perl: Google Chart Instructions

    A CPAN module passes drawing instructions in object-oriented Perl to Google Chart, which draws visually attractive diagrams.

  • RAID Performance

    You can improve performance up to 20% by using the right parameters when you configure the filesystems on your RAID devices.

  • Synkron

    Legacy backup programs are too heavyweight for a quick backup on the fly, but Synkron helps you keep smaller datasets in sync with just a couple of mouse clicks.

comments powered by Disqus

Direct Download

Read full article as PDF:

Price $2.95