Use mhddfs to group hard disks and directories

United

Article from Issue 183/2016

The multi-hard drive disk filesystem (mhddfs) combines directories or hard disks on a union filesystem to create a single, large, virtual filesystem that you can use both locally and via Samba or NFS.

Establishing a reliable system and keeping track of a continually growing collection of movies and audio files can be very time consuming. What makes matters worse is that multimedia data typically resides on various disks.

This is where mhddfs enters the game: Using a union filesystem, it groups files from different locations to create a virtual directory. The tool not only combines existing data, it also provides details about free storage space on the individual filesystems. (See the box "What Is a Union Filesystem?")

Consequently, it is no longer a problem to use small disks to store a music collection that extends over three disks. You could just as easily store rock music on one disk, classical tracks on another, and e-books on the third. What happens, however, if your rock music disk is full, but your e-book disk still has room to spare? Things start to become untidy again.

An alternative would be to create a RAID [1] array, but you would always have to compromise between keeping your data safe and using storage space; it does not appear to be a viable solution for the example in this article. The use of LVM [2] only makes sense with RAID for reasons of data safety, and again this does not help solve the problem presented here.

Fortunately, mhddfs offers precisely the functionality that most users need in this case: If you run out of space on one of the grouped disks, the data can be migrated in the background to a different disk with free space without the user even noticing. By default, mhddfs reserves 4GB on each disk for emergencies: If needed, you can use

mlimit=<Limit>

to reduce this value to as little as 100MB when you mount the array.

Transparent Write Access

For the virtual array to work, mhddfs makes not only read access transparent, but also data writes. This sets it apart from UnionFS, from AuFS (commonly used by Live media), and from OverlayFS, which was recently added to the kernel. Whereas these legacy union filesystems rely on copy-on-write (COW) [3] and write only to the top level of the filesystem, mhddfs writes to all underlying levels, too.

Mhddfs stores files that you add to the virtual array on the first hard disk, as long as it has sufficient space (i.e., as long as the mlimit is still upheld). After this, it checks the remaining disks in sequence to see if they have sufficient space. If none of the mlimits on the disks meet the requirements, mhddfs uses the disk with the most space.
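The placement rule just described can be sketched in a few lines of shell. This is only an illustration of the logic as the article states it, not mhddfs's actual code; the function name pick_disk and the use of megabytes as the unit are assumptions made for the example.

```shell
# Sketch of the placement rule described above (not mhddfs's real implementation).
# Usage: pick_disk <mlimit_mb> <free_mb_disk1> [<free_mb_disk2> ...]
# Prints the 1-based index of the disk that would receive a new file.
pick_disk() {
    mlimit=$1
    shift
    index=0
    best_index=0
    best_free=-1
    for free in "$@"; do
        index=$((index + 1))
        # The first disk whose free space is at least the reserve wins.
        if [ "$free" -ge "$mlimit" ]; then
            echo "$index"
            return 0
        fi
        # Otherwise remember the disk with the most free space as a fallback.
        if [ "$free" -gt "$best_free" ]; then
            best_free=$free
            best_index=$index
        fi
    done
    echo "$best_index"
}
```

With a 4GB reserve, pick_disk 4096 1024 8192 2048 selects the second disk, because it is the first one with enough room. If no disk satisfies the limit, as in pick_disk 4096 1024 2048 3000, the third disk is chosen because it has the most free space.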

Mhddfs always stores files atomically, avoiding the kind of file splitting that you see with LVM. This works with all popular Linux filesystems as well as with Samba and NFS shares, because both protocols return correct information about occupied and free space on the respective filesystems. SSHFS does not meet this criterion, and the mhddfs developers thus warn against integrating it.

If mhddfs notices during a write that the disk in question does not have enough space, it moves the data it has already written to another disk with more space as a background operation and continues the write action on that disk. The writing program does not notice this. In other words, you can work with the virtual filesystem as if you were working on a single large disk.

No matter where data resides or how much space is available on individual disks, you only see the complete remaining free space. If you later buy a disk with enough capacity and decide to stop using the smaller disks in the mhddfs array, or if you want to use the smaller disks elsewhere, you can simply copy the content of the virtual filesystem to the new disk and unmount the smaller disks.

Flexible Storage

Mhddfs is available from the repositories of most distributions; you can thus use your distribution's package manager for the install. If you prefer to build mhddfs yourself, you can pick up the source code from the mhddfs Subversion repository [4]. Using the tool is very easy in practice. In the following example, I use three hard disks: sda1, sdb1, and sdc1; Listing 1 shows the situation at the start.

Listing 1

Example Disks

 

You can now create a new mountpoint for the array you will be creating and assign the permissions by typing:

mkdir /mnt/media
chmod 775 /mnt/media

From now on, the FUSE filesystem, which was installed to fulfill one of mhddfs's dependencies, comes into its own with its ability to migrate kernel space functions to userspace.

You do not need to be root to use mhddfs; a normal user account is fine. The account simply needs to belong to the fuse group. You can ensure this by typing:

addgroup <User> fuse
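To confirm that the group assignment worked, you can check the membership list that id reports. The helper below is a minimal sketch; the function name in_group is made up for this example. Keep in mind that a new group membership only applies to sessions started after the change.

```shell
# Return success if <user> is a member of <group>.
# Usage: in_group <user> <group>
in_group() {
    user=$1
    group=$2
    # id -nG prints the names of all groups the user belongs to.
    for g in $(id -nG "$user"); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}
```

A call such as in_group alice fuse && echo "ready for mhddfs" then tells you whether the account (alice is a placeholder) can use FUSE.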

Now create the new array (Listing 2, line 1); the -o allow_other option lets users other than the one who mounted the array access it and create files in it.

Listing 2

Mount Disks to Filesystem
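As a rough sketch of what such a mount call looks like, the helper below assembles the command from a mountpoint, an option string, and a list of directories. The function name mhddfs_cmd and the directories /mnt/sda1, /mnt/sdb1, and /mnt/sdc1 are assumptions for illustration; only /mnt/media comes from the steps above.

```shell
# Assemble the mhddfs call for a list of directories and print it.
# Usage: mhddfs_cmd <mountpoint> <options> <dir1> [<dir2> ...]
mhddfs_cmd() {
    mountpoint=$1
    options=$2
    shift 2
    # mhddfs expects the directories as a single comma-separated argument.
    branches=$(IFS=,; echo "$*")
    echo "mhddfs $branches $mountpoint -o $options"
}

# Prints: mhddfs /mnt/sda1,/mnt/sdb1,/mnt/sdc1 /mnt/media -o allow_other
mhddfs_cmd /mnt/media allow_other /mnt/sda1 /mnt/sdb1 /mnt/sdc1
```

Passing allow_other,mlimit=100M as the option string would lower the reserve at the same time. If you mount as a normal user, allow_other only works if user_allow_other is enabled in /etc/fuse.conf.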

 

Additionally, you can specify the mlimit parameter, as mentioned before, but options really belong in /etc/fstab. Assuming that the mount works, you will see output as shown in Listing 2. All three disks are mounted; all logged-in users have access, and the limit is 4GB. The results, viewed using df -h, should look something like Listing 3.

Listing 3

New Filesystem
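One way to cross-check the pooled figures is to add up the free space of the individual directories and compare the result with what df reports for the array. The following sketch assumes the function name sum_avail_kb; the paths in the usage note are placeholders.

```shell
# Sum the available space (in KB) across the given directories.
# df -P prints one line per argument in POSIX format; column 4 is "Available".
# Note: two directories on the same filesystem are counted twice.
sum_avail_kb() {
    df -Pk "$@" | awk 'NR > 1 { total += $4 } END { printf "%.0f\n", total }'
}
```

Running sum_avail_kb /mnt/sda1 /mnt/sdb1 /mnt/sdc1 and then df -Pk /mnt/media should yield roughly the same amount of available space.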

 

As you can see, the software has created the new filesystem; its total capacity is the sum of the capacities of the individual disks, and the same is true of the free space. The next task is to provide this setup automatically at boot time. To do so, add a new line to your /etc/fstab file (Listing 4).

Listing 4

Creating the Filesystem at Bootup
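An fstab entry for mhddfs conventionally uses the mhddfs# prefix in the first field and fuse as the filesystem type. The snippet below builds such a line and prints it; the directory names are assumptions, so adjust them to your own layout before adding the line to /etc/fstab.

```shell
# Directories and mountpoint are assumptions; adjust them to your layout.
branches=/mnt/sda1,/mnt/sdb1,/mnt/sdc1
mountpoint=/mnt/media

# Fields: source, mountpoint, type, options, dump, fsck pass.
fstab_entry="mhddfs#$branches $mountpoint fuse defaults,allow_other 0 0"
printf '%s\n' "$fstab_entry"
```

After checking the printed line, you could append it with printf '%s\n' "$fstab_entry" | sudo tee -a /etc/fstab and test it with mount -a.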

 

If you do experience problems, it is a good idea to use another option to define where the software creates a logfile and to define the verbosity level for mhddfs's output (Listing 5). For more details, refer to the mhddfs man page [5].

Listing 5

Mhddfs Options
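The logging options can be combined with the other mount options in one comma-separated string. The sketch below builds such a string; the function name mhddfs_debug_opts and the logfile path are assumptions, and the meaning of the level values (0 for debug, 1 for info, 2 for standard messages) should be verified against the man page [5].

```shell
# Build an option string that enables logging.
# Usage: mhddfs_debug_opts <logfile> <loglevel>
# Assumed levels: 0 = debug, 1 = info, 2 = standard messages.
mhddfs_debug_opts() {
    logfile=$1
    loglevel=$2
    case $loglevel in
        0|1|2) ;;
        *) echo "loglevel must be 0, 1, or 2" >&2; return 1 ;;
    esac
    echo "allow_other,logfile=$logfile,loglevel=$loglevel"
}
```

For example, mhddfs_debug_opts /var/log/mhddfs.log 0 prints allow_other,logfile=/var/log/mhddfs.log,loglevel=0, which you can pass to mhddfs after -o or use in the options field of /etc/fstab.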

 

If needed, you can add more disks to the array at any time. To do so, unmount the array and start mhddfs again with the additional disks included in the list of directories. Then update the corresponding line in your /etc/fstab so that the extended array is mounted automatically.

If you want to stop using the program, remove the line from the /etc/fstab file and delete the mountpoint for the array. If you have a distribution that uses Systemd, you can launch mhddfs via the init system; the "Launching mhddfs with Systemd" box describes this option.

Launching mhddfs with Systemd

Mhddfs does not come with a unit file for Systemd. For this reason, you need to create a file named /etc/systemd/system/mnt-media.mount and copy the content of Listing 6 into it. Systemd requires the name of a mount unit to match its mountpoint, which is why /mnt/media becomes mnt-media.mount. The command

systemctl daemon-reload

then makes Systemd re-read its unit files. The systemctl enable mnt-media.mount command ensures that the array is mounted automatically at boot time. You can then type

systemctl start mnt-media.mount

to mount the array right away.

Listing 6

/etc/systemd/system/mnt-media.mount
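A mount unit for the array might look like the following sketch, which prints the unit text so you can inspect it before installing it. The directory list, the description, and the function name print_mount_unit are assumptions. The Type=fuse.mhddfs line relies on the mount.fuse helper calling mhddfs, which you should confirm on your system.

```shell
# Print a mount unit for the array; all paths are assumptions.
# The unit name must match the mountpoint (/mnt/media -> mnt-media.mount).
print_mount_unit() {
    cat <<'EOF'
[Unit]
Description=mhddfs storage pool

[Mount]
What=/mnt/sda1,/mnt/sdb1,/mnt/sdc1
Where=/mnt/media
Type=fuse.mhddfs
Options=defaults,allow_other

[Install]
WantedBy=multi-user.target
EOF
}
```

To install the unit, you could run print_mount_unit | sudo tee /etc/systemd/system/mnt-media.mount. The sketch does not order the unit after the mounts of the individual disks; depending on your setup, you may need to add such dependencies to the [Unit] section.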

 

On the Safe Side

The driver, which is what mhddfs is at the end of the day, focuses on a single task in the classic Unix style, and it does its job well. However, it does not offer any kind of backup in the case of failure. A disk failure in the array will therefore cause loss of data. One drawback in practice is that you do not know where the software will store a new file and thus do not know what data you stand to lose if a disk dies on you.

The only remedy is to back up the data involved. Mhddfs is often used in combination with SnapRAID [6] to add a modicum of safety. Beyond this, you can also mirror the array one-to-one. To do so, create a second mhddfs instance on your backup disk and synchronize the two instances using Rsync or a similar tool [7].
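The one-to-one mirror can be sketched with a small wrapper around Rsync. The function name mirror_pool and the paths in the usage note are assumptions. Because --delete removes files from the target that no longer exist in the source, it is worth testing the call with rsync's -n (dry run) option first.

```shell
# Mirror the contents of one array onto another.
# Usage: mirror_pool <source_dir> <target_dir>
# -a preserves permissions, ownership, and timestamps;
# --delete removes files from the target that are gone from the source.
mirror_pool() {
    src=$1
    dst=$2
    # The trailing slash on the source copies its contents, not the directory itself.
    rsync -a --delete "$src"/ "$dst"/
}
```

A call such as mirror_pool /mnt/media /mnt/backup, run regularly from a cron job or timer, keeps the second mhddfs instance in sync with the first.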
