Backing up a living system

Serve It up Hot

© Photo by Aliona Gumeniuk on Unsplash

The tools and strategies you use to back up files that are not being accessed won't work when you copy data that is currently in use by a busy application. This article explains the danger of employing common Linux utilities to back up living data and examines some alternatives.

Tools to make backups of your files are plentiful, and the Internet is full of tutorials explaining how to use them. Unfortunately, most entry-level blogs and articles assume the user just wants to back up a small set of data that remains static throughout the backup process.

Such an assumption is an acceptable approximation in most cases. After all, a user who is copying a folder full of horse pictures to backup storage is unlikely to open an image editor and start modifying the files at random while they are being transferred. On the other hand, real-life scenarios often require backups to occur while the data is being modified. For instance, consider the case of a web application that is under heavy load and must continue operating during the backup.

Pitfalls of Common Tools

Traditional Unix utilities such as dd, cpio, tar, or dump are poor candidates for taking snapshots of a folder full of living data. Among these, some are worse than others.

If a program that operates at the filesystem level tries to copy a file that is being written to, for example, it could deposit a corrupt version of the file in the backup storage (Figure 1). If the corruption affects a file whose format follows a simple structure (such as a text file), this might not be a big problem, but files with complex formats might become unusable.

Figure 1: Archiving a busy folder with tar could result in a corrupted backup. Both GNU tar and OpenBSD's tar will throw a warning if a file is modified while it is being saved, but the saved copy will most likely be corrupt or incomplete.

Utilities that operate at the block device level are especially prone to issues when used on live filesystems. A program such as dump bypasses the filesystem interfaces and reads the contents physically stored on the hard drive directly. Although this approach has a number of advantages [1], it carries a big burden. When a filesystem is in use, write operations performed on it go into a cache managed by the kernel and are not committed to disk right away. From the user's point of view, writing a file might look instantaneous, but the file itself will exist only in RAM until the kernel writes it to storage.

As a result, the actual contents stored on the hard drive will be chaotic, potentially consisting of half-written files waiting for the kernel to make them whole sometime in the future. Trying to read the drive's contents with dump or dd might then return incomplete data and therefore generate a faulty backup.

A Solution of Compromise

Venerable Unix solutions are mature and well tested, and sysadmins have good reasons not to let them go. The fact that they are not adequate for backing up folders subjected to heavy load should not be a showstopper, right?

If a backup tool cannot work reliably with a folder under load, the obvious option is to remove the load before backing the folder up. This is certainly doable on a desktop computer: You can just refrain from modifying the contents of your pictures folder while they are being archived by tar.

For servers, this approach is more complex. A hobby personal server can certainly afford to put a service offline for backup, as long as it is done at a time when no users are connected. For example, if your personal blog that gets 15 visits a day resides in /var/www, and its associated database resides in /var/mariadb, it might be viable to have a cronjob turn off the web server and the database, call sync, back up both folders, and then restart the services. A small website could take a couple of minutes to archive, and nobody will notice if you do it while your target audience is sleeping (Listing 1).

Listing 1

Backup Script for Personal Server

#!/bin/bash

# Proof-of-concept script tested under Devuan. Fault tolerance code
# excluded for the sake of brevity. Not to be used in production.

# Stop the services that use the folders we want to back up.

/etc/init.d/apache2 stop
/etc/init.d/mysql stop

# Instruct the operating system to commit pending write operations to
# the hard drive.

/bin/sync

# Back up using tar and send the data over to a remote host via SSH.
# Public key SSH authentication must be configured beforehand if this
# script is to be run unattended.

/bin/tar --numeric-owner -cf - /var/www /var/mariadb 2>/dev/null | \
  ssh someuser@example.org "cat - > backup_$(date -I).tar"

# Restart services

/etc/init.d/mysql start
/etc/init.d/apache2 start

On the other hand, for anything resembling a production server, stopping services for backup is just not an option.

Enter the COW

A popular solution for backing up filesystems while they are under load is to use storage that supports copy-on-write (COW).

The theory behind copy-on-write is simple. When a file is opened and modified on a classic filesystem, the filesystem typically overwrites the old file with the new version. COW-capable storage takes a different approach: The new version of the file is written to a free location on the filesystem, while the old version remains in place and its location stays registered. The implication is that, while a file is being modified, the filesystem still stores a version of the file that is known to be good.

This ability is groundbreaking because it simplifies taking snapshots of loaded filesystems. The storage driver can be instructed to create a snapshot at the current date. If a file is being modified as the snapshot is being taken, the old version of the file will be used instead, because the old version is known to be in a consistent state while the new version that is being written might not be.

ZFS is a popular filesystem with COW capabilities. Coming from a BSD background, I tend to consider ZFS a bit cumbersome for a small Linux server. Whereas ZFS feels truly native on FreeBSD, it comes across as an outsider in the Linux world, despite the fact that it is steadily gaining traction.
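For reference, taking a ZFS snapshot and shipping it to another machine is pleasantly terse. The following is a minimal sketch; the pool and dataset names, as well as the remote host, are illustrative:

zfs snapshot tank/pictures@nightly
zfs send tank/pictures@nightly | ssh someuser@example.org "cat - > pictures_nightly.zfs"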

On the other hand, Linux has had a native snapshot tool for quite a few years: LVM (Logical Volume Manager). As its name suggests, LVM is designed to manage logical volumes. Its claim to fame is its flexibility, because it allows administrators to add more hard drives to a computer and then use them as an extension to expand existing filesystems. An often overlooked capability of LVM is its snapshotting function.

The main drawback of using LVM is that its deployment must be planned well in advance. Suppose you plan to deploy a database that stores application data in /var/pictures. In order to take LVM snapshots of it in the future, the filesystem you intend to mount at /var/pictures must be created on LVM in the first place. For such a purpose, a partition on a hard drive must be designated as a physical volume, within which an LVM container will exist, using pvcreate. Then you must create a volume group within it using vgcreate (Figure 2). Finally, you have to create a logical volume inside the volume group using lvcreate and format it (Figure 3).

Figure 2: Basic LVM configuration. The volume group database_group exists within a physical volume (typically placed in a regular partition of the hard drive). Inside database_group, there is a volume named database_volume, which stores the actual filesystem I want to back up. When the time comes for the backup, I can create a snapshot volume called database_snapshot and dump its contents to the backup media afterwards.
Figure 3: Creating a physical volume, then a volume group within the physical volume, and then a logical volume within the volume group.
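In practice, the sequence boils down to a handful of commands. The following is a minimal sketch, reusing the names from Figure 2 and assuming /dev/sdb1 is a spare partition; the device name and volume size are illustrative:

pvcreate /dev/sdb1
vgcreate database_group /dev/sdb1
lvcreate -L 40G -n database_volume database_group
mkfs.ext4 /dev/database_group/database_volume
mkdir -p /var/pictures
mount /dev/database_group/database_volume /var/pictures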

Care must be taken to leave some free space in the Volume group to host snapshots in the future. The snapshot area need not be as large as the filesystem you intend to back up, but if you can spare the storage, it is advisable.
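You can check how much unallocated space remains in the volume group at any time with vgs, whose VFree column shows what is left over for snapshots (the group name comes from the example above):

vgs database_group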

If one day you need to make a backup of /var/pictures, the only thing you need to do is to create a snapshot volume using a command such as:

lvcreate -L 9G -s -n database_snapshot /dev/database_group/database_volume

When you are ready, a snapshot volume can be mounted under a different directory with mount, just like a regular filesystem:

mkdir /var/pictures_snapshot
mount -o ro /dev/database_group/database_snapshot /var/pictures_snapshot

You may then copy the contents of the snapshot using any regular tool, such as rsync, and transfer them over to definitive backup storage. The files under /var/pictures_snapshot are immutable and can be copied over even if the contents of /var/pictures are being modified during the process.
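As a sketch, the transfer and cleanup could look like the lines below; the remote host and destination path are illustrative. The lvremove call discards the snapshot once it is no longer needed:

rsync -a /var/pictures_snapshot/ someuser@example.org:/backups/pictures/
umount /var/pictures_snapshot
lvremove -f /dev/database_group/database_snapshot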

No Silver Bullets

Snapshots don't come free of disadvantages. The main disadvantage, as previously discussed, is that using snapshots requires careful planning. If you intend to use snapshots as part of your backup strategy for a folder, you need to store the contents of that folder on snapshot-capable storage to begin with.

Another issue is performance. LVM in particular is known for taking a performance hit when it must keep multiple snapshots taken from the same filesystem [3].

The biggest problem with LVM snapshots, however, is that their life is finite. LVM snapshots track the changes made to the original filesystem (see the box "How Do LVM Snapshots Work?"). If enough changes are made to the original filesystem, the LVM snapshot will run out of space to register them. Modern LVM supports dynamic expansion of the snapshot volume if it detects that it is running out of room [4], but hard drive space is finite, and the snapshot may yet be dropped if it needs to grow past the capacity of the underlying physical storage.

How Do LVM Snapshots Work?

An LVM snapshot volume does not actually contain a copy of all the files as they were in the original filesystem when the snapshot was taken.

When a snapshot is taken, the volume manager takes note of the state of the original filesystem at that moment. When a file is modified after the snapshot is taken, the old version of its data (the one you would expect to find in the snapshot) is copied to the snapshot volume area before being overwritten. This way, the snapshot volume only contains copies of data that has been modified since the snapshot point.

The snapshot volume can be mounted as a regular filesystem. If the user tries to access a file through the snapshot, LVM checks if the file has been modified since the snapshot point. If not, the file is retrieved from the original filesystem. If the file has been modified, the version from the snapshot area is retrieved instead.

As a result, a snapshot volume does not need to be as large as the original volume because it only contains the changes between the snapshot point and the current date. Using a small snapshot volume, however, comes with risks: If the original is modified to the point that the changes outgrow the snapshot volume, the snapshot will be invalidated. It is, therefore, important to allocate enough room for the snapshot volume.
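You can keep an eye on how full a snapshot is with lvs, whose Data% column reports how much of the snapshot area is already in use (the group name comes from the earlier example):

lvs database_group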

Not every filesystem is suitable for use in conjunction with LVM snapshots. Filesystems must support freezing in order to guarantee that the resulting snapshots are in a consistent state when they are taken. XFS and ext4 are known to work well, and there are more options [2].

It is worth noting that many applications with data you might want to back up don't guarantee the consistency of their files while the application is running. For example, a busy database might work asynchronously, keeping many operations in a RAM cache and committing them to disk periodically. A backup that perfectly mirrors the filesystem the database is stored on will then contain inconsistent data, because many database operations may not have been committed to disk yet. It is important to check the documentation of the programs you are backing up in order to learn of potential pitfalls.

Conclusions

Classic Unix tools are not advisable for making backups of files and folders that are being modified during the process, because backup corruption can occur as a result. This article has covered LVM as a safer alternative for Linux, but there are other tools for the task, including ZFS and Btrfs snapshots.

Many programs, especially services intended for production, provide their own means for creating backups (Figure 4). Databases are notorious for including their own backup utilities, and if you are indeed using a real database (such as MariaDB or PostgreSQL), you should consider using its tools instead of backup utilities that operate at the filesystem or block device level.
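As an illustration, MariaDB ships with mysqldump, which can produce a consistent dump of transactional tables while the server keeps running. A minimal sketch, assuming credentials are supplied through a configuration file:

mysqldump --single-transaction --all-databases > backup_$(date -I).sql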

Figure 4: Production-ready programs offer instructions for backing up their data – and even provide their own backup functionality. OpenCart, a popular e-commerce platform, has a utility for copying its database on the fly.

Infos

  1. "Is Dump Really Deprecated?" by Antonios Christofides: https://dump.sourceforge.io/isdumpdeprecated.html#canusedump
  2. Freeze support committed into the kernel for a number of filesystems: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c4be0c1dc4cdc37b175579be1460f15ac6495e9a
  3. "LVM Snapshot Performance" by John Leach: https://johnleach.co.uk/posts/2010/06/18/lvm-snapshot-performance/
  4. "[PATCH] automatic snapshot extension with dmeventd (BZ 427298)": https://listman.redhat.com/archives/lvm-devel/2010-October/msg00010.html

The Author

Rubén Llorente is a mechanical engineer who ensures that the IT security measures for a small clinic are both legally compliant and safe. In addition, he is an OpenBSD enthusiast and a weapons collector.