Rsync for website backup in a shared hosting environment

Back End Backup

Author(s): Daniel Rosehill

Shared hosting is the best way for first-time webmasters to get started. But what do you do about backup?

Shared hosting remains the go-to choice for many first-time webmasters. The shared-hosting model, which allows several websites to share the same centrally managed server, lets the customer focus on web matters without getting involved with the details of the underlying OS. But the simplicity of the shared hosting environment also leads to some complications. For instance, although many shared hosts do allow users to connect to the shared server over SSH, hosting vendors typically don't provide root access for shared-hosting customers.

From a backup perspective, this lack of root access makes life a little difficult. Although there is a vibrant market of third-party vendors and managed service providers (MSPs) offering various types of cloud-to-cloud backup and data extraction solutions, many tools are proprietary and are not designed to allow easy replication to an on-site source, such as a user's desktop.

Also, although many popular applications like WordPress have their own backup and recovery plugins, these will obviously not be helpful if your website is not running WordPress – or if it's not running a content management system (CMS) at all.

Try the DIY Approach

Linux users are generally not averse to trying out commands in the Linux terminal or getting under the hood to see what makes their systems tick. For them, backing up files and databases with a utility like rsync [1] is often preferable to hoping that a third-party solution will provide the functionality they need to back up their data.

Rsync is one of the best known command-line utilities for backup and recovery. First released in the mid '90s, rsync is a fundamental job skill for many sysadmins and backup administrators. The tool supports transfer and synchronization to a variety of networked remotes. More recently, rclone [2] has extended rsync-style functionality to cloud storage repositories.

Using rsync, local backups can be taken onto a Linux-running local machine, like a desktop or a network-attached storage (NAS) device, and then synced off-site to comply with the 3-2-1 backup rule, which stipulates keeping three copies of your data on two different storage media, with one copy off-site. Rclone, which offers rsync-like syncing between remote sources and targets, is a fine tool for the off-site leg.

Given that, in a typical shared hosting environment, users are restricted to their own user directory (/home/foo) and cannot access other parts of the filesystem, a slightly more creative approach has to be employed than would be the case when backing up from a virtual private server (VPS) or dedicated machine. But it's one that's readily achievable nonetheless.

Here's what I've set up.

1. Authenticate Local Machine with Host

Because rsync runs over the SSH protocol, the first step is to make sure that the machine from which you plan on backing up your shared hosting environment's filesystem has been authenticated with the server.

This is done in the usual manner: Generate an SSH key pair if you don't already have one on the machine. Shared web-hosting environments typically include access to the cPanel web-hosting control panel for ease of administration, which provides an SSH screen where public keys can be imported and authorized. Generate the keys and import the public key onto the host.
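The key generation step might look like the following sketch; the file name and comment are just examples, and -N '' creates an unencrypted key suitable for unattended cron runs (use a passphrase plus ssh-agent if you prefer):

```shell
# Create the .ssh directory if this is a fresh machine
mkdir -p ~/.ssh && chmod 700 ~/.ssh

# Generate an Ed25519 key pair; -N '' means no passphrase so that
# cron can use the key unattended (a security trade-off to consider)
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519_hosting -C "backup@desktop"

# Print the public key so it can be pasted into cPanel
# (SSH Access > Manage SSH Keys > Import Key)
cat ~/.ssh/id_ed25519_hosting.pub
```

After importing the public key, remember to authorize it in cPanel; an imported but unauthorized key will not let you log in.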

2. Connect over SSH and Run an Rsync Pull

Before adding this to a Bash script and setting it to run on cron, it needs to be QA'd and tested. First, SSH into your shared hosting environment to make sure that the connection works. Next, you'll need to pull from source to destination using rsync. To do this, pay attention to the order of the arguments (source first, then destination): Make sure that the hosting environment is your source and the local filesystem is your destination. Mixing up source and destination can have catastrophic effects and may cause you to wish that you had thought about backups sooner!

If your host insists that you connect over SSH on a non-standard port in order to improve security (some now do), then you'll need to pass that with -e 'ssh -p 12345' – obviously replacing 12345 with the port number your host has asked you to use. Otherwise, you can connect as usual over port 22 and simply omit the -e option.

rsync -arvz -e 'ssh -p 12345' yourhost@203.0.113.71:/home/youruser/ /backups/hosting/website1

Of course, you'll want to replace yourhost with your web-hosting username and replace the example IP with the actual public IP of the shared server to which you need to connect.

Now let's break down that command a little. rsync calls the rsync utility. Then come the parameters, which are entered together and prefixed by a minus symbol:

  • a runs rsync in archive mode. This recursively copies files and keeps permissions, ownership, timestamps, and symbolic links intact.
  • r runs rsync recursively. (It is already implied by a, so it's redundant here, but harmless.)
  • v is a verbosity parameter. Up to three vs can be daisy-chained to produce the most verbose output possible, which is useful for debugging the command (especially in conjunction with the dry-run parameter).
  • z compresses the file data as it is sent to the destination.

Next we have the 'ssh -p XXXXX'. I provided a non-standard SSH port here, but if you're with a shared host, yours is more likely the default port 22. After that, I provided my SSH username and the IP address of my hosting server. After a colon (:), I provided the path that I want to back up recursively. At the end of the command comes my destination.

The beauty of rsync is its delta-transfer algorithm. Only the files that have changed since the last time the command ran are transferred, and, when syncing over the network, only the changed portions of those files are sent. This minimizes data transmission and maximizes the efficiency of the command. Rsync has been integrated into many backup programs, where it can power the full range of conventional backup types (full, incremental, and differential).

If you're just trying to back up the files in, say, a WordPress installation, then I recommend simply backing up from the WordPress root in order to avoid capturing the unnecessary file clutter that you'll typically find at the user root level in a shared hosting environment.

For instance, back up from /home/youruser/public_html/wp to the target. Next, run the command and verify that the directory has been created successfully on your local machine.

3. Grab the MySQL Databases

Unfortunately, because of the limitations of shared hosting environments, the filesystem you just pulled in will not contain the MySQL database that is an integral component of many dynamic web services including WordPress. There are two ways around this.

One is to set up a cron job on the hosting server in order to run a database backup tool that then drops the output somewhere where you can access it with the rsync job. The best tool for this is mysqldump [3], which is the main utility for taking backups of MySQL databases. mysqldump can be used for both the backup and restoration of MySQL. You might wish to configure something like this, for instance:

mysqldump -u myuser -p'password' mydatabase > /home/youruser/backups/mysql/mydatabase.sql
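To run this unattended on the hosting server, you could wire the dump into cron. The schedule, credentials, and paths below are all assumptions; adjust them to your account (and note that a plain-text password in a crontab is a trade-off some users will prefer to avoid by using a ~/.my.cnf credentials file instead):

```shell
# Example crontab entry on the hosting server (crontab -e, or the
# Cron Jobs screen in cPanel); runs the dump nightly at 02:30 into a
# directory the local rsync job will later pull from
30 2 * * * mysqldump -u myuser -p'password' mydatabase > /home/youruser/backups/mysql/mydatabase.sql
```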

Alternatively, some CMSs come with a built-in or add-on tool for backing up the database. For instance, WordPress users can install the WP Database Backup plugin to periodically generate and then save a copy of the MySQL database in a location you can reach with rsync. You'll have to manually enable local backups, because the plugin, by default, assumes that you'll be backing up directly to a remote in the cloud. To prevent old dumps from piling up and counting against your storage quota, I also limited the number of stored MySQL backups to two.

4. Script and Automate

Now that you have tested the two rsync pulls required, it's time to put your commands into a Bash script and set it up in cron to run automatically. The beauty of rsync is that, unlike full backups, it transfers only the changes made to the filesystem between runs. That means less unnecessary data transfer – which is especially important if you're mirroring the backup from your local machine to an off-site location such as a public cloud.

I built two directories within a backup folder that I created to host the backups on my NAS (which, in turn, mirrors the copies up to another cloud). I called these /files/ and /mysql/ and put the filesystem and MySQL backups into these subfolders, respectively. You can set up the job to run however often you want.

To build a master backup script you can either run a script that calls the other scripts, or you can just call all the commands from one script. For this demonstration, I have chosen the latter approach (Listing 1).

Listing 1

Backup Script

#!/bin/bash
rsync -arvz -e 'ssh -p 12345' yourhost@203.0.113.71:/home/youruser/public_html /backups/hosting/website1/files
rsync -arvz -e 'ssh -p 12345' yourhost@203.0.113.71:/home/youruser/backups/mysql/ /backups/hosting/website1/mysql
exit 0

If you are backing up several websites, you can add the sites sequentially. You can increase the verbosity by adding up to three vs; if you need to troubleshoot the script from a monitored terminal, I recommend doing so.
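Scheduling the master script might look like this sketch; the script path, log file, and times are examples, not requirements:

```shell
# Make the script executable, then add a crontab entry (crontab -e)
# on the local machine; this example runs the pull daily at 03:00
# and appends stdout/stderr to a log for later troubleshooting
chmod +x /usr/local/bin/hosting-backup.sh
# crontab line:
#   0 3 * * * /usr/local/bin/hosting-backup.sh >> /var/log/hosting-backup.log 2>&1
```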

Conclusion

After you have the basic backup pull working, you may wish to create weekly and monthly snapshots as well as daily ones. To achieve this, you can create /weekly and /monthly folders and then run local rsync jobs that copy the daily snapshot into them (doing this locally saves transfer time). Just make sure that the cron jobs are set to run at the appropriate intervals.

As a final note: This backup still will not capture everything in cPanel, although it is useful for rolling back changes to, say, a WordPress theme. If you want a complete copy, cPanel has a native backup tool that creates full backups of the hosting environment. It's a good idea to do both (and to evaluate what MSPs and third-party tools can achieve).

You can use the technique described in this article to create local backup copies of your shared hosting environment onto a Linux host using only rsync, a few cron jobs, and, if preferred, a WordPress plugin.

The Author

Daniel Rosehill is a technology writer and reviewer specializing in thought leadership for technology clients, especially those in the B2B world. His technology interests include data backup and recovery, Linux and open source, and cloud computing. To learn more, visit http://dsrghostwriting.com.