Distribute and manage files with git-annex
Data Warehouse
Git-annex is storage software that distributes files across devices, servers, and cloud services. It can encrypt files and keep everything in sync, and it always knows where to find your data.
If you move into a secluded cabin, you cannot expect a broadband Internet connection. When US Debian developer Joey Hess moved to such a place in 2010, he had to find a way to reconcile his work with his new habitat. To keep costs down, he went online over his expensive dial-up connection only once a week, yet he had a whole bunch of servers with a stack of hard drives to synchronize.
Crowdfunding Success
Because of his situation, Hess began to write a synchronization tool for his own purposes that uses the free Git version control software as its underpinnings. This tool is now widely available under the GPL as git-annex [1]. Thanks to a crowdfunding campaign, Hess has also developed the git-annex assistant web GUI, which makes the software more accessible for end users. See the "Installation" box for more information.
Installation
Because git-annex was written by a Debian contributor, a current package is always available in the unstable branch (Sid). For the stable Debian version or other distributions, the easiest installation method is to use the precompiled binary archives, which the developer keeps up to date [2]. After unpacking, you need to add the resulting directory, such as ~/bin/git-annex.linux/, to your PATH.
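As a minimal sketch, assuming the archive was unpacked to ~/bin/git-annex.linux/ as in the example above, the following lines extend the search path for the current shell session. To make the change permanent, you could add the same lines to a startup file such as ~/.profile.

```shell
# Assumption: the binary archive was unpacked to ~/bin/git-annex.linux/
PATH="$HOME/bin/git-annex.linux:$PATH"
export PATH
```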
Installing the software, which is written in Haskell, with the Cabal package tool is best left to experienced Haskell programmers: My tests revealed some unmet dependencies.
On the git-annex homepage, Hess describes two target groups for the software: Bob the archivist and Alice the globetrotter. The archivist can use git-annex to manage myriad files in a single directory tree, even though the files are spread across multiple servers and even across removable hard drives. For safety reasons, git-annex can also create multiple copies of each file.
In comparison, Alice is permanently on the road with her netbook and USB hard drive and has rented a web server and cloud storage. Git-annex helps Alice find the correct files while she's on the road. All she needs is a WiFi connection in a café; she automatically encrypts the files she stores with the cloud provider.
Git Plus File-Based Storage
In both scenarios, the users benefit from the fact that Git repositories can store and compare data in a decentralized way. However, the version control system was designed for storing many small source files, not for large video files, which cause performance and space problems. The developer's trick: Git-annex checks only the metadata, along with a symbolic link, into Git. The file contents themselves are kept by git-annex's own storage back end in the .git/annex/ directory.
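The principle behind this trick can be illustrated with a few standard tools. The following sketch is a simplified illustration only, not git-annex's actual implementation: The function name annex_store and the flat object directory are hypothetical, and the real key names and directory layout differ. The function moves a file into an object directory under a name derived from its SHA-256 checksum and leaves a symbolic link in its place.

```shell
# Simplified illustration only; git-annex's real object layout and
# key names differ. "annex_store" is a hypothetical helper.
annex_store() {
    file=$1
    # Create the object directory and resolve it to an absolute path
    # so that the symbolic link works from any location.
    mkdir -p "$2"
    store=$(cd "$2" && pwd)
    # Name the stored copy after the checksum of its content.
    key=$(sha256sum "$file" | cut -d' ' -f1)
    mv "$file" "$store/$key"
    # Leave a small symbolic link where the file used to be.
    ln -s "$store/$key" "$file"
}
```

Calling annex_store video.mp4 objects would then leave only a link named video.mp4 in the working directory; a version control system that records the link has to handle just a few bytes instead of the whole video.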
Git-annex comes with command-line tools that technically inclined users can learn quickly and easily, especially if they know Git. The following commands convert the ~/annex directory into a git-annex repository named My PC:
cd ~/annex
git init
git annex init "My PC"
Before you can store files in this directory, you need to put it in direct mode by issuing the git annex direct command. This is the only mode in which you have direct access to the files themselves [3]. By default, repositories are in indirect mode, in which all data is stored below the hidden .git/ directory and the working directory contains only symbolic links.
If you want to add another repository to your file storage, such as a USB stick, the commands in Listing 1 will help. They create the repository on the stick and link the two repositories: Each sees the other side as a remote repository, with which it then synchronizes.
Listing 1
Add Remote
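A minimal sketch of such a setup might look like the following, modeled on the walkthrough on the git-annex website. The function name link_usb_repo, the remote names laptop and usbdrive, and the repository description are assumptions chosen for illustration; the function expects the path of the existing repository and the target directory on the stick.

```shell
# Sketch only: function name, remote names, and description are assumptions.
link_usb_repo() {
    main=$1    # existing repository, for example "$HOME/annex"
    stick=$2   # target directory on the stick, for example /media/usb/annex

    # Create the second repository as a clone of the first.
    git clone "$main" "$stick"
    git -C "$stick" annex init "USB stick"

    # Make each repository a remote of the other.
    git -C "$stick" remote add laptop "$main"
    git -C "$main" remote add usbdrive "$stick"
}
```

With the stick mounted, a call such as link_usb_repo ~/annex /media/usb/annex would then set up both sides.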
To discover how to add files and hand them over to git-annex for safekeeping, check out Listing 2. The git annex sync command triggers the data sync.
Listing 2
Adding Files
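The steps could be sketched as follows. The function name add_and_sync is an assumption for illustration; it copies the given files into the repository, hands them to git-annex, and then synchronizes with the linked repositories. Depending on the git-annex version and configuration, sync may exchange only the Git metadata unless you also pass the --content option.

```shell
# Sketch only: "add_and_sync" is a hypothetical helper.
add_and_sync() {
    repo=$1   # git-annex repository, for example "$HOME/annex"
    shift
    # Copy the given files into the repository ...
    cp "$@" "$repo"/
    # ... and hand them over to git-annex.
    git -C "$repo" annex add .
    # Commit the change and exchange it with the linked repositories.
    git -C "$repo" annex sync
}
```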
Many Accessories
In addition to these basic operations, which are described step by step on the git-annex website [4], numerous variations and accessories are available: The repositories supported by the tool, besides local directories and removable media, include remote servers (assuming they offer SSH access), ownCloud, the Google Drive online service, Amazon S3, Box.com, and Rsync.net. For bona fide archivists, the Amazon Glacier offline store and the public Archive.org are also supported.
GnuPG handles encryption, automatically using a locally generated key by default. If desired, however, users can follow GnuPG customs and encrypt with a specific public key.
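Both variants can be sketched with the initremote command, here for a directory-based remote. The function name init_encrypted_remote, the remote name usbbackup, and the arguments are assumptions for illustration. Without a key ID, the sketch uses the shared scheme, in which the key is kept in the Git repository; with a key ID, it uses the hybrid scheme and encrypts for the given GnuPG public key.

```shell
# Sketch only: function name, remote name, and arguments are assumptions.
init_encrypted_remote() {
    dir=$1         # existing directory that will hold the encrypted files
    keyid=${2-}    # optional GnuPG key ID or email address

    if [ -n "$keyid" ]; then
        # Encrypt for a specific GnuPG public key.
        git annex initremote usbbackup type=directory directory="$dir" \
            encryption=hybrid keyid="$keyid"
    else
        # Use a key that is stored in the Git repository itself.
        git annex initremote usbbackup type=directory directory="$dir" \
            encryption=shared
    fi
}
```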
To make what sounds like a very technical piece of software more accessible to the general public, Hess wrote the git-annex assistant. This GUI uses a local web server to provide users with a modern web interface in the browser. Additionally, the tool comes with all modern conveniences: The assistant uses the Linux kernel's inotify interface to detect changed files, for example, and to trigger a sync (Figure 1).
The GUI supports the user right from the act of creating the first repository, and it automatically switches the repository to direct mode. A web form takes the worries out of setting up remote repositories (Figure 2).
The comprehensive documentation on the project website explains many configuration options and provides recommendations for individual use cases. For example, you can assign specific roles to the individual repositories [5]. A client contains the files that a user accesses during work; transfer is useful for servers or removable media that are used exclusively for exchanging data; and backup, as you might guess, keeps everything backed up.
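On the command line, a role of this kind could be assigned roughly as follows. The function name set_repo_role is an assumption for illustration; the sketch puts a repository into the named group and tells git-annex to apply that group's standard content rules.

```shell
# Sketch only: "set_repo_role" is a hypothetical helper.
set_repo_role() {
    repo=$1   # repository name, or "." for the current repository
    role=$2   # client, transfer, or backup
    # Put the repository into the group ...
    git annex group "$repo" "$role"
    # ... and apply the group's standard rules for wanted content.
    git annex wanted "$repo" standard
}
```

For example, set_repo_role . client, run inside a repository, would mark it as a client.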
Despite this backup function, Hess insists that git-annex is not classic backup software – it just has backup features as a side effect. With the correct options, you can even tell git-annex to move only MP3 files of less than 100MB to your player. In a screencast [6], the developer demonstrates some of the settings in the Assistant.
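A rule like the MP3 example can be expressed as a preferred content expression. In this sketch, the function name limit_player_content and the remote name passed to it are assumptions; the expression itself combines a file name pattern with a size limit.

```shell
# Sketch only: "limit_player_content" is a hypothetical helper.
limit_player_content() {
    player=$1   # name of the remote that represents the player
    # Want only MP3 files that are smaller than 100MB.
    git annex wanted "$player" "include=*.mp3 and smallerthan=100mb"
}
```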