Synchronizing data with the Git-annex Assistant

On the Server

A server accessible via the Internet extends the existing setup. To do this, you again need to install Git-annex on a particular machine and use the wizard in the browser to create a new repository.

In the dashboard, the Add another repository button provides an overview of possible locations. Select the Remote server entry from the list with the description Set up a repository on a remote server using ssh. SSH with public key authentication eliminates the problem of password entry during the configuration.

After you fill out the dialogs with the hostname, username, and directory, the software creates the necessary files and starts syncing. A final menu (Figure 2) also points out that the SSH server uses a repository belonging to the transfer group.

Figure 2: After the setup, you define the repository group.

Repositories in this group keep the data only long enough to allow it to be distributed to the clients. This approach prevents ballast data from accumulating on the server, so a transfer repository is feasible even on a server with limited storage space.

In addition to the transfer group, Git-annex defines nine other standard groups [4]. Depending on the application, you can assign a repository to one of these groups. The groups integrate predefined rules for distributing data. The program decides on the basis of these rules which data to keep and which to distribute to other repositories (see Table 1).

Table 1: Groups

Group               Content
client              Keeps all data except the data in the archive directory.
transfer            Transfers data to other repositories; only keeps the files until all clients have a copy.
backup              Keeps all data in the repository.
incremental backup  Keeps all data that does not exist in another backup or another repository of the same type.
small archive       Prefers data in an archive directory and data not archived elsewhere.
full archive        Contains all data not archived elsewhere.
source              A repository that produces data, but does not contain any. Removes the data as soon as it is synchronized elsewhere.
manual              Allows manual definition of rules.
public              Suitable for publishing data. A configurable directory is synced with a public repository.
unwanted            For deleting and purging a repository. This assignment ensures that the software transfers all data out.
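The group assignment that the wizard makes can also be done on the command line. The following is a minimal sketch, not the Assistant's own procedure: the helper name assign_group and the remote name server are hypothetical, and it assumes you run it inside an existing Git-annex repository. It relies on the git annex group and git annex wanted commands; setting the wanted expression to standard tells Git-annex to apply the predefined rules of the assigned group.

```shell
# Hypothetical helper: put a repository into one of the standard groups
# and let that group's predefined rules decide which content it wants.
# Run this inside an existing git-annex repository.
assign_group() {
    repo="$1"    # repository or remote name, or "." for the local repository
    group="$2"   # for example: transfer, client, backup

    # Record the group membership.
    git annex group "$repo" "$group" || return 1

    # Use the group's standard rules as the preferred content setting.
    git annex wanted "$repo" standard
}

# Example (assumes a remote named "server"):
# assign_group server transfer
```

On the command line, the group names from Table 1 that consist of two words are written without the space, for example incrementalbackup.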

Remote Sharing

Assume members of the field staff want to access the existing repository on the server and synchronize data with the workgroup. They configure a local repository and the same remote server as their colleagues. In principle, the team can now exchange data via the central server, but for the time being, this is a manual process (Listing 4). For fully automated synchronization, further steps are necessary.
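As a rough sketch of what such a manual exchange might look like (an assumption, not necessarily the commands the article's listing uses), the following helper runs three Git-annex commands in sequence. The function name manual_sync and the remote name server are hypothetical; the remote must already be configured in the local repository.

```shell
# Hypothetical helper for a manual exchange through the central server.
# "$1" is the name of the git remote that points at the server.
manual_sync() {
    remote="$1"

    # Exchange git metadata: which files exist and where their content lives.
    git annex sync "$remote" || return 1

    # Upload the content of local files to the server.
    git annex copy --to="$remote" . || return 1

    # Download content that colleagues have uploaded to the server.
    git annex get --from="$remote" .
}

# Example (assumes the remote is called "server"):
# manual_sync server
```

Each team member would have to repeat these steps whenever files change, which is why the fully automated variant needs the additional component described next.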

Listing 4

Using a Central Server

 

Automatic

The previously mentioned intermediate step for automatic synchronization relies on a tool that informs other clients about changes (Figure 3).

Figure 3: Git-annex still needs a messaging service to be able to sync data automatically.

In the current version of Git-annex, an XMPP (Jabber) server assumes this role; the clients use it to set up remote pairing with their friends. Alternatively, messaging services such as Google Talk are suitable.

Joey Hess, the developer of Git-annex, is working on a daemon that removes the intermediate step [5]. It monitors remote databases and automatically starts the sync when needed. The daemon has already made it into the main branch of development; a stable version is thus likely to appear in the not-too-distant future.

If you want to manage all of the components yourself, you can install your own messaging server, such as Ejabberd. However, the packages from Debian "Wheezy" and Ubuntu "Precise" do not work with Git-annex Assistant: A bug in the authentication mechanism prevents the configuration via the interface [6].

Installer scripts for Ejabberd are available on the project website, however, and they facilitate the manual setup. On Ubuntu, you need the libyaml-0-2 package; then, just a few commands set up your own Jabber server that cooperates with Git-annex.

The Jabber server is the final component in remote sharing. The clients in the workgroup, including the one used by a field employee, only need to accept one another as chat friends. In the web interface, they use the Share with a friend menu item to enter the Jabber account name.

In the following screen, the names of active friends and a button labeled Start pairing appear. This button sends a pair request, which the communication partner needs to confirm. After this final step in the configuration, the clients synchronize their data via the transfer repository.
