Manage and share files with Git

Many Branches

© Lead Image © germina, 123RF.com

Author(s): Roman Jordan

Software projects often comprise several code branches, some of which exist in parallel. Git supports community code development through remote repositories and code branching.

Special Thanks: This article was made possible by support from Linux Professional Institute

Real projects usually are not linear: When many developers work on code, parallel branches are the rule. Git allows you to store your code branches in a repository (repo), and even changing the directory structure does not cause any problems.

The example from the first part of this series [1] comprises three text files located in a local repository, which is usually sufficient just to manage files. However, if you work in a team, being able to link your project to a remote repository has advantages.

The corresponding Git commands are clone (create and check out a copy of a project), push (transfer data to the remote repository), fetch (get data from a remote repository), and pull (fetch and merge data). In this context, the term "data" means the linked or specified references and the objects they refer to in the Git object database.
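The four commands can be tried out without a server. The following is a minimal sketch, assuming a bare repository on the local filesystem as the remote; the directory names (seed, remote.git, alice, bob), the file contents, and the commit messages are made up for illustration.

```shell
# Throwaway playground; all paths and identities are hypothetical
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Seed project with one commit, then turn it into a bare "remote"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master   # use the article's branch name
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "added hello.txt"
git clone -q --bare "$work/seed" "$work/remote.git"

# clone: two developers each get a working copy linked to origin
git clone -q "$work/remote.git" "$work/alice"
git clone -q "$work/remote.git" "$work/bob"

# push: alice transfers a new commit to the remote repository
cd "$work/alice"
echo "more text" >> hello.txt
git commit -q -am "hello.txt extended"
git push -q origin master

# fetch: bob downloads the commit; his working directory stays untouched
cd "$work/bob"
git fetch -q origin
git log --oneline master..origin/master   # the commit not yet merged locally

# pull: fetch plus merge into the checked-out branch
git pull -q --ff-only origin master
```

After the fetch, only origin/master has moved in Bob's repository; his own master follows with the pull.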

Into the Past

Figure 1 shows the status of the project at the end of the first part of the workshop with the command

git log --oneline --decorate --graph
Figure 1: Determining the status of a project. The current branch is illustrated at the bottom.

The project contains four commits arranged on a timeline. With the exception of the first commit, each is based on its predecessor. This line is known as a branch.

The HEAD is a pointer to the version on which the current working directory is based (i.e., the end of the checked out branch). The entries named origin refer to the remote repository from which you cloned the project – in this case, the origin remote repository. The information reflects the status of the last synchronization. The names master (for branches) and origin (for the repository that is the source for the local repository) are defaults that Git assigns if you do not make any explicit specifications.

Branches

Branches allow concurrent development. Typically, the main branch contains the completed or already delivered versions; further development takes place in other branches. Ideally, each task has its own branch with a meaningful name. Once you have completed the changes in each branch, you transfer them to the main branch and test them, and the new version is ready.

Git acts as a decentralized version control system. You can create branches without a connection to the remote repository and transfer them later if necessary. Furthermore, Git treats all branches equally: The familiar commands work the same way, and just as quickly, on every branch.

The following two commands create a new branch, whose starting point is the currently checked out state, and then switch to the respective branch:

git branch mybranch
git checkout mybranch

The working directory contains the status of the last commit in the specified branch. If the branch does not yet exist, the command

git checkout -b mybranch

combines both actions.
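Both variants can be compared in a scratch repository. This is only a sketch: The file notes.txt, the commit message, and the second branch name feature are assumptions made for the example.

```shell
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the article's branch name
git config user.name "Demo"
git config user.email "demo@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "MA"

# Variant 1: create the branch, then switch to it
git branch mybranch
git checkout -q mybranch

# Variant 2: create and switch in one step ("feature" is a made-up name)
git checkout -q -b feature

git branch                                 # the asterisk marks the checked-out branch
```

Because no commits were added in between, all three branches point to the same commit at the end.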

Figure 2 shows a project with the master and mybranch branches. The master branch contains the finished versions MA and MB; development takes place in mybranch, which already has the intermediate versions ZA, ZB, and ZC.

Figure 2: Branches divide a project into units, which you can complete and then add back to the main branch.

You can switch between branches with the commands:

git checkout master
git checkout mybranch

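However, Git refuses to switch if uncommitted changes in the working directory would be overwritten, so it is best to switch only when the working directory contains no changes. A working directory in this state is referred to as "clean." To commit your changes, use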

git add -u
git commit -m ...

or discard the changes irrevocably with:

git reset --hard

You should regularly check whether all files are in the Git index by running git status or cloning to a test directory.

Git offers another approach for saving changed data: The git stash command stores the changes in a special area of the local repository. You can import the changes into the working directory at any time, regardless of the version checked out. For more details, see the corresponding man page (i.e., git stash --help) and the online book Pro Git [2].
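A typical stash round trip might look like the following sketch. The repository, the file notes.txt, and its contents are assumptions; only the stash and checkout commands matter here.

```shell
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the article's branch name
git config user.name "Demo"
git config user.email "demo@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "MA"
git branch mybranch

echo "unfinished work" >> notes.txt        # uncommitted change
git stash                                  # park it; the working directory is clean again
git checkout -q mybranch                   # switch branches with a clean directory
git stash pop                              # re-apply the parked change on this branch
git status --short                         # notes.txt is listed as modified
```

Note that git stash pop removes the entry from the stash after applying it; git stash apply would keep it.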

Git creates the branches in the local repository. To include them in the remote repository, you need to create an appropriate link. The push command in the first line of Listing 1 creates the link and starts the data transfer. Afterward, both branches, master and mybranch, are tracked. Transfer a branch to the remote repository if several people are working on it or if you want a backup copy.

Listing 1

Push to Remote Repo

$ git push --set-upstream origin mybranch
[...]
$ git remote show origin
  HEAD branch: master
  Remote branches:
    master     tracked
    mybranch   tracked
  Local branches configured for 'git pull':
    master     merges with remote master
    mybranch   merges with remote mybranch
  Local refs configured for 'git push':
    master     pushes to master   (up to date)
    mybranch   pushes to mybranch (up to date)

The git branch -a command displays the local and remote branches. Anyone who has the appropriate access rights can check out these branches. Cloning puts all branches contained in the remote repository on your disk. If you do not specify a branch, the working directory contains the latest status of master. Branches checked out of the remote repo always have the status tracked.
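The link between local and remote branches can be inspected as in the following sketch. It assumes a bare repository on the local filesystem as the remote; the directory names (seed, remote.git, dev, colleague) and the file notes.txt are hypothetical.

```shell
work=$(mktemp -d)                          # throwaway playground
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master    # use the article's branch name
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "MA"
git clone -q --bare "$work/seed" "$work/remote.git"

# Developer 1 creates a branch and publishes it
git clone -q "$work/remote.git" "$work/dev"
cd "$work/dev"
git checkout -q -b mybranch
echo "v2" >> notes.txt
git commit -q -am "ZA"
git push -q --set-upstream origin mybranch
git branch -a                              # local branches plus remotes/origin/...
git branch -vv                             # shows the upstream of each local branch

# Developer 2 clones and checks out the published branch
git clone -q "$work/remote.git" "$work/colleague"
cd "$work/colleague"
git checkout -q mybranch                   # creates a local branch tracking origin/mybranch
```

The second checkout works without further options because exactly one remote offers a branch named mybranch.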

Make your changes on the branch as often as you like. If you reach a good version, merge this branch with the master.

Merge

Merging combines the changes from two branches. In Figure 2, mybranch is derived from the last commit of the master branch. Because no changes to master have taken place, the blue path describes the merge process performed with the commands in Listing 2.

Listing 2

Merge Example

$ git checkout master
Switched to branch 'master'
$ git merge mybranch
Updating 6466a1f..eff29ab
Fast-forward
[...]
$ git log --oneline --decorate --graph --all
* 9dd9027 (HEAD -> mybranch, origin/mybranch, origin/master, origin/HEAD) project file processed
[...]

In this case, Git simply moves the master branch pointer, and HEAD with it, forward to the last commit in mybranch. This process is known as a fast-forward. After merging, both branches have the same status. If master has changed in the meantime, Git determines the changes on each branch relative to the common starting point of both branches (in this case, MB) in a three-way comparison and tries to combine them.

If the changes are in different places or affect different files, everything works as described. Git commits the version resulting from the merge as part of the process. Unless otherwise specified, Git starts the editor so you can enter a commit message.
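A merge of two branches that have both moved on can be reproduced with the following sketch. The file names and the commit labels (MB, MC, ZA, borrowed loosely from the figures) are assumptions; the --no-edit option accepts the default merge message instead of starting the editor.

```shell
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the article's branch name
git config user.name "Demo"
git config user.email "demo@example.com"
echo "a" > a.txt
git add a.txt
git commit -q -m "MB"                      # common starting point

git checkout -q -b mybranch
echo "b" > b.txt
git add b.txt
git commit -q -m "ZA"                      # change on mybranch

git checkout -q master
echo "c" > c.txt
git add c.txt
git commit -q -m "MC"                      # master changed in the meantime

git merge -q --no-edit mybranch            # three-way merge creates a merge commit
git log --oneline --graph                  # the graph shows both lines joining
```

Because the two branches touched different files, Git combines them without asking for help.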

A change in the same place is a conflict (Listing 3). Git prints the names of the corresponding files and identifies the conflicts within the files (Figure 3). Once you have resolved the problems, the commands

git add -u
git commit -m ...

move the resulting state into the Git database.

Listing 3

Merge Conflict

$ git checkout master
Switched to branch 'master'
$ git merge mybranch
Auto-merging init.c
CONFLICT (content): Merge conflict in init.c
Automatic merge failed; fix conflicts and then commit the result.
$ git status
[...]
#     both modified:   init.c
[...]
Figure 3: Merge problems: The software does not have a solution, so the only option is to examine the differences manually and then adopt the desired version.
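The conflict from Listing 3 can be provoked deliberately in a scratch repository. In this sketch, the contents of init.c are invented, and adopting the mybranch version wholesale with git checkout --theirs merely stands in for resolving the conflict in an editor.

```shell
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the article's branch name
git config user.name "Demo"
git config user.email "demo@example.com"
printf 'int init(void) { return 0; }\n' > init.c
git add init.c
git commit -q -m "MB"

git checkout -q -b mybranch
printf 'int init(void) { return 1; }\n' > init.c
git commit -q -am "ZA"

git checkout -q master
printf 'int init(void) { return 2; }\n' > init.c
git commit -q -am "MC"                     # same line changed on both branches

git merge mybranch || echo "merge stopped because of a conflict"
git diff --name-only --diff-filter=U       # files with unresolved conflicts

git checkout --theirs init.c               # adopt the version from mybranch
git add -u
git commit -q -m "Merge mybranch, keeping its init.c"
```

During a merge, --theirs refers to the branch being merged in and --ours to the branch that is checked out.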

Rebase

Rebasing is another Git approach for applying changes from one branch to another. In contrast to a merge, a rebase moves the starting point of a branch (at least in the scope of this article). The general form of the command, which moves the checked-out branch onto the tip of the named branch, is

git rebase <Branch>

Figure 4 shows the basic procedure. The gray mybranch block shows the status before the rebase; the blue mybranch' block reflects the status after the rebase. The

git checkout mybranch
git rebase master

commands move the starting point of mybranch from MB to MD. Both commands can be combined into:

git rebase master mybranch
Figure 4: Rebasing moves the starting point of a branch.

Thanks to the rebase, versions ZA and ZB receive the changes from MB and MC. The new versions ZA' and ZB' are created. An occasional rebase prevents the branches from drifting too far apart.

The resulting structure corresponds to that shown in Figure 2. Because the head of mybranch is based on the last commit of master, you can perform any required function testing on mybranch and then use a fast-forward merge to bring the changes into master. This approach avoids an untested master version, unless master has changed again in the meantime.
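The sequence of rebase, test, and fast-forward merge might look like the following sketch. File names and commit labels are assumptions, and the --ff-only option is added here as a safeguard so that Git refuses anything but a fast-forward.

```shell
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the article's branch name
git config user.name "Demo"
git config user.email "demo@example.com"
echo "a" > a.txt
git add a.txt
git commit -q -m "MB"

git checkout -q -b mybranch
echo "z" > z.txt
git add z.txt
git commit -q -m "ZA"

git checkout -q master
echo "c" > c.txt
git add c.txt
git commit -q -m "MC"                      # master moved on

git rebase -q master mybranch              # checks out mybranch and replays ZA on top of MC
# ... run your tests on mybranch here ...
git checkout -q master
git merge -q --ff-only mybranch            # master catches up without a merge commit
git log --oneline --graph                  # a single straight line of three commits
```

If master had received another commit after the rebase, the last merge would fail instead of silently creating an untested combination.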

Note that manual comparisons of branches before merging or rebasing help identify potential problems. You can use either of the commands

git diff <Branch1> <Branch2>
git difftool <Hash1> <Hash2>

to perform a diff.

If conflicts occur, Git displays the corresponding files and interrupts the process. After you have resolved the conflicts,

git add -u

adds the adaptations to the index, and a subsequent

git rebase --continue

resumes the process.

The manual changes become part of the branch you are moving. The master branch remains unchanged in this case. The

git rebase --abort

command aborts the process and restores the previous state.
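Both ways out of an interrupted rebase can be rehearsed in a scratch repository. The following sketch assumes invented contents for init.c; writing "combined version" into the file stands in for a manual resolution, and GIT_EDITOR=true merely keeps Git from opening an editor for the commit message.

```shell
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the article's branch name
git config user.name "Demo"
git config user.email "demo@example.com"
echo "base" > init.c
git add init.c
git commit -q -m "MB"
git checkout -q -b mybranch
echo "branch version" > init.c
git commit -q -am "ZA"
git checkout -q master
echo "master version" > init.c
git commit -q -am "MC"

before=$(git rev-parse mybranch)           # remember the old tip of mybranch

# Attempt 1: give up and return to the previous state
git rebase master mybranch || echo "rebase interrupted by a conflict"
git rebase --abort

# Attempt 2: resolve the conflict and carry on
git rebase master mybranch || echo "rebase interrupted again"
echo "combined version" > init.c           # stand-in for a manual resolution
git add -u
GIT_EDITOR=true git rebase --continue
```

After the second attempt, mybranch consists of a new commit on top of master, while master itself is unchanged.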

Rebasing changes the branch starting point, but in the history, it looks as if development in a branch took place linearly. Do not apply this technique to commits that you have already uploaded to a public repository.

From a functional point of view, a rebase discards existing commits and creates new ones in their place. Anyone who downloaded the branch before your rebase and based their work on it will inevitably face an additional, and unnecessary, merge. When they in turn upload their changes to the public repository, another merge results, because the rebased branch appears to have changed, even though the rebase already contains these changes.

Such actions make the path of the project confusing and complicated. The section "The Perils of Rebasing" in the Pro Git book [3] describes these problems in detail with an example.

Full Speed Astern

What if you want to correct a spelling error in a version that was finished months ago? Suppose you want to change a version (e.g., hash 4fb2717) of the project in Figure 1. The changes and extensions added in later versions should not become part of the resulting version. No problem: You can use the hash, or any tag assigned to it, to identify the version uniquely and check it out (Listing 4).

Listing 4

Checkout by Hash

$ git checkout 4fb2717
Note: checking out '4fb2717'.

The working directory is now in what is known as a detached HEAD state. You can look around, make experimental changes, and commit them. You can also discard all commits you make in this state without affecting any branch by performing another checkout.

If you want to create a new branch for your commits, do so (now or later) by checking out again with the -b option. In the example from Listing 5, the last line says that the working directory has the same status as version 4fb2717.

Listing 5

New Branch

$ git checkout -b mybranch
HEAD is now at 4fb2717... added hello.txt

What does this text mean, and what does the detached HEAD mean? Ultimately, it shows that the checked-out version is already archived and therefore immutable. Preventing changes to checked-in versions is one of the main tasks of a version control system. Git recommends creating a new branch and working on it. Working in a detached HEAD state is not recommended (see box "Completely Detached").

Completely Detached

In a detached HEAD state, the HEAD pointer does not point to a branch. Instead, it points directly to a previously saved, and thus immutable, version. Git itself does not mind; it allows all actions even in this state.

Figure 5 shows a project with a detached branch (based on 4fb2717) on which a commit has occurred (cec704b). As long as you remain in this state, the commit remains visible. If you switch to another branch, you can only reach the commit by specifying the hash. It is also difficult to transfer such a commit to the remote repository.

Figure 5: No bindings – a detached branch points to a previously saved version.

If you've checked in changes to the detached HEAD despite the warnings, you can use

git branch <name> <hash>

to convert it to a normal branch. This only works, however, as long as you do not change the branch. You can get the hash with git log.

If you no longer need a branch, you can delete it with

git branch -d <branch>

… if you merged it with another branch, that is. If you want to delete without merging, use -D – note the uppercase D.
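The difference between the two options shows up quickly in a scratch repository. In this sketch, the branch names merged-work and unmerged-work and the file names are invented.

```shell
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the article's branch name
git config user.name "Demo"
git config user.email "demo@example.com"
echo "a" > a.txt
git add a.txt
git commit -q -m "MA"

git branch merged-work                     # contains nothing that master lacks
git checkout -q -b unmerged-work
echo "b" > b.txt
git add b.txt
git commit -q -m "ZA"                      # exists only on unmerged-work
git checkout -q master

git branch -d merged-work                  # fine: nothing would be lost
git branch -d unmerged-work || echo "refused: branch is not fully merged"
git branch -D unmerged-work                # forced: ZA is no longer on any branch
```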

Git offers very simple branch handling, and it is quite common to create several branches a day. Some available Git server extensions, such as Gitolite, also allow you to control access to individual branches.

Modifying Directories

Before every change to a directory structure, back up the current status. The subcommands rm (remove) and mv (move) make such changes. Both work on files or directories (rm requires the -r option for directories).

For more extensive changes, it might be useful to make the changes first and then check them in: Test the new structure of the directories, update the Git index, get an overview with git status, and then check in the new status. Figure 6 shows the output of git status after some project files have been moved to subdirectories.

Figure 6: Once the changes to the directories are complete, it is best to have a quick look at the status of the working directory.

The changes are applied with the commands from Listing 6. The first command updates the Git index. In the example shown in Figure 6, three files in the working directory were deleted, so Git removes them from the index. The command in the second line adds the newly created directories, together with the files they contain, to the index. The last command finally commits the new structure.

Listing 6

Applying Changes

$ git add -u
$ git add part1 part2 archive
$ git commit -m "Structure changed"
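A small restructuring can be rehearsed with the following sketch, which moves one file with the plain mv command and one with git mv. The file names a.txt and b.txt are assumptions; the directory names are borrowed from Listing 6.

```shell
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the article's branch name
git config user.name "Demo"
git config user.email "demo@example.com"
echo "alpha" > a.txt
echo "beta" > b.txt
git add a.txt b.txt
git commit -q -m "flat layout"

mkdir part1 archive
mv a.txt part1/                            # plain filesystem move; the index is unaware of it
git mv b.txt archive/                      # Git's own move; the index is updated immediately

git add -u                                 # records that a.txt vanished from the top level
git add part1                              # picks up the new directory and its file
git status --short                         # the moves typically show up as renames (R)
git commit -q -m "Structure changed"
```

Git does not store renames explicitly; it recognizes them from the matching file contents, which is why both approaches lead to the same result.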

If you get lost during the restructuring, you can reset the working directory to the last commit and then delete the files and directories that are not in the Git index:

git reset --hard
git clean -df
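Because both commands destroy data, a dry run beforehand is a sensible precaution. The following sketch assumes an invented file keep.txt and an untracked directory scratch; git clean -n only lists what would be deleted.

```shell
demo=$(mktemp -d)                          # throwaway directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the article's branch name
git config user.name "Demo"
git config user.email "demo@example.com"
echo "v1" > keep.txt
git add keep.txt
git commit -q -m "MA"

echo "scribble" >> keep.txt                # modification to a tracked file
mkdir scratch
echo "tmp" > scratch/tmp.txt               # untracked directory

git clean -n -d                            # dry run: lists what would be removed
git reset -q --hard                        # tracked files return to the last commit
git clean -q -d -f                         # untracked files and directories are deleted
git status --short                         # prints nothing: the directory is clean
```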

Comments

This article covers only remote repositories accessed through the filesystem, which saves you from setting up a Git server but is quite unusual in practice. For larger projects, a server is practically a must for security reasons and for easier access control. More information on working with Git and on its inner workings can be found in the online book Pro Git [2].

Conclusions

When it comes to managing files, Git is the ideal companion, even for the smallest projects. The few commands you need, simple branch handling, support for merging, and, above all, high performance are convincing throughout. Although Git focuses on managing text files, it is also suitable for binary files. The Git version control system has a lot more to offer, especially when dealing with branches and remote repositories.

Infos

  1. "Version Control with Git" by Roman Jordan, Linux Pro Magazine, issue 216, November 2018, pg. 34, http://www.linuxpromagazine.com/Issues/2018/216/Version-Control-with-Git
  2. Pro Git (2nd edition): https://git-scm.com/book/en/v2
  3. Rebasing: https://git-scm.com/book/en/v2/Git-Branching-Rebasing

The Author

Roman Jordan has been working with Linux for more than 20 years. His main focus lies in the kernel and in programming small embedded platforms.