diff and merge
Repurposing Old Tools

© Lead Image © Ion Chiosea, 123RF.com
Diff and merge: They're not just for developers.
Recently, a friend of mine returned to a manuscript after several months. The manuscript had half a dozen versions, and she could no longer remember how each one differed. Listening to her problem, I had a blinding flash of the obvious: diff [1], and related commands like diff3 [2] and merge [3], can be as much help to her as they have been to coders over the decades.
diff is a utility that compares two files line by line. For coders, diff is one of the commands that define Unix-like operating systems such as Linux. Although file comparison utilities are as old as Unix, diff itself was first released in 1974 for text files, with support for binary files added later. diff can summarize a comparison in two different display formats, and it can also combine both files into a single merged output. diff3 [2], a similar utility, works in the same way on three files, although it does not support binary formats. Other tools, such as patch, which applies diff output to files, have grown up around it, but diff is still installed by default in many distributions, and its output, a diff, remains the standard name for any patch, just as the grep command has given its name to any text search.
Basic Comparisons
Typing info diff (the man page is incomplete) quickly shows how diff can be as useful to a writer as to a programmer. The command follows the usual pattern of command name, options, and two files. The first file is the original, or any file if, as in my friend's case, the original is unknown or irrelevant:
diff OPTIONS ORIGINAL-FILE OTHER-FILE
Just by adding the --brief (-q) option, a writer can tell whether the files are different, something that file attributes alone cannot always show. Similarly, --report-identical-files (-s) either reports that the files are the same or displays the differences (Figure 1). In some situations, like my friend's, this information alone may be enough to set some files aside.
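To make this concrete, here is a minimal sketch assuming GNU diff, the version shipped by most Linux distributions. The draft files are hypothetical and created on the spot. Note that diff exits with status 1 whenever the files differ, which matters if you use it in a script.

```shell
# Create three hypothetical drafts; draft2.txt is an exact copy of draft1.txt.
printf 'The cat sat.\nOn the mat.\n' > draft1.txt
cp draft1.txt draft2.txt
printf 'The cat sat.\nOn the rug.\n' > draft3.txt

# --brief (-q) only says whether the files differ.
# diff exits with status 1 when files differ, so "||" prints that status
# instead of stopping a script that runs under "set -e".
diff --brief draft1.txt draft3.txt || echo "exit status: $?"

# --report-identical-files (-s) also speaks up when the files match.
diff -s draft1.txt draft2.txt
```

With GNU diff, the first command should print Files draft1.txt and draft3.txt differ followed by exit status: 1, and the second should print Files draft1.txt and draft2.txt are identical.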
Even more efficiently, directories can be specified instead of files, with --recursive (-r) added to include subdirectories, which makes it easy to locate identical files. In a similar way, the --from-file=DIRECTORY1 and --to-file=DIRECTORY2 options compare each file named on the command line with the file of the same name in the given directory. With --exclude=PATTERN (-x PATTERN), files that match the pattern are excluded, while --exclude-from=FILE (-X FILE) excludes files that match any of the patterns listed, one per line, in the designated file. Other options for comparing directories are --starting-file=FILE (-S FILE), which starts the comparison at FILE and is handy for resuming an interrupted run, and the self-explanatory --ignore-file-name-case and --no-ignore-file-name-case. All these options make for a more targeted search and, although they take a while to set up, are still much faster than opening every file to compare them.
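A sketch of a directory comparison, again assuming GNU diff and two hypothetical folders, old and new. It combines -r, -q, -s, and -x so that the changed chapter is flagged, identical files are named, and backup files are skipped:

```shell
# Two hypothetical manuscript folders, each with a ch1 subdirectory.
mkdir -p old/ch1 new/ch1
printf 'Chapter one.\n' > old/ch1/text.txt
printf 'Chapter one, revised.\n' > new/ch1/text.txt
printf 'Same notes.\n' > old/notes.txt
cp old/notes.txt new/notes.txt
printf 'backup 1\n' > old/notes.bak
printf 'backup 2\n' > new/notes.bak

# -r descends into ch1, -q reports only whether files differ,
# -s also names identical files, and -x skips anything matching *.bak.
# The exit status is 1 because at least one pair of files differs.
diff -r -q -s -x '*.bak' old new || echo "exit status: $?"
```

For many patterns, the same exclusions could be kept one per line in a file and passed with --exclude-from=FILE instead.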
However, the comparison can be far more specific. Some options, such as --show-c-function (-p), are specific to programming, but others apply to regular text as easily as to code. You can, for example, use --ignore-all-space (-w) so that differences in white space are not considered. Similarly, when comparing plain text files, --ignore-blank-lines (-B) ignores changes that consist only of blank lines, such as those used to separate paragraphs. A particularly useful option is --ignore-matching-lines=REGULAR-EXPRESSION (-I REGULAR-EXPRESSION), which ignores changes in which every inserted or deleted line matches the expression and so helps to focus the results.
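The following sketch, assuming GNU diff and hypothetical files, shows how -w, -B, and -I can make cosmetic differences disappear. In each case, diff exits with status 0 once the ignored changes are discounted.

```shell
# Hypothetical drafts that differ only in spacing and a paragraph break.
printf 'It was a dark night.\n\nThe end.\n' > spacing1.txt
printf 'It  was a dark   night.\nThe end.\n' > spacing2.txt

# A plain diff reports a change (exit status 1).
diff spacing1.txt spacing2.txt || echo "exit status: $?"

# -w ignores white space within lines; -B ignores changes made only of blank lines.
diff -w -B spacing1.txt spacing2.txt && echo "no differences left"

# Hypothetical drafts whose only change is a "Draft:" header line.
printf 'Draft: 3\nBody text.\n' > header1.txt
printf 'Draft: 4\nBody text.\n' > header2.txt

# -I skips changes in which every changed line matches the regular expression.
diff -I '^Draft:' header1.txt header2.txt && echo "only the header changed"
```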
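More specifically, experts can control exactly what is displayed with --GTYPE-group-format=GFMT, where GTYPE is old, new, changed, or unchanged. The format can include lines from the original file (%<), lines from the second file (%>), and lines common to both (%=). Group formats can also include a group's first line number (F), last line number (L), and number of lines (N), written as printf-style directives such as %dF, with uppercase letters referring to the second file and lowercase letters to the original. Similarly, --LTYPE-line-format=LFMT, where LTYPE is old, new, or unchanged, formats each individual line; for example, %L stands for the contents of the line. Both options accept a number of other directives, so consult the man or info page for more details.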
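As a rough illustration, assuming GNU diff and two hypothetical files, the line formats below produce a simple marked-up view, and the group format reports which lines of the original changed:

```shell
# Two hypothetical three-line drafts; only line 2 differs.
printf 'one\ntwo\nthree\n' > fmt1.txt
printf 'one\n2\nthree\n' > fmt2.txt

# Line formats: prefix removed lines with "-", added lines with "+",
# and unchanged lines with a space. %L is the line's contents.
# "|| true" because diff exits with status 1 when the files differ.
diff --old-line-format='-%L' --new-line-format='+%L' \
     --unchanged-line-format=' %L' fmt1.txt fmt2.txt || true

# Group formats: print nothing for unchanged groups and, for changed
# groups, the first (%df) and last (%dl) line numbers in the original file.
diff --unchanged-group-format='' \
     --changed-group-format='changed: original lines %df-%dl
' fmt1.txt fmt2.txt || true
```

For these files, the first command should print the lines " one", "-two", "+2", and " three", and the second should print changed: original lines 2-2.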
Output Formats
By default, diff displays the lines where differences occur in a set format (Figure 2). If the files are identical, there is no output whatsoever. Otherwise, each difference begins with a summary line, such as 5,6c7. This summary gives the line number or range of lines in the original file on the left and the corresponding line number or range in the other file on the right. In between is one of three letters: c (change), a (append), or d (delete). Below the summary, the affected lines from the original file are listed first, each marked by a less-than sign (<). Then, after a separator line of three hyphens (---), the lines from the second file follow, each marked by a greater-than sign (>). This default format shows no surrounding context. To get context lines that make each difference easier to find, add --context[=NUMBER] (-c, or -C NUMBER); the default number of context lines is three.
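Here is a small sketch, assuming GNU diff and two hypothetical files, that shows the default format next to the context format:

```shell
# Hypothetical drafts: line 2 is changed and a fifth line is added.
printf 'a\nb\nc\nd\n' > normal1.txt
printf 'a\nB\nc\nd\ne\n' > normal2.txt

# Default (normal) format, with no context lines.
# Expected output:
# 2c2
# < b
# ---
# > B
# 4a5
# > e
diff normal1.txt normal2.txt || true   # exit status 1: the files differ

# Context format with one line of context; its header also shows
# the file names and modification times.
diff -C 1 normal1.txt normal2.txt || true
```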

An even easier-to-read display can be had by adding --side-by-side (-y) to the command. This option displays the original file's contents on the left and the second file's on the right, making detailed comparisons easy (Figure 3). By default, side-by-side output is at most 130 characters wide; you can change the total width with --width=NUMBER (-W NUMBER). Another option is --left-column, which prints lines common to both files only once, in the left column; to hide common lines altogether, use --suppress-common-lines.
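A quick sketch of the side-by-side options, assuming GNU diff; the width of 60 and the file names are arbitrary choices for illustration:

```shell
# Hypothetical drafts that differ on line 2 only.
printf 'alpha\nbeta\ngamma\n' > side1.txt
printf 'alpha\nBETA\ngamma\n' > side2.txt

# Side by side, 60 columns wide; "|" marks lines that differ.
# "|| true" because diff exits with status 1 when the files differ.
diff -y -W 60 side1.txt side2.txt || true

# Same, but print common lines only once, in the left column.
diff -y -W 60 --left-column side1.txt side2.txt || true

# Same, but hide common lines entirely.
diff -y -W 60 --suppress-common-lines side1.txt side2.txt || true
```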
Regardless of which of these two output formats you use, the display is noticeably more flexible than that offered by LibreOffice's Edit | Track Changes, which can require far more concentration to read. If you open a second copy of the original, you can merge the files manually as you read diff's results. Merging by hand is laborious, but it may be the most reliable way to get exactly the version you want.
A third alternative is to use --ifdef=NAME (-D NAME) to create merged output (Figure 4), in which the differing passages are wrapped in C preprocessor-style conditionals (#ifdef NAME, #ifndef NAME, #else, and #endif). This output can be copied and pasted into a new file, where a writer can merge it manually. However, if you are confident that the changes can be applied as they are, you can use --ed (-e) to produce an ed script that, when run by the ed editor, applies the changes to the original file. In programming, --ed is used to generate a patch, yet it can serve a writer's purpose just as well.
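The sketch below assumes GNU diff and, for the last step, that the ed editor is installed. It produces both kinds of output for two hypothetical files and applies the ed script to a copy rather than to the original:

```shell
# Hypothetical drafts: line 2 is reworded and a line is added at the end.
printf 'a\nb\nc\nd\n' > ed1.txt
printf 'a\nB\nc\nd\ne\n' > ed2.txt

# Merged output: text from ed1.txt sits under "#ifndef REVISED",
# text from ed2.txt under "#else /* REVISED */" or "#ifdef REVISED".
# "|| true" because diff exits with status 1 when the files differ.
diff -D REVISED ed1.txt ed2.txt > merged-view.txt || true

# An ed script that turns ed1.txt into ed2.txt. The script does not end
# with a write command, so append "w" before handing it to ed.
diff -e ed1.txt ed2.txt > changes.ed || true
echo w >> changes.ed

# Apply the script to a copy, so ed1.txt survives as a backup.
cp ed1.txt revised.txt
if command -v ed > /dev/null; then
  ed -s revised.txt < changes.ed
fi
```

The ed script lists its edits from the end of the file backward, so earlier line numbers stay valid as later lines change.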
You can further customize the display by adding --color (Figure 5). Used without a value, --color uses color only when standard output goes to a terminal. However, you can also complete the option with =never to never use color or =always to always use it. By default, red is used for deleted lines, green for added lines, cyan for line numbers, and a bold font weight for the header. Colors can be customized with --palette=PALETTE, as specified in the diff info file.
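A minimal sketch, assuming a GNU diff recent enough to support --color and --palette (diffutils 3.4 or later). The palette string, which makes added and deleted lines bold as well as colored, is only an example, and -u selects diff's unified format:

```shell
# Hypothetical drafts that differ on line 2.
printf 'same\nold line\n' > color1.txt
printf 'same\nnew line\n' > color2.txt

# Force color even though output goes to a file; "ad" sets added lines
# and "de" deleted lines to bold green and bold red.
# "|| true" because diff exits with status 1 when the files differ.
diff -u --color=always --palette='ad=1;32:de=1;31' color1.txt color2.txt > colored.diff || true

# --color=never guarantees plain text, for example when saving a diff to share.
diff -u --color=never color1.txt color2.txt > plain.diff || true
```

Viewing colored.diff in a pager that passes color codes through, such as less -R, shows the colors.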
diff3 and merge
diff's obvious limitation is that the original file must be compared against each of the other files one at a time. A quicker method is to use diff3 or merge to compare two files simultaneously with the original.
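Unlike diff, diff3 expects the original, or common ancestor, in the middle of its three arguments: diff3 MYFILE OLDFILE YOURFILE. Its default output also looks different. Each difference is introduced by a line of equals signs: ==== when all three files differ, or ====1, ====2, or ====3 when only that file differs from the other two. Below it, each file's line range is labeled 1:, 2:, or 3:, followed by that file's text indented by two spaces. In addition, diff3 can add --show-all (-A) to output all changes, bracketing conflicts with <<<<<<< and >>>>>>> lines. diff3's output can also be limited to overlapping changes with --overlap-only (-x) or to non-overlapping changes with --easy-only (-3), while --show-overlap (-E) incorporates all changes but brackets the overlapping ones. Other output options include --ed (-e), which diff3 shares with diff, and --merge (-m), which produces the merged file itself, much as diff's --ifdef=NAME (-D NAME) does for two files.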
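A sketch of a three-way comparison, assuming GNU diff3 and three hypothetical drafts in which original.txt is the common ancestor and therefore the middle argument:

```shell
# Hypothetical drafts: mine.txt and yours.txt each change a different
# line of original.txt.
printf 'Title\nLine A\nMiddle\nLine B\n' > original.txt
printf 'Title\nLine A, edited by me\nMiddle\nLine B\n' > mine.txt
printf 'Title\nLine A\nMiddle\nLine B, edited by you\n' > yours.txt

# Default report: ====1 marks a change only mine.txt made,
# ====3 a change only yours.txt made.
diff3 mine.txt original.txt yours.txt || true

# -m writes the merged text; there are no conflicts here, so both edits are kept.
diff3 -m mine.txt original.txt yours.txt > merged.txt

# A draft that rewrites the same line as mine.txt causes a conflict, which
# -m brackets with <<<<<<<, |||||||, =======, and >>>>>>> lines (exit status 1).
printf 'Title\nLine A, edited by them\nMiddle\nLine B\n' > theirs.txt
diff3 -m mine.txt original.txt theirs.txt > conflict.txt || echo "exit status: $?"
```

Keeping at least one unchanged line between the two edits matters: diff3 treats changes to adjacent lines as overlapping.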
merge is a near-duplicate of diff3. However, instead of providing output that can be copied and pasted into a new file, merge writes its result into the first file named on the command line. This behavior is not a problem if there are no conflicts. However, if conflicts do exist, merge warns of them and brackets them in that file, which will then need editing. This extra effort is not much trouble in a plain text file, but in a binary format like Open Document Format, it could corrupt the file. The same is true of -A, which, as in diff3, produces more verbose output. For this reason, use merge only after making a backup of the file it will overwrite.
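A cautious workflow might look like this sketch, which assumes the RCS version of merge (the one usually packaged for Linux) and hypothetical drafts in which original.txt is the common ancestor. It previews the result with merge's -p option, which prints to standard output instead of overwriting, and keeps a backup before the real merge:

```shell
# Hypothetical drafts; original.txt is the common ancestor.
printf 'Title\nLine A\nMiddle\nLine B\n' > original.txt
printf 'Title\nLine A, edited by me\nMiddle\nLine B\n' > mine.txt
printf 'Title\nLine A\nMiddle\nLine B, edited by you\n' > yours.txt

# merge comes from the RCS package and may not be installed.
if command -v merge > /dev/null; then
  cp mine.txt mine.txt.bak                                # backup first
  merge -p mine.txt original.txt yours.txt > preview.txt  # -p: print, don't overwrite
  merge mine.txt original.txt yours.txt                   # overwrites mine.txt
fi
```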