diff and merge
Repurposing Old Tools
Diff and merge: They're not just for developers.
Recently, a friend of mine returned to a manuscript after several months. The manuscript had half a dozen versions, and she could no longer remember how each one differed. Listening to her problem, I had a blinding flash of the obvious: diff [1], and related commands like diff3 [2] and merge [3], can be as much help to her as they have been to coders over the decades.
diff is a utility that compares two files line by line. For coders, diff is one of the defining commands of Unix-like operating systems such as Linux. Although file comparison utilities are as old as Unix, diff itself was first released in 1974 for text files, with support for binary files added later. diff presents users with a summary of the comparison in two different formats, which can also be merged into a single file. diff3 [2], a similar utility, operates in a like manner on three files, although it does not support binary formats. More sophisticated tools like patch have been developed, but diff is still installed by default in many distributions, and the name of its output, a diff, remains a standard term for any patch, just as the grep command has given its name to any file search.
Basic Comparisons
Typing info diff (the man page is incomplete) quickly shows how diff can be as useful to a writer as to a programmer. The command follows the standard format of a command followed by options and two files. The first file is the original, or any file if, as in my friend's case, the original is unknown or irrelevant:
diff OPTIONS ORIGINAL-FILE OTHER-FILE
Just by adding the --brief (-q) option, a writer can tell if the files are different, something that file attributes alone cannot always show. Similarly, --report-identical-files (-s) either reports when the files are the same or displays the differences (Figure 1). In some situations, like my friend's, this information alone may be enough to let some files be ignored.
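As a minimal sketch of these two options, assuming GNU diff and a scratch directory of hypothetical draft files (the file names are mine, not part of any real manuscript):

```shell
# Create two hypothetical drafts, plus an exact copy of the first.
tmp=$(mktemp -d)
printf 'Chapter One\nIt was a dark night.\n' > "$tmp/draft1.txt"
printf 'Chapter One\nIt was a stormy night.\n' > "$tmp/draft2.txt"
cp "$tmp/draft1.txt" "$tmp/draft1-copy.txt"

# --brief only says whether the files differ. diff exits with status 1
# when they do, so "|| true" keeps a strict shell from stopping here.
brief=$(diff --brief "$tmp/draft1.txt" "$tmp/draft2.txt" || true)
echo "$brief"

# --report-identical-files says so explicitly when nothing differs.
same=$(diff --report-identical-files "$tmp/draft1.txt" "$tmp/draft1-copy.txt")
echo "$same"
```

The first command should report that the two drafts differ, and the second that the draft and its copy are identical.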
Even more efficiently, directories can be specified instead of files, with --recursive (-r) added to include subdirectories in order to locate identical files. In the same way, the --from-file=DIRECTORY1 and --to-file=DIRECTORY2 options can be used to compare files of the same name in different directories. With --exclude=PATTERN (-x PATTERN), files that match the pattern are excluded, while --exclude-from=FILE (-X FILE) excludes files that match the patterns that are listed, one per line, in the designated file. Still other options when comparing directories are the self-explanatory --starting-file=FILE (-S FILE), --ignore-file-name-case, and --no-ignore-file-name-case. All these options make for a more targeted search, and, although they take a while to set up, are still much faster than opening all the files for comparisons.
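A directory comparison along these lines might look like the following sketch. It assumes GNU diff. The directory names v1 and v2, and the files inside them, are hypothetical.

```shell
# Build two hypothetical versions of a manuscript directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/v1/notes" "$tmp/v2/notes"
printf 'Opening scene\n' > "$tmp/v1/ch1.txt"
printf 'Opening scene\n' > "$tmp/v2/ch1.txt"
printf 'Old ending\n' > "$tmp/v1/notes/ending.txt"
printf 'New ending\n' > "$tmp/v2/notes/ending.txt"
printf 'editor backup\n' > "$tmp/v1/ch1.bak"

# Recurse into subdirectories, report only whether each pair of files
# differs or is identical, and skip backup files matching the pattern.
report=$(diff --recursive --brief --report-identical-files \
    --exclude='*.bak' "$tmp/v1" "$tmp/v2" || true)
echo "$report"
```

Each pair of same-named files gets one line in the report, so a writer can see at a glance which chapters need a closer look.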
However, the comparison can be far more specific. Some options, such as --show-c-function (-p), are specific to programming, but others apply to regular text as easily as code. You can, for example, use --ignore-all-space (-w) so that differences in white space are not considered. Similarly, when comparing plain text files, using --ignore-blank-lines (-B) ignores the blank lines that are being used to separate paragraphs. A particularly useful option is --ignore-matching-lines=REGULAR-EXPRESSION, which ignores changes whose lines all match the regular expression and so can help to focus results.
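To illustrate the white space options, here is a small sketch that assumes GNU diff and two hypothetical files differing only in spacing and in the number of blank lines between paragraphs:

```shell
tmp=$(mktemp -d)
printf 'First paragraph.\n\nSecond paragraph.\n' > "$tmp/a.txt"
printf 'First   paragraph.\n\n\n\nSecond paragraph.\n' > "$tmp/b.txt"

# A plain comparison reports the spacing and blank-line differences.
plain=$(diff "$tmp/a.txt" "$tmp/b.txt" || true)
echo "$plain"

# Ignoring white space and blank lines leaves nothing to report.
relaxed=$(diff --ignore-all-space --ignore-blank-lines \
    "$tmp/a.txt" "$tmp/b.txt" || true)
echo "relaxed output: '$relaxed'"
```

With both options set, the only remaining output would come from changes to the actual wording.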
More specifically, experts can specify what to display with --GTYPE-group-format=GFMT, where GTYPE is old, new, unchanged, or changed. The format can include lines from the original file (%<), lines from the second file (%>), or lines common to both (%=), as well as printf-style specifications that end in a letter such as F for a group's first line number, L for its last line number, or N for its number of lines. Similarly, --LTYPE-line-format=LFMT sets how individual old, new, or unchanged lines are printed. Both options have a number of other completions, so consult the man or info page for more details.
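A group format can be sketched as follows, assuming GNU diff. The OLD: and NEW: labels and the file names are arbitrary choices of mine for illustration.

```shell
tmp=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmp/old.txt"
printf 'alpha\nBETA\ngamma\n' > "$tmp/new.txt"

# Print nothing for unchanged groups. For changed groups, print the
# lines from the first file (%<) and from the second file (%>), each
# with a label in front.
out=$(diff \
    --unchanged-group-format='' \
    --changed-group-format='OLD: %<NEW: %>' \
    "$tmp/old.txt" "$tmp/new.txt" || true)
echo "$out"
```

Because %< and %> expand to whole lines, including their newlines, the label for the second file starts on a line of its own.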
Output Formats
By default, diff displays the lines where differences occur in a set format (Figure 2). If the files are identical, there will be no output whatsoever. However, assuming some output is produced, each difference is introduced by a summary, such as 5,6c7. This summary displays the line number or lines where differences occur in the original file on the left, and the line number in the other file on the right. In between is one of three letters: c (change), a (append), or d (delete). Below the summary, the lines from the original file are given first, each marked by a less-than sign (<). Below them, the lines from the second file are marked by a greater-than sign (>). To make each difference easier to find, you can add context lines with --context (-c). The default number of context lines is three, but you can change it with --context=NUMBER (-C NUMBER).
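The following sketch shows the default output next to context output, assuming GNU diff and two hypothetical five-line files that differ only on the third line:

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$tmp/old.txt"
printf 'one\ntwo\nTHREE\nfour\nfive\n' > "$tmp/new.txt"

# Default format: a summary line (3c3), then the old line marked
# with < and the new line marked with >.
normal=$(diff "$tmp/old.txt" "$tmp/new.txt" || true)
echo "$normal"

# Context format with one line of context on each side of the change.
# Changed lines are marked with an exclamation mark.
context=$(diff --context=1 "$tmp/old.txt" "$tmp/new.txt" || true)
echo "$context"
```

In the context output, the lines two and four surround the change, while one and five are left out because only one line of context was requested.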
An even easier output display can be had by adding --side-by-side (-y) to the command. This option displays the original file's contents on the left and the second file on the right, making detailed comparisons easy (Figure 3). You can adjust the total width of a side-by-side display from its default of 130 characters with --width=NUMBER (-W NUMBER). Another option is to set --left-column, so that lines common to both files are shown only in the left column.
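A side-by-side comparison might be tried like this, assuming GNU diff. The width of 60 and the sample sentences are arbitrary choices for the sketch.

```shell
tmp=$(mktemp -d)
printf 'The night was dark.\nRain fell.\nThe end.\n' > "$tmp/old.txt"
printf 'The night was stormy.\nRain fell.\nThe end.\n' > "$tmp/new.txt"

# Two columns in a 60-character display. A vertical bar in the
# gutter marks lines that differ between the files.
sbs=$(diff --side-by-side --width=60 "$tmp/old.txt" "$tmp/new.txt" || true)
echo "$sbs"

# The same display, but with common lines printed in the left
# column only.
left=$(diff --side-by-side --left-column --width=60 \
    "$tmp/old.txt" "$tmp/new.txt" || true)
echo "$left"
```

On a narrow terminal, a smaller width keeps each pair of lines on one row, at the cost of truncating long lines.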
Regardless of which of these two output formats you use, the display is noticeably more flexible than that offered by LibreOffice's Edit | Track Changes, which can require far more concentration to read. If you open a second copy of the original, you can merge the files manually as you compare diff's results. A manual comparison is laborious, but it may be the best way to compare results.
A third alternative is to use --ifdef=NAME (-D NAME) to create merged output (Figure 4), in which the differing passages from both files are wrapped in #ifndef NAME, #else, and #endif markers. This output can be copied and pasted into a new file, where a writer can manually merge. However, if you are confident that the two files can be merged to get the results that you want, you can use --ed (-e) to produce a script for the ed editor that, when applied, turns the original into the second file. In programming, --ed is used to generate a patch, yet it can serve a writer's purpose just as well.
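Both kinds of output can be seen in this sketch, which assumes GNU diff. The marker name DRAFT2 and the file names are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$tmp/old.txt"
printf 'one\ntwo\nTHREE\nfour\nfive\n' > "$tmp/new.txt"

# Merged output: common lines appear once, and each differing passage
# is wrapped in #ifndef DRAFT2 / #else / #endif markers.
merged=$(diff --ifdef=DRAFT2 "$tmp/old.txt" "$tmp/new.txt" || true)
echo "$merged"

# An ed script: the editing commands that would turn old.txt into
# new.txt. Here it only prints the script and does not apply it.
script=$(diff --ed "$tmp/old.txt" "$tmp/new.txt" || true)
echo "$script"
```

The ed script is compact because it records only the replacement text, not the lines being replaced.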
In all formats, you can further customize by adding --color (Figure 5). Left unspecified, the --color option will use color when standard output is to a terminal. However, you can also complete the option with =never to never use color or =always to use it even when the output is redirected. By default, red is used for deleted lines, green for added lines, cyan for line numbers, and a bold font weight for the header. Colors can be customized with --palette=PALETTE, as specified in the diff info file.
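The effect of the color setting can be checked with a sketch like the one below. It assumes a version of GNU diff recent enough to support --color, and the file names are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmp/old.txt"
printf 'one\nTWO\nthree\n' > "$tmp/new.txt"

# Force color even though the output is captured rather than sent to
# a terminal. The result contains terminal escape sequences.
colored=$(diff --color=always "$tmp/old.txt" "$tmp/new.txt" || true)
echo "$colored"

# Turn color off entirely. The result is plain text.
plain=$(diff --color=never "$tmp/old.txt" "$tmp/new.txt" || true)
echo "$plain"
```

Forcing color is mostly useful when piping diff into a pager that understands escape sequences.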
diff3 and merge
diff's obvious limitation is that the original file must be compared against each of the other files one at a time. A quicker method is to use diff3 or merge to compare two files simultaneously with the original.
Unlike diff, diff3 expects the original as the second of its three files, between the two revised versions. The default output of the two commands is also similar, although diff3 separates each difference with a line of equals signs (====) and labels the lines from each file with the file's number (1:, 2:, or 3:) rather than with less-than (<) and greater-than (>) signs. In addition, diff3 can add --show-all (-A) to output all changes, with conflicts bracketed by markers. diff3's output can also be restricted to overlapping changes with --overlap-only (-x) or to non-overlapping changes with --easy-only (-3). Other options for output include --ed (-e), which diff3 shares with diff, and --merge (-m), diff3's more sensibly named version of diff's --ifdef=NAME (-D NAME).
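A three-way merge can be sketched as follows, assuming GNU diff3, whose synopsis is diff3 MYFILE OLDFILE YOURFILE. The file names are hypothetical, and the two revisions deliberately change different lines so that no conflict arises.

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$tmp/original.txt"
printf 'ONE\ntwo\nthree\nfour\nfive\n' > "$tmp/mine.txt"
printf 'one\ntwo\nthree\nfour\nFIVE\n' > "$tmp/yours.txt"

# The common ancestor goes in the middle. With --merge, diff3 prints
# the merged text instead of a list of differences. It exits with
# status 1 if it finds conflicts, hence the "|| true".
merged=$(diff3 --merge "$tmp/mine.txt" "$tmp/original.txt" \
    "$tmp/yours.txt" || true)
echo "$merged"
```

Because the output goes to the terminal, none of the three files is modified, and the result can be redirected to a new file once it looks right.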
merge is a near-duplicate of diff3. However, instead of providing output that can be copied and pasted into a new file, merge writes its results back into the first file listed on the command line. This behavior is not a problem if there are no conflicts. However, if conflicts do exist, merge warns of them, and that file will need editing. This extra effort is not much trouble in a plain text file, but in a binary format like Open Document Format, it could potentially corrupt the file. The same is true for -A, which, as in diff3, offers more verbose output. For this reason, only use merge after making a backup of the original.
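The backup advice might be followed as in this sketch. It assumes the merge command from the RCS package, which is often not installed by default, so the sketch checks for it first. The file names are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$tmp/original.txt"
printf 'ONE\ntwo\nthree\nfour\nfive\n' > "$tmp/mine.txt"
printf 'one\ntwo\nthree\nfour\nFIVE\n' > "$tmp/yours.txt"

# Back up the file that merge will overwrite before doing anything.
cp "$tmp/mine.txt" "$tmp/mine.txt.bak"

if command -v merge >/dev/null 2>&1; then
    # Fold the changes between original.txt and yours.txt into mine.txt.
    merge "$tmp/mine.txt" "$tmp/original.txt" "$tmp/yours.txt" \
        || echo "merge reported conflicts; edit mine.txt by hand"
    cat "$tmp/mine.txt"
else
    echo "merge is not installed; mine.txt is unchanged"
fi
```

If the result is not what you wanted, copying mine.txt.bak back over mine.txt restores the starting point.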