diff and merge
Repurposing Old Tools

Lead Image © Ion Chiosea, 123RF.com
Diff and merge: They're not just for developers.
Recently, a friend of mine returned to a manuscript after several months. The manuscript had half a dozen versions, and she could no longer remember how each one differed. Listening to her problem, I had a blinding flash of the obvious: diff [1], and related commands like diff3 [2] and merge [3], can be as much help to her as they have been to coders over the decades.
diff is a utility that compares two files line by line. For coders, diff is one of the commands that define Unix-like operating systems such as Linux. Although file comparison utilities are as old as Unix, diff itself was first released in 1974 for text files, with support for binary files added later. diff can present the results of a comparison in two main formats, and it can also merge the two files into a single output. diff3 [2], a similar utility, works in the same way on three files, although it does not support binary formats. Tools like patch, which applies diff's output to a file, have since been built around it. Even so, diff is still installed by default in many distributions, and its output, a diff, remains the standard name for a patch, just as grep has given its name to searching through files.
Basic Comparisons
Typing info diff (the man page is incomplete) quickly shows how diff can be as useful to a writer as to a programmer. Like most commands, diff takes options followed by, in this case, two files. The first file is the original, or any file if, as in my friend's case, the original is unknown or irrelevant:
diff OPTIONS ORIGINAL-FILE OTHER-FILE
Just by adding the --brief (-q) option, a writer can tell whether the files differ, something that file attributes alone cannot always show. Similarly, --report-identical-files (-s) either reports that the files are the same or displays the differences (Figure 1). In some situations, like my friend's, this information alone may be enough to rule out some files.
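The sketch below shows both options at work. It assumes GNU diffutils, and the three draft files are hypothetical stand-ins created on the spot:

```shell
# Work in a scratch directory with three hypothetical drafts.
dir=$(mktemp -d)
cd "$dir"
printf 'Chapter 1\nIt was a dark night.\n' > draft1.txt
printf 'Chapter 1\nIt was a dark night.\n' > draft2.txt
printf 'Chapter 1\nIt was a stormy night.\n' > draft3.txt

# --brief (-q): say only whether the files differ.
# (diff exits with status 1 when files differ; "|| true" ignores that.)
diff --brief draft1.txt draft3.txt || true
# prints: Files draft1.txt and draft3.txt differ

# --report-identical-files (-s): also say so when the files match.
diff --report-identical-files draft1.txt draft2.txt
# prints: Files draft1.txt and draft2.txt are identical
```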
Even more efficiently, directories can be specified instead of files, with --recursive (-r) added to include subdirectories, in order to locate identical files. The --from-file=FILE option compares FILE with each of the other files named on the command line, and --to-file=FILE compares each of them with FILE. When FILE is a directory, files of the same name are compared. With --exclude=PATTERN (-x PATTERN), files that match the pattern are skipped, while --exclude-from=FILE (-X FILE) skips files that match any of the patterns listed, one per line, in the designated file. Other self-explanatory options for comparing directories are --starting-file=FILE (-S FILE), --ignore-file-name-case, and --no-ignore-file-name-case. All these options make for a more targeted search. Although they take a while to set up, they are still much faster than opening every file to compare it by hand.
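The directory options can be combined in a single command, as in the sketch below. It assumes GNU diff, and the two folders and their contents are hypothetical:

```shell
# Two hypothetical folders of drafts, each with a notes subfolder.
dir=$(mktemp -d)
cd "$dir"
mkdir -p old/notes new/notes
printf 'Opening line\n' > old/ch1.txt
printf 'Opening line\n' > new/ch1.txt
printf 'Old ending\n' > old/ch2.txt
printf 'New ending\n' > new/ch2.txt
printf 'check dates\n' > old/notes/todo.txt
printf 'all done\n' > new/notes/todo.txt

# Walk both trees (-r), report only whether each pair differs (-q),
# and skip any file named todo.txt (-x). Add -s to list matching pairs too.
# (diff exits with status 1 when files differ; "|| true" ignores that.)
diff --recursive --brief --exclude='todo.txt' old new || true
# prints: Files old/ch2.txt and new/ch2.txt differ
```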
However, the comparison can be far more specific. Some options, such as --show-c-function (-p), are specific to programming, but others apply to regular text as easily as to code. You can, for example, use --ignore-all-space (-w) so that differences in white space are not considered. Similarly, when comparing plain text files, --ignore-blank-lines (-B) ignores changes that consist only of added or removed blank lines, such as the ones used to separate paragraphs. A particularly useful option is --ignore-matching-lines=REGULAR-EXPRESSION (-I REGULAR-EXPRESSION). It ignores changes whose lines all match the expression, which can help to focus results.
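The sketch below combines the three "ignore" options. It assumes GNU diff, and the "Revised:" note at the top of each hypothetical file stands in for any line you want diff to skip:

```shell
dir=$(mktemp -d)
cd "$dir"
# Two hypothetical versions: a different "Revised:" note, a doubled space,
# an extra blank line, and one real wording change at the end.
printf 'Revised: March\nThe  cat sat.\n\nThe dog ran.\nThe end.\n' > a.txt
printf 'Revised: June\nThe cat sat.\nThe dog ran.\nThe End.\n' > b.txt

# Plain diff reports every difference.
# (diff exits with status 1 when files differ; "|| true" ignores that.)
diff a.txt b.txt || true

# Ignoring white space, blank lines, and "Revised:" lines leaves one change.
diff --ignore-all-space --ignore-blank-lines \
     --ignore-matching-lines='^Revised:' a.txt b.txt || true
# prints:
# 5c4
# < The end.
# ---
# > The End.
```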
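More specifically, experts can specify what to display with --GTYPE-group-format=GFMT, where GTYPE is old, new, changed, or unchanged. Within the format, %< stands for lines from the original file, %> for lines from the second file, and %= for lines common to both. Group formats can also include line numbers. %df, %dl, and %dn give the first line number, last line number, and number of lines of a group in the original file, and %dF, %dL, and %dN give the same values for the second file. Similarly, --LTYPE-line-format=LFMT, where LTYPE is old, new, or unchanged, controls how each individual line is shown. For example, %l stands for the line's contents (without its trailing newline) and %dn for its line number. Both options have a number of other completions, so consult the man or info page for more details.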
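The two format options are easier to understand in action. The sketch below assumes GNU diff; the files and the wording of the format strings are made up for illustration:

```shell
dir=$(mktemp -d)
cd "$dir"
printf 'one\ntwo\nthree\n' > old.txt
printf 'one\n2\nthree\n' > new.txt

# Group format: hide common lines and describe each changed group.
# %df is the group's first line number in old.txt; %< and %> are its
# lines from old.txt and new.txt (each ending in a newline).
# (diff exits with status 1 when files differ; "|| true" ignores that.)
diff --unchanged-group-format='' \
     --changed-group-format='Line %df was: %<and is now: %>' \
     old.txt new.txt || true
# prints:
# Line 2 was: two
# and is now: 2

# Line formats: mark every line; %L is the line including its newline.
diff --old-line-format='- %L' --new-line-format='+ %L' \
     --unchanged-line-format='  %L' old.txt new.txt || true
# prints:
#   one
# - two
# + 2
#   three
```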
Output Formats
By default, diff displays only the lines where differences occur, in a set format (Figure 2). If the files are identical, there is no output whatsoever. Otherwise, each difference starts with a summary line, such as 5,6c7. On the left, the summary gives the line number or range of lines where the difference occurs in the original file. On the right, it gives the corresponding line number or range in the other file. In between is one of three letters: c (change), a (append), or d (delete). Below the summary, lines from the original file are marked with a less-than sign (<). A separator line (---) follows, and then the lines from the second file, marked with a greater-than sign (>). This default format shows no surrounding lines. To add context lines that make each difference easier to find, use --context[=NUMBER] (-c, or -C NUMBER); the default number of context lines is three.
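The sketch below shows the normal format and the context format side by side. It assumes GNU diff and two hypothetical versions of a short text:

```shell
dir=$(mktemp -d)
cd "$dir"
printf 'Once upon a time\nthere lived a king\nwho had three sons.\n' > v1.txt
printf 'Once upon a time\nthere lived a queen\nwho had three sons.\nThe end.\n' > v2.txt

# Normal format: summary lines plus the changed text, no context.
# (diff exits with status 1 when files differ; "|| true" ignores that.)
diff v1.txt v2.txt || true
# prints:
# 2c2
# < there lived a king
# ---
# > there lived a queen
# 3a4
# > The end.

# Context format with one line of context around each change; the
# header names both files and their modification times.
diff --context=1 v1.txt v2.txt || true
```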

An even easier output display can be had by adding --side-by-side (-y) to the command. This option displays the original file's contents on the left and the second file's contents on the right, making detailed comparisons easy (Figure 3). Side-by-side output is at most 130 characters wide by default, and you can change that total width with --width=NUMBER (-W NUMBER). Another option, --left-column, prints lines common to both files only once, in the left column.
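The sketch below shows both side-by-side variants. It assumes GNU diff and two hypothetical files. The exact spacing of the columns depends on the width and tab settings, so it is not reproduced here:

```shell
dir=$(mktemp -d)
cd "$dir"
printf 'Once upon a time\nthere lived a king\n' > v1.txt
printf 'Once upon a time\nthere lived a queen\nThe end.\n' > v2.txt

# Two columns, 60 characters wide in total: "|" marks a changed line,
# ">" a line only in the second file, "<" a line only in the first.
# (diff exits with status 1 when files differ; "|| true" ignores that.)
diff --side-by-side --width=60 v1.txt v2.txt || true

# Same, but common lines are printed only once, in the left column.
diff --side-by-side --width=60 --left-column v1.txt v2.txt || true
```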
Regardless of which of these two output formats you use, the display is noticeably more flexible than LibreOffice's Edit | Track Changes, which can require far more concentration to read. If you open a second copy of the original, you can merge the files manually as you work through diff's results. Merging by hand is laborious, but it may be the most reliable way to combine the versions.
A third alternative is to use --ifdef=NAME (-D NAME) to create merged output (Figure 4). In this output, the passages that differ appear one after the other, fenced off by C preprocessor lines such as #ifndef NAME, #else, and #endif. This output can be redirected into a new file, where a writer can merge the versions manually. However, if you are confident that you want the original to end up matching the second file, --ed (-e) outputs a script for the ed line editor that makes exactly those changes when applied to the original. In programming, such a script serves as a patch, yet it can serve a writer's purpose just as well.
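The sketch below produces both kinds of output. It assumes GNU diff; REVISED is an arbitrary label, and the file names are hypothetical. Applying the ed script needs the ed editor, which is not always installed, so that step appears only as a comment:

```shell
dir=$(mktemp -d)
cd "$dir"
printf 'Chapter 1\nIt was a dark night.\n' > draft.txt
printf 'Chapter 1\nIt was a stormy night.\n' > revised.txt

# --ifdef: both versions in one file, each difference fenced by
# #ifndef/#else/#endif lines that mention the label REVISED.
# (diff exits with status 1 when files differ; "|| true" ignores that.)
diff --ifdef=REVISED draft.txt revised.txt > both.txt || true
cat both.txt
# prints:
# Chapter 1
# #ifndef REVISED
# It was a dark night.
# #else /* REVISED */
# It was a stormy night.
# #endif /* REVISED */

# --ed: an ed script that turns draft.txt into revised.txt.
diff --ed draft.txt revised.txt > fix.ed || true
cat fix.ed
# prints:
# 2c
# It was a stormy night.
# .

# To apply it with ed ("w" writes the changed file back):
#   (cat fix.ed; echo w) | ed -s draft.txt
```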
In all formats, you can add color with --color (Figure 5). Given without a value, --color uses color only when standard output is a terminal. You can also complete the option with =never to disable color or =always to force it. By default, red is used for deleted lines, green for added lines, cyan for line numbers, and a bold font weight for the header. Colors can be customized with --palette=PALETTE, as described in diff's info page.
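The sketch below tries the color settings. It assumes GNU diff 3.4 or later, which added --color and --palette. The colors chosen in the palette line are arbitrary; de and ad are the palette names for deleted and added lines:

```shell
dir=$(mktemp -d)
cd "$dir"
printf 'It was a dark night.\n' > a.txt
printf 'It was a stormy night.\n' > b.txt

# Force color even when the output is piped or saved to a file.
# (diff exits with status 1 when files differ; "|| true" ignores that.)
diff --color=always a.txt b.txt || true

# Never use color.
diff --color=never a.txt b.txt || true

# Custom palette: deleted lines magenta (35), added lines blue (34).
diff --color=always --palette='de=35:ad=34' a.txt b.txt || true
```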
diff3 and merge
diff's obvious limitation is that the original file must be compared against each of the other files one at a time. A quicker method is to use diff3 or merge to compare two files with the original simultaneously.
Unlike diff, diff3 expects the original (the common ancestor of the other two files) as its middle argument:
diff3 OPTIONS MINE OLDER YOURS
The default output also differs from diff's. Each block of differences begins with a line of four equals signs (====), followed by 1, 2, or 3 when only that file differs from the other two. The affected lines are then listed under headers such as 1:5c, where the number before the colon identifies the file. In addition, diff3 can add --show-all (-A) to output all changes, with conflicts marked by bracket lines. diff3's output can also be narrowed with --show-overlap (-E), which brackets conflicting (overlapping) changes, or with --easy-only (-3), which includes only the non-overlapping changes. Other output options include --ed (-e), which diff3 shares with diff, and --merge (-m), diff3's more sensibly named counterpart to diff's --ifdef=NAME (-D NAME).
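The sketch below runs diff3 on three small files. It assumes GNU diff3; older.txt plays the original, and mine.txt and yours.txt are two hypothetical revisions that change different lines, so they merge without conflicts:

```shell
dir=$(mktemp -d)
cd "$dir"
# older.txt is the common original; mine.txt and yours.txt are two
# separate revisions of it (all three names are hypothetical).
printf 'Title\nIt was a dark night.\nThe end.\n' > older.txt
printf 'New Title\nIt was a dark night.\nThe end.\n' > mine.txt
printf 'Title\nIt was a dark night.\nThe End!\n' > yours.txt

# Normal output: ====1 marks a block where only mine.txt differs,
# ====3 one where only yours.txt differs.
# ("|| true" ignores a nonzero exit status.)
diff3 mine.txt older.txt yours.txt || true

# --merge (-m): combine both sets of changes into one file. Conflicting
# changes would be bracketed by <<<<<<<, |||||||, =======, and >>>>>>> lines.
diff3 --merge mine.txt older.txt yours.txt > merged.txt || true
cat merged.txt
# prints:
# New Title
# It was a dark night.
# The End!
```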
merge is a near-duplicate of diff3 and takes its three files in the same order. However, instead of producing output that can be redirected into a new file, merge by default writes its result into the first file named on the command line, overwriting it (its -p option prints the result to standard output instead). This behavior is not a problem if there are no conflicts. However, if conflicts do exist, merge warns of them and brackets them in that file, which will then need editing. This extra effort is not much trouble in a plain text file, but in a binary format like Open Document Format, it could corrupt the file. The same is true for -A, which, as in diff3, offers more verbose output. For this reason, only use merge after making a backup of the file it will overwrite.
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you've found an article to be beneficial.