diff and merge
Repurposing Old Tools
Diff and merge: They're not just for developers.
Recently, a friend of mine returned to a manuscript after several months. The manuscript had half a dozen versions, and she could no longer remember how each one differed. Listening to her problem, I had a blinding flash of the obvious: diff [1] and related commands like diff3 [2] and merge [3] can be as much help to her as they have been to coders over the decades.
diff is a utility that compares two files line by line. For coders, diff is one of the commands that define Unix-like operating systems such as Linux. Although file comparison utilities are as old as Unix, diff itself was first released in 1974 for text files, with support for binary files added later. diff presents users with a summary of the comparison in two different display formats, and its output can also be used to merge the files into a single file. diff3 [2], a similar utility, operates in a like manner on three files, although it does not support binary formats. More sophisticated tools like patch have been developed, but diff is still installed by default in many distributions, and the name of its output, a diff, remains a standard term for any patch, just as the grep command has given its name to any file search.
Basic Comparisons
Typing info diff (the man page is incomplete) quickly shows how diff can be as useful to a writer as to a programmer. The command follows the standard structure of a command name followed by options and two files. The first file is the original, or any file if, as in my friend's case, the original is unknown or irrelevant:
diff OPTIONS ORIGINAL-FILE OTHER-FILE
Just by adding the --brief (-q) option, a writer can tell whether the files are different, something that file attributes alone cannot always show. Similarly, --report-identical-files (-s) either reports that the files are the same or displays the differences (Figure 1). In some situations, like my friend's, this information alone may be enough to let some files be ignored.
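As a minimal sketch of these two options, assuming GNU diff and three hypothetical drafts created in a scratch directory (the file names and contents are invented for illustration):

```shell
# Work in a scratch directory so that no real manuscripts are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Three hypothetical drafts: draft2.txt is an exact copy of draft1.txt.
printf 'Chapter One\nIt was a dark night.\n' > draft1.txt
cp draft1.txt draft2.txt
printf 'Chapter One\nIt was a stormy night.\n' > draft3.txt

# -s reports identical files; diff exits with status 0 in this case.
same=$(diff -s draft1.txt draft2.txt)
echo "$same"

# -q only states that the files differ. diff exits with status 1 when
# files differ, so "|| true" keeps a script running under "set -e".
changed=$(diff -q draft1.txt draft3.txt) || true
echo "$changed"
```

A draft reported as identical to another can be set aside at once, which is often all that is needed to thin out a pile of versions.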
Even more efficiently, directories can be specified instead of files, with --recursive (-r) added to include subdirectories in order to locate identical files. In the same way, the --from-file=DIRECTORY1 and --to-file=DIRECTORY2 options can be used to compare files of the same name in different directories: --from-file compares its argument to each file or directory named on the command line, while --to-file compares each of them to its argument. With --exclude=PATTERN (-x PATTERN), files that match the pattern are excluded, while --exclude-from=FILE (-X FILE) excludes files that match the patterns listed, one per line, in the designated file. Still other options when comparing directories are the self-explanatory --starting-file=FILE (-S FILE), --ignore-file-name-case, and --no-ignore-file-name-case. All these options make for a more targeted search and, although they take a while to set up, are still much faster than opening all the files for comparison.
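A directory comparison might look like the following sketch, which assumes GNU diff and two hypothetical directories, old and new, holding different versions of the same files:

```shell
# Build two small, hypothetical directory trees in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p old/notes new/notes
printf 'Opening scene\n' > old/chapter1.txt
printf 'Opening scene, revised\n' > new/chapter1.txt
printf 'Same in both\n' > old/notes/ideas.txt
printf 'Same in both\n' > new/notes/ideas.txt
printf 'editor backup\n' > old/chapter1.bak

# -r descends into subdirectories, -q names the files that differ,
# -s names the files that are identical, and -x skips *.bak files.
# diff exits with status 1 because at least one pair differs.
report=$(diff -r -q -s -x '*.bak' old new) || true
echo "$report"
```

Without the -x option, the report would also mention chapter1.bak as a file found in only one of the directories.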
However, the comparison can be far more specific. Some options, such as --show-c-function (-p), are specific to programming, but others apply to regular text as easily as to code. You can, for example, use --ignore-all-space (-w) so that differences in white space are not considered. Similarly, when comparing plain text files, using --ignore-blank-lines (-B) ignores changes in the blank lines that are being used to separate paragraphs. A particularly useful option is --ignore-matching-lines=REGULAR-EXPRESSION (-I REGULAR-EXPRESSION), which ignores changes in which every line matches the regular expression and so helps to focus results.
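The effect of the white space options can be seen in a small sketch, assuming GNU diff and two hypothetical files that differ only in spacing and in the number of blank lines between paragraphs:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# b.txt has extra spaces in the first line and two extra blank lines.
printf 'First paragraph.\n\nSecond paragraph.\n' > a.txt
printf 'First   paragraph.\n\n\n\nSecond paragraph.\n' > b.txt

# A plain comparison reports the spacing differences
# (diff exits with status 1 when it finds differences).
plain=$(diff a.txt b.txt) || true
echo "$plain"

# With -w and -B, differences in white space and blank lines are ignored.
relaxed=$(diff -w -B a.txt b.txt) || true
echo "$relaxed"
```

For a writer, this combination filters out changes introduced by reformatting, so that only changes to the wording remain.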
More specifically, experts can specify what to display with --GTYPE-group-format=GFMT, where GTYPE is old, new, unchanged, or changed. The format can include lines from the original file (%<), lines from the second file (%>), or lines common to both (%=). It can also include printf-style specifications for the first line number (F), last line number (L), and the number of lines (N) in a group. Similarly, --LTYPE-line-format=LFMT formats individual lines, with %L standing for the contents of a line. Both options have a number of other completions, so consult the man or info page for more details.
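As one possible use of the line formats, the following sketch prints only the lines that are new or changed in the second file, each prefixed with its line number. It assumes GNU diff, and the file names are hypothetical:

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'one\ntwo\nthree\n' > old.txt
printf 'one\n2\nthree\nfour\n' > new.txt

# Suppress unchanged lines and lines from the original file, and print
# lines from the second file as "NUMBER: CONTENTS".
# %dn is the line number in decimal; %L is the line with its newline.
added=$(diff --unchanged-line-format='' \
             --old-line-format='' \
             --new-line-format='%dn: %L' \
             old.txt new.txt) || true
echo "$added"
```

The same approach could be reversed, with --old-line-format filled in and --new-line-format left empty, to list what was removed from the original.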
Output Formats
By default, diff displays the lines where differences occur in a set format (Figure 2). If the files are identical, there will be no output whatsoever. However, assuming some output is produced, each difference starts with a summary, such as 5,6c7. This summary displays the line number or lines where the difference occurs in the original file on the left, and the line number in the other file on the right. In between is one of three letters: c (change), a (append), or d (delete). Below the summary, the lines from the original file are given first, each marked by a less-than sign (<). Below them, the lines from the second file are each marked by a greater-than sign (>). To make each difference easier to find, you can add surrounding context lines with the option --context (-c). The default number of context lines is three, but you can change it with --context=NUMBER (-C NUMBER).
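The default format is easiest to understand from a tiny example. This sketch assumes GNU diff and two hypothetical three-line files that differ only in their second line:

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'alpha\nbeta\ngamma\n' > old.txt
printf 'alpha\nBETA\ngamma\n' > new.txt

# Default format: a summary line, then the old and new lines.
normal=$(diff old.txt new.txt) || true
echo "$normal"
# 2c2
# < beta
# ---
# > BETA

# Context format with one line of context around the change.
# Changed lines are marked with "!" in this format.
context=$(diff -C 1 old.txt new.txt) || true
echo "$context"
```

The context output also begins with a header that names both files and their modification times, so it is longer than the default output even for a single change.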
An even easier output display can be had by adding --side-by-side (-y) to the command. This option displays the original file's contents on the left and the second file's on the right, making detailed comparisons easy (Figure 3). The display is 130 columns wide by default, a width you can adjust with --width=NUMBER (-W NUMBER). Another option is to set --left-column, so that lines common to both files are shown only in the left column.
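A side-by-side comparison could be tried out as follows. The sketch assumes GNU diff, a narrow 40-column display, and hypothetical file names. The --suppress-common-lines option is not mentioned above, but it is a standard GNU diff option that hides lines common to both files:

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'alpha\nbeta\ngamma\n' > old.txt
printf 'alpha\nBETA\ngamma\n' > new.txt

# Two columns, 40 characters wide in total; changed lines are
# separated by a "|" in the gutter between the columns.
sbs=$(diff -y -W 40 old.txt new.txt) || true
echo "$sbs"

# Show common lines in the left column only, marked with "(".
left=$(diff -y -W 40 --left-column old.txt new.txt) || true
echo "$left"

# Show only the lines that differ.
only=$(diff -y -W 40 --suppress-common-lines old.txt new.txt) || true
echo "$only"
```

On a long manuscript, hiding the common lines reduces the display to the passages that actually changed.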
Regardless of which of these two output formats you use, the display is noticeably more flexible than that offered by LibreOffice's Edit | Track Changes, which can require far more concentration to read. If you open a second copy of the original, you can merge the files manually as you compare diff's results. A manual comparison is laborious, but it may be the best way to compare results.
A third alternative is to use --ifdef=NAME (-D NAME) to create a merged output file (Figure 4). This output can be copied and pasted into a new file, where a writer can merge manually. However, if you are confident that the two files can be merged to get the results that you want, you can use --ed (-e) to output an ed script that, when applied to the original, converts it into the second file. In programming, --ed is used to generate a patch, yet it can serve a writer's purpose just as well.
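Both kinds of output can be inspected with a short sketch, assuming GNU diff. The file names and the name REVISED given to -D are hypothetical:

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'alpha\nbeta\ngamma\n' > old.txt
printf 'alpha\nBETA\ngamma\n' > new.txt

# -D produces one combined file in which the two versions of each
# changed passage are wrapped in #ifndef / #else / #endif lines.
combined=$(diff -D REVISED old.txt new.txt) || true
echo "$combined"

# -e produces an ed script: here, "change line 2 to BETA".
script=$(diff -e old.txt new.txt) || true
echo "$script"
# 2c
# BETA
# .
```

The ed script changes nothing by itself. It would have to be applied to the original by a separate tool, such as patch with its --ed option, which in turn relies on the ed editor being installed.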
In all formats, you can further customize by adding --color (Figure 5). Used without a value, the --color option will use color when standard output is to a terminal. However, you can also complete the option with =never to never use color or =always to always use it. By default, red is used for deleted lines, green for added lines, cyan for line numbers, and a bold font weight for the header. Colors can be customized with --palette=PALETTE, as specified in the diff info file.
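The color settings can be compared in a brief sketch. It assumes a version of GNU diff recent enough to support --color, and the palette shown (blue for added lines, magenta for deleted ones) is only an illustrative choice:

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'alpha\nbeta\n' > old.txt
printf 'alpha\nBETA\n' > new.txt

# Force color even though the output is captured rather than
# written directly to a terminal.
colored=$(diff --color=always old.txt new.txt) || true

# Never use color, for example when saving the output to a file.
plain=$(diff --color=never old.txt new.txt) || true

# Override the default colors: "ad" is for added lines, "de" for
# deleted lines; the numbers are terminal color codes.
custom=$(diff --color=always --palette='ad=34:de=35' old.txt new.txt) || true

printf '%s\n' "$colored" "$plain" "$custom"
```

Forcing color with =always is mainly useful when piping the output to a pager that understands color codes, such as less with its -R option.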
diff3 and merge
diff's obvious limitation is that the original file must be compared against each of the other files one at a time. A quicker method is to use diff3 or merge to compare two files simultaneously with the original.
Unlike diff, diff3 expects the original to be the second of the three files listed, between the two revised versions. The default output of the two commands is similar, although diff3 introduces each group of differences with a line of equals signs (====). In merged output, diff3 marks conflicting lines from the first file with less-than signs (<<<<<<<), those from the original with vertical bars (|||||||), and those from the third with greater-than signs (>>>>>>>). In addition, diff3 can add --show-all (-A) to output all changes, with conflicts bracketed by these markers. diff3's output can also be set to bracket overlapping changes with --show-overlap (-E) or to include only non-overlapping changes with --easy-only (-3). Other options for output include --ed (-e), which diff3 shares with diff, and --merge (-m), diff3's more sensibly named version of diff's --ifdef=NAME (-D NAME).
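A three-way comparison might be sketched as follows, assuming GNU diff3 and hypothetical drafts. Note the order of the arguments, with the common ancestor placed between the two edited versions:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# An original and two edited copies that change different lines.
printf 'one\ntwo\nthree\nfour\nfive\n' > original.txt
printf 'ONE\ntwo\nthree\nfour\nfive\n' > edit_a.txt
printf 'one\ntwo\nthree\nfour\nFIVE\n' > edit_b.txt

# The edits do not overlap, so -m can combine them cleanly.
merged=$(diff3 -m edit_a.txt original.txt edit_b.txt)
echo "$merged"

# A third copy that rewrites the same line as edit_a.txt.
printf 'uno\ntwo\nthree\nfour\nfive\n' > edit_c.txt

# The edits now overlap, so diff3 brackets the conflict with markers
# and exits with status 1.
conflict=$(diff3 -m edit_a.txt original.txt edit_c.txt) || true
echo "$conflict"
```

Because the merged text goes to standard output, none of the three drafts is altered. The result can be redirected to a new file and the conflicts resolved there.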
merge is a near-duplicate of diff3. However, instead of providing output that can be copied and pasted into a new file, merge writes everything into the first file listed. This behavior is not a problem if there are no conflicts. However, if conflicts do exist, merge warns of them, and that file will need editing. This extra effort is not much trouble in a plain text file, but in a binary format like Open Document Format, it could corrupt the file. The same is true for -A, which, as in diff3, offers more verbose output. For this reason, only use merge after making a backup of the file that it will overwrite.
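The backup advice can be turned into a cautious routine. This sketch assumes hypothetical file names and that merge, which is usually packaged with RCS, may not be installed, in which case it falls back on diff3 -m to produce the same result:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A common original and two edited copies that change different lines.
printf 'one\ntwo\nthree\nfour\nfive\n' > original.txt
printf 'ONE\ntwo\nthree\nfour\nfive\n' > mine.txt
printf 'one\ntwo\nthree\nfour\nFIVE\n' > theirs.txt

# Back up the file that is about to be overwritten.
cp mine.txt mine.txt.bak

if command -v merge >/dev/null 2>&1; then
    # merge writes the combined result into the first file listed.
    merge mine.txt original.txt theirs.txt
else
    # Fallback: write diff3's merged output to a temporary file first.
    diff3 -m mine.txt original.txt theirs.txt > mine.merged
    mv mine.merged mine.txt
fi

cat mine.txt
```

If the result is not what was wanted, copying mine.txt.bak back over mine.txt restores the earlier state.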