Examining the algorithms of the diff utility


Article from Issue 76/2007

Diff finds the differences between two versions of a file. We’ll show you how diff finds changes and matches in files without overburdening a system’s resources.

For a user at the command line, discovering the differences between two text files is easy: a simple command, such as diff Version_1.txt Version_2.txt, is all it takes. On closer inspection, however, it turns out that comparing files can demand a large amount of memory, and diff relies on some ingenious algorithms to keep that demand in check. This article investigates how diff manages to find changes and matches in multi-megabyte files without overburdening a system’s resources.
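To see why memory becomes a concern, consider the textbook approach to comparing files line by line: compute a longest common subsequence (LCS) with dynamic programming. The Python sketch below is only an illustration of that textbook technique and is not taken from the diff sources; the function names lcs_table and line_diff are hypothetical. Its table holds (n+1) x (m+1) entries for files of n and m lines, which is the kind of quadratic cost a practical tool has to avoid on large inputs.

```python
def lcs_table(a, b):
    """Return a table where table[i][j] is the LCS length of a[i:] and b[j:]."""
    rows, cols = len(a), len(b)
    # This full table is the memory cost: (rows + 1) * (cols + 1) integers.
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(a, b):
    """Walk the LCS table and label each line as kept, removed, or added."""
    table = lcs_table(a, b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append("  " + a[i])  # line appears in both versions
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            out.append("- " + a[i])  # line only in the old version
            i += 1
        else:
            out.append("+ " + b[j])  # line only in the new version
            j += 1
    # Whatever remains on either side is a pure deletion or insertion.
    out.extend("- " + line for line in a[i:])
    out.extend("+ " + line for line in b[j:])
    return out


if __name__ == "__main__":
    old = ["a", "b", "c"]
    new = ["a", "c", "d"]
    for entry in line_diff(old, new):
        print(entry)
```

For two files of 50,000 lines each, this table would need roughly 2.5 billion entries, which is why a naive implementation is impractical for multi-megabyte files.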
