Detect duplicates with fdupes
Double Trouble
The command-line fdupes tool helps you find duplicate files in folders and directories.
Hard disks have the unpleasant tendency of filling up faster than expected. It is not always immediately obvious why. Keeping things tidy should not be underestimated in this context. Untidy, poorly organized hard disks tend to fill up faster than well-organized ones. Because life is a mixture of order and chaos, most users probably face this problem.
The unexpectedly high utilization level of hard disks is often caused by duplicate files. The typical candidates are photos, music, or videos, which can quickly occupy several gigabytes of space and are often difficult to find. There are several graphical applications on Linux to help you detect and remove duplicates like this, and there are several more for the command line.
GUI or CLI?
Well-known cleanup tools with a graphical interface include FSlint and dupeGuru. In this article, I will look at fdupes for the command line [1], first released in 2000. Most distributions include the tool, which weighs in at just over 100KB, in their repositories; you can install it using your distribution's package manager. Listing 1 shows the commands for Debian, Fedora, and Arch Linux.
Listing 1
Installing fdupes
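The body of the listing is not reproduced here. As a rough sketch, the usual install commands for the three distributions look like the following. The helper name fdupes_install_cmd is made up for illustration, and the snippet only prints the command for the first package manager it finds rather than running it.

```shell
# Print the usual fdupes install command for a given package manager.
# fdupes_install_cmd is a hypothetical helper name, not part of fdupes.
fdupes_install_cmd() {
  case "$1" in
    apt)    echo "sudo apt install fdupes" ;;   # Debian and derivatives
    dnf)    echo "sudo dnf install fdupes" ;;   # Fedora
    pacman) echo "sudo pacman -S fdupes" ;;     # Arch Linux
    *)      echo "unknown package manager: $1" >&2; return 1 ;;
  esac
}

# Show the command for the first package manager found on this system.
for pm in apt dnf pacman; do
  if command -v "$pm" >/dev/null 2>&1; then
    fdupes_install_cmd "$pm"
    break
  fi
done
```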
The current 2.2.1 version from September 2022 has not made its way into all repositories [2]. If you want to compile fdupes from the source code, you can use the tarball from GitHub. After unpacking, just follow the familiar three-step process of ./configure, make, and make install. As of fdupes 2.0, there are two dependencies that you may also need to resolve yourself, depending on the distribution. To do this, follow the instructions in the INSTALL file from the unpacked archive.
After the install, you can use the tool immediately without any configuration. It identifies duplicate files in the specified directories in several steps. The file name is not important for detection as a duplicate. Instead, two files must first be the same size; given this, fdupes compares their MD5 checksums. Finally, the software performs a byte-by-byte comparison, to make sure that it is definitely the same file.
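The three checks can be imitated with standard tools to see why the order makes sense: the cheap size test rules out most pairs before any checksum is computed. The following is only a sketch of the idea, not the code fdupes itself runs; the function name is_duplicate is hypothetical, and md5sum and cmp from a typical Linux system are assumed.

```shell
# Return 0 if two files pass all three checks, non-zero otherwise.
# is_duplicate is an illustrative name; this is not fdupes' own code.
is_duplicate() {
  # Step 1: the sizes must match (cheap, rules out most pairs).
  [ "$(wc -c < "$1")" -eq "$(wc -c < "$2")" ] || return 1
  # Step 2: the MD5 checksums must match.
  [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ] || return 1
  # Step 3: a byte-by-byte comparison confirms the match.
  cmp -s "$1" "$2"
}
```

A call such as is_duplicate photo1.jpg copy-of-photo1.jpg then succeeds only if both files have identical content, regardless of their names.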
Fdupes has numerous options that let you control the search and the subsequent deduplication. Initially, you will want to familiarize yourself with the tool by running the fdupes --help command. This will help you identify the options that suit your use case.
Test Run
For the test, I created an fdupes directory in the Documents directory and then created 10 text files whose content read "fdupes finds and removes duplicates." Listing 2 shows you how to do this quickly.
Listing 2
Create Multiple Text Files at the Same Time
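The listing body is missing from this copy. One way to create the files is a short loop like the sketch below. The function name make_test_files and the file names file1.txt through file10.txt are assumptions for illustration; only the file content comes from the text.

```shell
# Create 10 identical text files in the given directory.
# make_test_files and the fileN.txt names are illustrative assumptions.
make_test_files() {
  mkdir -p "$1" || return 1
  i=1
  while [ "$i" -le 10 ]; do
    echo "fdupes finds and removes duplicates" > "$1/file$i.txt"
    i=$((i + 1))
  done
}
```

For the setup described here, the call would be make_test_files ~/Documents/fdupes.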
A subsequent ls -l confirms that the files were created. The easiest way to search for duplicates in the new directory is to use the fdupes ~/Documents/fdupes command (Figure 1). By separating the paths with spaces, you can specify multiple directories at the same time. To search recursively in directories, you need to use the -r option, as in fdupes -r ~/Documents (Figure 2). In this case, the tool finds my 10 text files along with some other duplicates. Use the -R option instead if you want to recurse only into the directories listed after it.
![](/var/linux_magazin/storage/images/issues/2023/273/fdupes/figure-1/825062-1-eng-US/Figure-1_large.png)
The -S (--size) option shows you the size of the hits. You can use -t or --time to find out when a file was last modified. The -G (--minsize=SIZE) and -L (--maxsize=SIZE) options let you further narrow down the selection by file size.
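These options can be combined in a single call. The sketch below wraps one such combination in a hypothetical helper named list_dupes; it assumes fdupes 2.x is installed and that SIZE is given in bytes, as the man page describes.

```shell
# List duplicates of at least min_size bytes below dir, showing size and
# modification time. list_dupes is a hypothetical wrapper name.
list_dupes() {
  min_size=$1
  dir=$2
  # -r: recurse, -S: show size, -t: show modification time,
  # -G: minimum file size (assumed to be in bytes)
  fdupes -r -S -t -G "$min_size" "$dir"
}
```

For example, list_dupes 1048576 ~/Documents would restrict the search to files of 1MiB or more, which is usually where photos, music, and videos show up.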
Be Careful When Removing
But finding is only the first part of the task; after all, we want to delete duplicates to clean up the hard disk. This is where the -d (--delete) option comes in. When using -d, always make sure that your path specification is correct, because files deleted with fdupes cannot be recovered. The command
fdupes -d ~/Documents/fdupes
first lists the files in a numbered list (Figure 3). Note that the number at the beginning of the line will not necessarily match the number in the file name. If you now enter numbers separated by commas, the corresponding files are tagged with a plus sign and remain intact, while the software removes all of the duplicates tagged with a minus sign.
![](/var/linux_magazin/storage/images/issues/2023/273/fdupes/figure-3/825068-1-eng-US/Figure-3_large.png)
If you make a mistake, the rg command cancels your previous entries. Pressing Delete applies your entries. If you want to remove all duplicates except the first one displayed in each set, use the command
fdupes -r -d -N /path
You do not need to press Delete here: the -N (--noprompt) option works without any confirmation.
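Because -N deletes without asking, it can help to look at the candidates first. The sketch below uses two hypothetical helpers: preview_deletions relies on the -f (--omitfirst) option, which is not covered above and lists every match except the first in each set, and delete_duplicates runs the command shown above. The preview should correspond to what the second call removes, assuming nothing changes in between.

```shell
# List the files that would be removed: every match except the first
# in each set. preview_deletions is a hypothetical helper name.
preview_deletions() {
  fdupes -r -f "$1"
}

# Delete the duplicates without prompting; the first file of each set
# is kept. Deleted files cannot be recovered.
delete_duplicates() {
  fdupes -r -d -N "$1"
}
```

A cautious workflow would be to run preview_deletions /path, check the output, and only then run delete_duplicates /path.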
Another selection option after calling fdupes with the -d option relies on the sel command. You can select all files with a specific term in the path by typing sel <term>. To select all files whose path starts with the term, use selb <term>. Use sele <term> to select files whose path ends with the term. To select all files whose path corresponds exactly to the term, use the selm <term> command. After that, you can decide which of the candidates you want to keep. Further options are described by the help command, which displays the matching fdupes man page sections.