Recovering deleted files with Scalpel
The Knife

Lead Image © Akhilesh Sharma, 123RF.com
The Scalpel file carver helps users restore what they thought were lost files.
You just need to delete the pesky backup files for the project, and then you're off for home. However, rm *~ can quickly be mistyped as rm * ~, thus deleting all the files from the current directory. But perhaps all is not lost: Deleted data usually is not dumped directly into a black hole. The operating system typically only deletes the metadata, such as file name, owner, and location. The user data is kept on the storage medium until it is overwritten.
Linux has a number of file carvers, which are programs designed for restoring such data. These tools analyze a disk for byte patterns that match the file headers and footers and interpret everything between the two as belonging to the file. This approach works as long as the header and footer are clear, the file is not fragmented, and the file was not encrypted.
When a footer is missing or not recognized, the carver just writes everything to the recovery file until it encounters the next header. Therefore, besides fragmented files and those with poorly discernible ends, those that contain other files – such as text documents with embedded graphics – also cause problems. If you use a file carver, you should not expect miracles but just hope for the best.
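The header/footer approach described above can be illustrated with a minimal Python sketch. It is not how Scalpel is implemented; the function name carve and the max_size cutoff are assumptions made for illustration, and the JPEG-style byte patterns in the usage note are only sample values.

```python
def carve(data: bytes, header: bytes, footer: bytes, max_size: int) -> list:
    """Return every byte range that starts with `header`.

    A range ends at the first footer found within `max_size` bytes.
    If no footer turns up, it runs to the next header or to the
    size limit, whichever comes first (which is where junk creeps in).
    """
    found = []
    pos = 0
    while True:
        start = data.find(header, pos)
        if start == -1:
            break
        limit = min(len(data), start + max_size)
        body_start = start + len(header)
        end = data.find(footer, body_start, limit)
        if end != -1:
            stop = end + len(footer)
        else:
            # Footer missing: fall back to the next header or the limit.
            nxt = data.find(header, body_start, limit)
            stop = nxt if nxt != -1 else limit
        found.append(data[start:stop])
        # Always move forward, even for a degenerate max_size.
        pos = max(stop, start + 1)
    return found
```

Calling carve(image, b"\xff\xd8\xff", b"\xff\xd9", 10_000_000) on the raw bytes of a disk image would return candidate files. A fragmented or embedded file would still come out damaged, for exactly the reasons given above.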
File Scalpel
The Scalpel [1] file carver can detect many different file types. It does not matter which filesystem the disk has been formatted with: Scalpel uses a database with headers and footers for various file types to trace files.
Many distributions have older versions of Scalpel in their repositories that do the job well but lack some features of the current version 2.0, such as regular expressions for headers and footers, multithreading, asynchronous input and output, or GPU-accelerated file carving (if NVIDIA's CUDA SDK is installed). If you want to use these features, you must build Scalpel from the source code (see the "Installation" box).
Installation
To install Scalpel 2.0, download the source code archive [2] and unpack it into a directory. Then, change to the directory and compile the tool with ./configure followed by make and make install (running the last of these as root). If you do not want the binaries to end up under /usr/local, you need to pass a matching --prefix to the configure step. Before you delete the source files, you should copy scalpel.conf to another location: This configuration file contains the headers and footers for the supported file types.
In the past, file carvers scanned disks for header and footer patterns and wrote all the results to a new medium, which required plenty of storage. Scalpel, however, makes just two passes over a disk to put together all the necessary information.
In the first pass, Scalpel looks for headers and stores its findings in a database; then, it identifies the footers. In doing so, Scalpel only considers footers that follow a header, which nicely accelerates the search. The result is an index with the positions of the headers and footers, which forms the basis for the second pass. This time, Scalpel matches the headers and footers and writes the files it found directly to a new location from memory without having to access the disk again.
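The two-pass idea can be sketched in a few lines of Python. This is only an illustration of indexing first and pairing later, not Scalpel's actual code; build_index and pair_ranges are hypothetical names, and the whole input is assumed to fit in memory.

```python
from bisect import bisect_right


def build_index(data: bytes, header: bytes, footer: bytes):
    """Pass 1: record the offsets of all headers and relevant footers."""
    headers, footers = [], []
    pos = data.find(header)
    while pos != -1:
        headers.append(pos)
        pos = data.find(header, pos + 1)
    # A footer before the first header cannot close anything, so skip it.
    if headers:
        pos = data.find(footer, headers[0] + len(header))
        while pos != -1:
            footers.append(pos)
            pos = data.find(footer, pos + 1)
    return headers, footers


def pair_ranges(headers, footers, footer_len, max_size):
    """Pass 2: pair each header with the next footer within max_size."""
    ranges = []
    for h in headers:
        i = bisect_right(footers, h)  # first footer located after h
        if i < len(footers):
            end = footers[i] + footer_len
            if end - h <= max_size:
                ranges.append((h, end))
    return ranges
```

Because the second step works purely on the offset lists, it decides which ranges to extract without rescanning the data for patterns.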
Before Scalpel embarks on a search for lost data, it reads the scalpel.conf configuration file, which lists the file types to search for and can also contain the minimum and maximum sizes of the files, including headers and footers. Specifying a maximum size prevents file bloat if the footer is missing. Before you start carving, you should adjust these settings to restrict the search to as few file types and as narrow a size range as possible from the outset.
Saving the Sandman
Next, I'll give some examples of rescue operations. Scenario 1 involves a household without TV with at least one young child. The father has accidentally deleted the child's favorite "Sandmann" episodes, which were recorded from ARD Mediathek over several days. Deprived of the show, the child expresses disappointment in the usual loud and unmistakable way.
The scalpel.conf file does not have an entry for the MP4 format, but the existing (long-since viewed) files show an encouraging consistency in their headers (Figure 1). Dad now adds this information to scalpel.conf. The first item in the new entry (Figure 2) is the file extension that potential matches should receive. The y indicates that Scalpel distinguishes between uppercase and lowercase in the header and footer. This is followed by the minimum and maximum file sizes; the MP4s usually occupy between 30 and 70MB. Finally, the header is given. A footer cannot be specified, because it differs from file to file.


Then, dad starts the rescue operation at the command line with:
$ scalpel -c scalpel.conf -o sandmann_recovered /dev/sdd1
During this operation, Scalpel really does scratch six files back off the disk. Because of the lack of footer information, they are all exactly 70,000,000 bytes long and contain a lost "Sandmann" sequence – with a more or less large chunk of junk data at the end (Figure 3). Cheers for dad!

Save my LaTeX
Scenario 2: The USB stick with your humanities term paper in TeX has died because of a physical defect in the boot sector. However, you took the business of writing this paper seriously and tagged each file at the beginning and end with %filename.tex and %filename.tex End. This practice hugely increases your chances of seeing the contents again.
The recovery procedure is similar to that described above. In scalpel.conf, you can comment out all the lines except the one that describes the headers and footers of your well-documented paper:
tex y 300:50000 /%.{1,20}\.tex/ /%.{1,20}\.tex\sEnd/
This regular expression [3] tells Scalpel to search for data fragments that start with %.{1,20}\.tex and end with %.{1,20}\.tex\sEnd. The term .{1,20} stands for at least 1 and at most 20 arbitrary characters. The escaped dot that follows (\.) matches the literal period before the tex suffix, and the \s in the footer matches a whitespace character, here the space before End. Although regular expressions support the quantifiers *, +, and ?, as well as expressions like [:alnum:], experience shows that Scalpel cannot do much with them.
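To check what the two patterns match before starting a long carving run, you can try them on sample data. The sketch below uses Python's re module as a stand-in. Scalpel's own regex engine may differ in details (for example, Python's dot does not match a newline), and the function name extract_tagged and its default size limits, taken from the 300:50000 entry, are assumptions for illustration.

```python
import re

# The header and footer patterns from the scalpel.conf entry, as bytes regexes.
HEADER = re.compile(rb"%.{1,20}\.tex")
FOOTER = re.compile(rb"%.{1,20}\.tex\sEnd")


def extract_tagged(data: bytes, min_size: int = 300, max_size: int = 50000):
    """Return all fragments that run from a header tag to a footer tag."""
    results = []
    pos = 0
    while True:
        h = HEADER.search(data, pos)
        if h is None:
            break
        # Only accept a footer that ends within max_size bytes of the header.
        f = FOOTER.search(data, h.end(), h.start() + max_size)
        if f is not None and f.end() - h.start() >= min_size:
            results.append(data[h.start():f.end()])
            pos = f.end()
        else:
            pos = h.end()
    return results
```

Feeding the function a few kilobytes that contain a tagged .tex file surrounded by junk shows quickly whether the tags are picked up as intended.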
Next, use dd to copy the contents of the broken USB stick to the stick.dd.img file, and then run the following command:
$ scalpel -c scalpel.conf -o lost_texfiles stick.dd.img
In this test, Scalpel needed just four seconds to scan a 2GB image file and restore all the TeX files.
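The imaging step in this scenario is done with dd. For readers curious about what such a copy involves, the following Python sketch reads a source block by block and writes it to an image file. The function name copy_image and the zero-filling of unreadable blocks (similar in spirit to dd's conv=noerror,sync) are assumptions for illustration, not a replacement for dd on a failing medium.

```python
import os


def copy_image(src_path: str, dst_path: str, block_size: int = 4096) -> int:
    """Copy src to dst block by block and return the number of bytes written.

    A block that raises a read error is replaced with zeros so that all
    later offsets in the image stay aligned with the source.
    """
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size of the source
        src.seek(0)
        copied = 0
        while copied < size:
            want = min(block_size, size - copied)
            try:
                block = src.read(want)
            except OSError:
                # Unreadable block: pad with zeros and skip past it.
                block = b"\x00" * want
                src.seek(copied + want)
            if not block:
                break  # unexpected end of input
            dst.write(block)
            copied += len(block)
    return copied
```

Working on an image rather than on the stick itself means that every carving attempt leaves the damaged medium untouched.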