Carving tools help you recover deleted files
Modern filesystems make forensic file recovery much more difficult. Tools like Foremost and Scalpel identify data structures and carve files from a hard disk image.
IT experts and investigators have many reasons for reconstructing deleted files. Whether an intruder has deleted a log to conceal an attack or a user has destroyed a digital photo collection with an accidental rm -rf, you might someday need to recover deleted data. In the past, recovery experts could easily retrieve a lost file because earlier filesystems deleted only the directory entry. The metadata describing the physical location of the data on the disk was preserved, and tools like The Coroner's Toolkit (TCT) and The Sleuth Kit (TSK) could uncover the information needed to restore the file.
Today, many filesystems delete the full set of metadata and leave only the data blocks behind. Putting these pieces back together correctly is called file carving: forensic experts carve the raw data off the disk and reconstruct the files from it. The more fragmented the filesystem, the harder this task becomes.
Many open source tools automate the carving process. The list is headed by Foremost and its derivative Scalpel, but other tools include PhotoRec and FTimes. PhotoRec does not support generic carving of arbitrary file types, and FTimes is so hard to use that it is not worthwhile for most users.
Foremost and Scalpel are not interested in the underlying filesystem. They simply expect the data blocks of the files to reside sequentially in the image under investigation. The tools will find images in dd dumps, RAM dumps, or swap files. Carving will help to identify and reconstruct files on corrupt filesystems, in slack space, or even after installation of a new operating system, as long as the required data blocks still exist.
Of course, none of these tools can perform miracles, and they are not designed to retrieve data from physically damaged hard disks. Also, the carving process cannot access data blocks that have been overwritten.
Because carving tools do not rely on the filesystem, they need other sources of information to determine where a file starts and ends. Fortunately, many file types have known structures, and the header and footer are often all that is needed to identify a file's type and location. The Linux file command relies on the same kind of signatures (magic numbers) to identify file types.
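As an illustration of header-based identification, the following Python sketch compares the first bytes of a buffer against a small table of well-known headers. The GIF, JPEG, PNG, and ZIP signatures are real, but the table and the function name identify are hypothetical and cover only a tiny fraction of what file or a carver recognizes.

```python
from typing import Optional

# Illustrative signature table (header bytes -> file type); far from complete.
SIGNATURES = {
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\xff\xd8\xff": "jpg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",
}


def identify(data: bytes) -> Optional[str]:
    """Return the type whose header matches the start of data, or None."""
    for header, kind in SIGNATURES.items():
        if data.startswith(header):
            return kind
    return None


if __name__ == "__main__":
    print(identify(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # jpg
    print(identify(b"GIF89a\x01\x00"))                # gif
    print(identify(b"plain text"))                    # None
```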
File carvers scan the whole hard disk, or disk image, for known headers and footers. They then carve out the blocks between a header and its footer and store the data as a new file. Some file types do not have unique footers; in that case, carvers can at least estimate where the file ends based on where the next header starts. Of course, any amount of unrelated data could reside between the actual end of the file and the next header.
To avoid collecting unnecessary junk data, carving programs allow users to set maximum file sizes. Unfortunately, headers and footers are often short, which leads to numerous false positives.
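The header/footer strategy described in the last two paragraphs can be sketched in Python as follows. The function name carve and its end-of-file policy (first footer within the size limit, otherwise the next header, otherwise the size limit) are assumptions made for illustration; Foremost and Scalpel implement their own, more elaborate rules.

```python
from typing import List, Optional, Tuple


def carve(image: bytes, header: bytes, footer: Optional[bytes],
          max_size: int) -> List[Tuple[int, bytes]]:
    """Return (offset, data) for every occurrence of header in image.

    Hypothetical end-of-file policy: use the first footer that fits within
    max_size; failing that, stop at the next header; failing that, cut the
    file at max_size (or at the end of the image).
    """
    results = []
    start = image.find(header)
    while start != -1:
        limit = min(start + max_size, len(image))
        body = start + len(header)  # look for the end only after the header
        end = -1
        if footer is not None:
            pos = image.find(footer, body, limit)
            if pos != -1:
                end = pos + len(footer)  # keep the footer in the carved file
        if end == -1:
            # No usable footer: guess the end from the next header.
            nxt = image.find(header, body, limit)
            end = nxt if nxt != -1 else limit
        results.append((start, image[start:end]))
        start = image.find(header, body)
    return results
```

With a two-byte header such as \xff\xd8, every chance occurrence of those bytes in unrelated data starts a new candidate file, which is exactly why short signatures produce so many false positives and why the max_size cap matters.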
Image formats are an exception, because their headers are comparatively distinctive. For example, every JPEG file starts with the byte sequence 0xFFD8, typically followed by 0xFFE00010, so file carvers are very good at identifying JPEG images. However, if some blocks have been overwritten, or if the file is fragmented, the tools will at best restore only part of the file (Figure 1).
Foremost and Scalpel
Jesse Kornblum and Kris Kendall of the United States Air Force Office of Special Investigations developed Foremost in March 2001 as a tool for analyzing and recovering deleted files. Foremost was inspired by an earlier program called CarvThis, which the Defense Computer Forensics Laboratory created in 1999 but never released to the general public. Foremost is now open source, and Nick Mikus, who gave the program a major boost as part of his master's degree work, maintains the source code.
Golden G. Richard III developed a separate program, Scalpel, based on Foremost 0.69. For a long time, Scalpel was regarded as the more advanced tool, and some sources even claim that the Foremost developers themselves recommend Scalpel. In fact, both projects are under active development. Although Scalpel was far superior to its predecessor in 2005, analyzing images around 10 times faster, Foremost has recently caught up thanks to Nick Mikus and is actually superior to its derivative for some tasks.
Both Foremost and Scalpel use configuration files to specify which files to search for (Listing 1). The first column designates the file type and also specifies the extension to add to any files the program finds. The second column is y if the header and footer must be matched case-sensitively and n otherwise. The third column defines the maximum file size in bytes, followed by the header byte sequence and, if one exists, the footer byte sequence. The string \x introduces a byte in hexadecimal notation; the other possibilities are \s for a space and ? as a wildcard for any character. Further options can follow at the end of the line.
gif  y  155000000  \x47\x49\x46\x38\x37\x61  \x00\x3b
gif  y  155000000  \x47\x49\x46\x38\x39\x61  \x00\x00\x3b
jpg  y  20000000   \xff\xd8\xff\xe0\x00\x10  \xff\xd9
jpg  y  20000000   \xff\xd8\xff\xe1          \xff\xd9
jpg  y  20000000   \xff\xd8                  \xff\xd9
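To make the format of Listing 1 concrete, here is a minimal Python sketch that parses one configuration line and turns the header and footer fields into byte-level regular expressions. It handles only the escapes described above (\x, \s, and ? as a one-byte wildcard), assumes a fifth column is always the footer, and ignores any trailing options; the names Rule, parse_rule, and find_headers are hypothetical, and this is not the parser Foremost or Scalpel actually uses.

```python
import re
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    extension: str           # column 1: type and extension for carved files
    case_sensitive: bool     # column 2: y or n
    max_size: int            # column 3: maximum file size in bytes
    header: bytes            # column 4, translated into a regex pattern
    footer: Optional[bytes]  # column 5 (optional), also a regex pattern


def to_pattern(field: str) -> bytes:
    """Translate \\xNN, \\s and ? in a header/footer field into a bytes regex."""
    out = b""
    i = 0
    while i < len(field):
        if field.startswith("\\x", i):
            out += re.escape(bytes([int(field[i + 2:i + 4], 16)]))
            i += 4
        elif field.startswith("\\s", i):
            out += re.escape(b" ")
            i += 2
        elif field[i] == "?":
            out += b"."  # wildcard: any single byte (needs re.DOTALL)
            i += 1
        else:
            out += re.escape(field[i].encode("latin-1"))
            i += 1
    return out


def parse_rule(line: str) -> Rule:
    """Parse one line; a fifth column is taken as the footer, the rest ignored."""
    cols = line.split()
    footer = to_pattern(cols[4]) if len(cols) > 4 else None
    return Rule(cols[0], cols[1] == "y", int(cols[2]),
                to_pattern(cols[3]), footer)


def find_headers(rule: Rule, data: bytes) -> List[int]:
    """Offsets of all non-overlapping header matches in data."""
    flags = re.DOTALL | (0 if rule.case_sensitive else re.IGNORECASE)
    return [m.start() for m in re.finditer(rule.header, data, flags)]
```

For example, parse_rule(r"gif y 155000000 \x47\x49\x46\x38\x39\x61 \x00\x00\x3b") yields a case-sensitive rule whose header pattern matches the ASCII string GIF89a.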
Because of its origins, Scalpel uses the same configuration file format as Foremost, although the two tools work differently internally. Both tools find more or less the same files, but they disagree on some identifications, so forensic experts are well advised to use both programs.
Versions 0.9.1 and later of Foremost use a new approach to identifying ZIP, JPEG, Office, and other formats. The formats are implemented directly in Foremost, meaning that the program does not need header and footer information in the configuration file for the identification process. Foremost enables this new detection function if you set the -t flag at the command line followed by the required file types:
foremost -T -t jpg,gif,pdf -i imagefile
Supported formats are listed in Table 1. To enable all of these built-ins, just set the -t all option. The previous command line also sets the -T option to tell Foremost to write any files it finds to a directory that uses a name with a timestamp. This makes it easier to organize the forensic investigation, in that each new run writes its results to a new directory.
Because of false positives, a carver typically extracts huge amounts of data, so make sure you have enough free space on the target filesystem. However, carving does not necessarily require large amounts of copying. Virtual filesystems such as CarvFS are designed to access the data directly in the original image. CarvFS, which is based on FUSE (Filesystem in Userspace), only needs the carving tool to provide a table describing which files are located at which physical positions. CarvFS originated in the Dutch police's Open Computer Forensics Architecture (OCFA) project (see the article on OCFA in this issue) and is intended for situations in which copying all the files to a separate location would produce huge volumes of data. In other cases, however, copying the data is more efficient than accessing it from the original image.
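CarvFS itself is a FUSE filesystem; the following Python sketch only illustrates the underlying idea. Given a hypothetical table that maps each carved file name to an offset and length in the image, a file's bytes are read from the original image on demand instead of being copied. The function read_carved and the table layout are assumptions, not CarvFS's actual interface.

```python
import io
from typing import BinaryIO, Dict, Tuple

# Hypothetical table: carved name -> (offset in image, length in bytes).
Table = Dict[str, Tuple[int, int]]


def read_carved(image: BinaryIO, table: Table, name: str,
                pos: int = 0, size: int = -1) -> bytes:
    """Read (part of) a carved file directly from the open image.

    Nothing is copied to a separate location; bytes are fetched only when
    requested, which is the idea behind a virtual filesystem like CarvFS.
    """
    offset, length = table[name]
    pos = min(max(pos, 0), length)
    if size < 0 or pos + size > length:
        size = length - pos  # never read past the end of the carved file
    image.seek(offset + pos)
    return image.read(size)


if __name__ == "__main__":
    disk = io.BytesIO(b"....GIF89a-data-;....")  # stand-in for a dd image
    table = {"carved.gif": (4, 13)}
    print(read_carved(disk, table, "carved.gif"))
```

In practice the image would be opened with open(path, "rb"); the table is the only thing the carver has to produce.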
A typical Foremost run without built-ins is shown in Listing 2. The image for this example comes courtesy of the Digital Forensic Research Workshop (DFRWS) challenge. DFRWS ran this competition in 2006 to test file carvers and promote their development. At the end of the competition, the organizers published a list of the files in the image.
Foremost version 1.5.3 by Jesse Kornblum, Kris Kendall, and Nick Mikus
Audit File
Foremost started at Sat Feb 9 18:36:29 2008
Invocation: ./foremost -v -T -i ../dfrws-2006-challenge.raw
Output directory: /linux-magazin/foremost/foremost-1.5.3/output_Sat_Feb__9_18_36_29_2008
Configuration file: /linux-magazin/foremost/foremost-1.5.3/foremost.conf
Processing: ../dfrws-2006-challenge.raw
|------------------------------------------------
File: ../dfrws-2006-challenge.raw
Start: Sat Feb 9 18:36:29 2008
Length: 47 MB (49999872 bytes)
Num  Name (bs=512)   Size    File Offset  Comment
0:   00003868.jpg    280 KB  1980416
1:   00008285.jpg    594 KB  4241920
2:   00011619.jpg    199 KB  5948928
3:   00012222.jpg    6 MB    6257664
[...]
20:  00045015.zip    274 KB  23047680
21:  00007982.png    6 KB    4086865      (1408 x 1800)
22:  00033012.png    69 KB   16902215     (1052 x 360)
23:  00035391.png    19 KB   18120696     (879 x 499)
24:  00035431.png    72 KB   18140936     (1140 x 540)
*|
Finish: Sat Feb 9 18:36:32 2008
25 FILES EXTRACTED
jpg:= 11
htm:= 5
ole:= 2
zip:= 3
png:= 4
------------------------------------------------------------------
Foremost finished at Sat Feb 9 18:36:32 2008
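In Listing 2, each file name appears to be the file offset divided by the 512-byte block size reported in the header (bs=512), rounded down and padded to eight digits: 1980416 / 512 = 3868, hence 00003868.jpg. This is our reading of the listing, not documented Foremost behavior; for the PNG entries the offset is not a multiple of 512, so the name gives the block in which the file starts. The hypothetical helpers below reproduce the pattern and map a name back to its block in the image.

```python
import re
from typing import Tuple

BLOCK_SIZE = 512  # the audit file reports bs=512

# Matches audit entries such as "0: 00003868.jpg 280 KB 1980416"
ENTRY = re.compile(r"^\s*\d+:\s+(\S+)\s+\d+ [KMG]?B\s+(\d+)")


def carved_name(offset: int, ext: str, block_size: int = BLOCK_SIZE) -> str:
    """Name a file after the block in which it starts, zero-padded to 8 digits."""
    return f"{offset // block_size:08d}.{ext}"


def start_of(name: str, block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Return (block number, byte offset of that block) for a carved name."""
    block = int(name.split(".", 1)[0])
    return block, block * block_size


def entry_matches(line: str) -> bool:
    """Check whether an audit entry's name agrees with its file offset."""
    m = ENTRY.match(line)
    if m is None:
        return False
    name, offset = m.group(1), int(m.group(2))
    return carved_name(offset, name.rsplit(".", 1)[1]) == name


if __name__ == "__main__":
    print(entry_matches("21: 00007982.png 6 KB 4086865 (1408 x 1800)"))  # True
```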