Needle in a Haystack

Conclusions

The script presented here does not spare you the unavoidable task of manually separating the wheat from the chaff. If you run the script on a large mailbox, the result will be many files with either cryptic or very similar names (Figure 3). In the latter case, finding out which file is the true final (or initial) version requires manual examination.

Figure 3: Running the script with the most restrictive options: 99.99 percent are real attachments, without any duplicates.
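
The manual review of similarly named files can at least be narrowed down. The following is a minimal Python sketch, not part of the script presented in this article: It groups the files in one directory by a rough normalized name and reports the size and a short SHA-256 digest for each. The helper names `normalized_name` and `group_similar` are hypothetical, and the rule that strips trailing counters such as `_2` or `(1)` is an assumed heuristic that will not match every naming scheme.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path


def normalized_name(path):
    """Return a rough grouping key: lowercase name without a trailing counter."""
    stem = path.stem.lower()
    # Heuristic (an assumption): drop endings such as "_2", "-3", " (1)" or "2024".
    stem = re.sub(r"[\s._-]*\(?\d+\)?$", "", stem)
    return stem + path.suffix.lower()


def group_similar(directory):
    """Map each normalized name to a sorted list of (name, size, digest) tuples.

    Only names shared by two or more files are returned.
    """
    groups = defaultdict(list)
    for path in Path(directory).iterdir():
        if path.is_file():
            data = path.read_bytes()
            digest = hashlib.sha256(data).hexdigest()[:12]
            groups[normalized_name(path)].append((path.name, len(data), digest))
    return {key: sorted(items) for key, items in groups.items() if len(items) > 1}


if __name__ == "__main__":
    import sys

    for key, items in group_similar(sys.argv[1]).items():
        print(key)
        for name, size, digest in items:
            print(f"  {digest}  {size:>10}  {name}")
```

Files in a group that share a digest have identical content, so only the remaining ones need to be opened and compared by hand. The sketch deliberately ignores modification times, because an extracted file usually carries the time of extraction rather than the time the attachment was written.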

Also, be prepared to fix the permissions and ownership of files manually. By default, the permissions of email folders and files on Linux are set to 600, which means "readable and writable only by the owner." Depending on how you configure the script, many of the files it extracts will have the same permissions, which may or may not be what you want.
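
If the extracted files should be readable by other users, the permissions can be relaxed in one pass. The following Python sketch is one possible approach rather than a feature of the extraction script; the function name `relax_permissions` and the target modes 644 (files) and 755 (directories) are assumptions that you should adapt to your own needs.

```python
import stat
from pathlib import Path


def relax_permissions(root, file_mode=0o644, dir_mode=0o755):
    """Set the given modes on everything below root; return the changed paths."""
    changed = []
    # Collect the paths first, so that the tree is not modified while it is walked.
    for path in list(Path(root).rglob("*")):
        if path.is_symlink():
            continue  # Leave symbolic links and their targets alone.
        target = dir_mode if path.is_dir() else file_mode
        current = stat.S_IMODE(path.stat().st_mode)
        if current != target:
            path.chmod(target)
            changed.append(path)
    return changed


if __name__ == "__main__":
    import sys

    for path in relax_permissions(sys.argv[1]):
        print(f"changed: {path}")
```

The sketch leaves ownership untouched. Changing the owner of a file normally requires root privileges, for example with `chown` on the command line or `shutil.chown` in Python.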

Final thought: Some weird combination of character encodings and recursively embedded messages surely exists out there that would make this extraction script fail and require tweaking or other manual work. Unfortunately, there is nothing to be done about this scenario. However, considering that some files from just 15 or 20 years ago are already unreadable, you should be happy that you can still process all email messages ever created without particular problems. This all goes to prove that the best "innovation" is based on simple and really open standards.

The Author

Marco Fioretti (http://mfioretti.com) is a freelance author, trainer, and researcher based in Rome, Italy. He has been working with free/open source software since 1995 and on open digital standards since 2005. Marco also is a Board Member of the Free Knowledge Institute (http://freeknowledge.eu).
