Developing a mailbot script

Address Catcher

© Lead Image © Konstantin Inozemtcev, 123RF.com

© Lead Image © Konstantin Inozemtcev, 123RF.com

Article from Issue 292/2025
Author(s):

A Python script that captures email addresses will help you understand how bots analyze and extract data from the web.

Bots crawl around constantly on the Internet, capturing information from public websites for later processing. Although the science of bot design has become quite advanced, the basic steps for capturing data from an HTML page are quite simple. This article describes an example script that extracts email addresses. The script even provides the option to extend the search to the URLs found on the target page. Rolling your own bot will help you build a deeper understanding of privacy defense and cybersecurity.

Setting Up the Environment

I recommend setting up an integrated development environment, like Visual Studio (VS) Code for Python programming, and having a basic understanding of the language. You can download VS Code from the VS Code website [1]. On Ubuntu, an easy way to install the application is by downloading the .deb package, right-clicking the file, and selecting the Install option. Alternatively, you can search for "vscode" in the App Center and click the Install button. If you prefer using the terminal, the VS Code website [2] provides detailed instructions for any Linux distribution. I also suggest adding Python development extensions, including Pylance and the Python Debugger.

The Script

The full text of the mailbot.py script is available on the Linux Magazine website [3]. Listing 1 shows the beginning of the script where I import the modules I will need to manage communications via the HTTP protocol, search for string patterns using regular expressions, implement asynchronous functions, manage script input arguments, and show a progress bar to track process advancement. The alive_progress module is not part of the standard library, so I have to install it with the following command:

[...]

Use Express-Checkout link below to read the full article (PDF).

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Perl: Yahoo API Scripting

    Following in the footsteps of Google, Amazon, and eBay, Yahoo recently introduced a web service API to its search engine. In this month’s column, we look at three Perl scripts that can help you correct typos, view other people’s vacation pictures, and track those long lost pals from school.

  • Bash Web Maintenance

    Use tools such as grep and sed to find and fix broken links.

  • Bash Tuning

    In the old days, shells were capable of little more than calling external programs and executing basic, internal commands. With all the bells and whistles in the latest versions of Bash, however, you hardly need the support of external tools.

  • Tutorials – Attachment Extraction

    If your inbox is full of email messages with important attachments, retrieving those attachments manually can be a tedious task. The script presented in this article does this task automatically and can even save the email as a plain text file.

  • Pick Up

    In this month's column, Mike Schilli writes a special mail client in Go and delves into the depths of the IMAP protocol in order to archive photos from incoming emails.

comments powered by Disqus
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

Learn More

News