Developing a mailbot script

Address Catcher

© Lead Image © Konstantin Inozemtcev, 123RF.com

Article from Issue 292/2025

A Python script that captures email addresses will help you understand how bots analyze and extract data from the web.

Bots constantly crawl the Internet, capturing information from public websites for later processing. Although bot design has become a sophisticated discipline, the basic steps for capturing data from an HTML page are simple. This article describes an example script that extracts email addresses and can optionally extend the search to the URLs found on the target page. Rolling your own bot will help you build a deeper understanding of privacy defense and cybersecurity.
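As a minimal sketch of those basic steps (not the mailbot.py script itself), the following standard library code pulls email addresses out of a string of HTML with a regular expression and collects the links a bot could visit next. The names EMAIL_RE, LinkCollector, extract_emails, and extract_links are hypothetical and chosen for illustration only, and the pattern is deliberately simple rather than a complete address grammar.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: a deliberately simple pattern that covers common addresses,
# not every form the mail standards allow.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # mailto: links hold addresses, not pages to crawl, so skip them here.
            if name == "href" and value and not value.startswith("mailto:"):
                self.links.append(urljoin(self.base_url, value))


def extract_emails(html):
    """Return the unique addresses found anywhere in the page source, sorted."""
    return sorted(set(EMAIL_RE.findall(html)))


def extract_links(html, base_url):
    """Return the absolute URLs of all followable links, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    sample = (
        '<p>Write to info@example.com</p>'
        '<a href="/contact">Contact</a>'
        '<a href="mailto:info@example.com">Mail</a>'
    )
    print(extract_emails(sample))
    print(extract_links(sample, "https://example.com/"))
```

Because the regular expression runs over the raw page source, it also catches addresses inside mailto: links, while the link collector ignores those links and keeps only pages that could be searched in a second pass.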

Setting Up the Environment

I recommend setting up an integrated development environment for Python programming, such as Visual Studio (VS) Code, and having a basic understanding of the language. You can download VS Code from the VS Code website [1]. On Ubuntu, an easy way to install the application is to download the .deb package, right-click the file, and select the Install option. Alternatively, you can search for "vscode" in the App Center and click the Install button. If you prefer the terminal, the VS Code website [2] provides detailed instructions for any Linux distribution. I also suggest adding the Python development extensions, including Pylance and the Python Debugger.

The Script

The full text of the mailbot.py script is available on the Linux Magazine website [3]. Listing 1 shows the beginning of the script, where I import the modules I will need to communicate over HTTP, search for string patterns using regular expressions, implement asynchronous functions, manage script input arguments, and show a progress bar to track progress. The alive_progress module is not part of the standard library, so I have to install it with the following command:

[...]
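Listing 1 is not reproduced here, so the following is only a rough sketch of how a script with those ingredients might be laid out, under several assumptions. It uses urllib.request for HTTP and a plain printed counter in place of the alive_progress bar so that it runs with the standard library alone (Python 3.9 or later); the module choices and the names parse_args, fetch, scan, and main, as well as the --timeout option, are hypothetical and not taken from mailbot.py.

```python
import argparse
import asyncio
import re
import urllib.request

# Assumption: the same simplified address pattern a basic mailbot might use.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def parse_args(argv=None):
    """Hypothetical command line: one or more URLs plus a per-request timeout."""
    parser = argparse.ArgumentParser(description="Collect email addresses from web pages")
    parser.add_argument("urls", nargs="+", help="pages to scan")
    parser.add_argument("--timeout", type=float, default=10.0,
                        help="seconds to wait for each request")
    return parser.parse_args(argv)


def fetch(url, timeout):
    """Blocking download of one page, decoded leniently as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


async def scan(urls, timeout, fetcher=fetch):
    """Download all pages concurrently and return {url: sorted addresses}."""

    async def scan_one(url):
        # urllib blocks, so each download runs in a worker thread.
        html = await asyncio.to_thread(fetcher, url, timeout)
        return url, sorted(set(EMAIL_RE.findall(html)))

    tasks = [asyncio.create_task(scan_one(url)) for url in urls]
    results = {}
    # as_completed yields in finishing order, which drives the simple counter.
    for done, finished in enumerate(asyncio.as_completed(tasks), start=1):
        url, emails = await finished
        results[url] = emails
        print(f"[{done}/{len(urls)}] {url}: {len(emails)} address(es)")
    return results


def main(argv=None):
    """Entry point: parse the arguments, then run the asynchronous scan."""
    args = parse_args(argv)
    return asyncio.run(scan(args.urls, args.timeout))
```

Calling main(["https://example.com/"]) would scan a single page. The fetcher parameter exists so that the download step can be swapped out, for example with a stub function when testing without network access.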
