Ssscrape 1.0 Collects Dynamic Web Data
The Ssscrape tool screen-scrapes data from RSS and Atom feeds, blogs and podcasts. The open source software is now available in version 1.0.
Ssscrape tracks feeds and other collections for similar elements on updates, and downloads and cleans content by converting HTML to plain text. The database used is MySQL. The tool can also gather statistics about feed activities and report errors. A scheduler takes care of the periodic checks and a monitor displays the running activities.
Known as a Web crawler, a program that scrapes together information off the Web, Ssscrape is short for Syndicated and Semi-Structured Content Retrieval and Processing Environment. The Web scraper is written in Python with Twisted used for network programming and the not always standards-based Beautiful Soup used for parsing HTML/XML content.
Ssscrape was developed in the Information and Language Processing Systems (ILPS) department of the University of Amsterdam and is under LGPLv3 licensing. Ssscrape 1.0 requires Python 2.4 and is available for download as a tarball from the project page.
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

News
-
Linux Hits an Important Milestone
If you pay attention to the news in the Linux-sphere, you've probably heard that the open source operating system recently crashed through a ceiling no one thought possible.
-
Plasma Bigscreen Returns
A developer discovered that the Plasma Bigscreen feature had been sitting untouched, so he decided to do something about it.
-
CachyOS Now Lets Users Choose Their Shell
Imagine getting the opportunity to select which shell you want during the installation of your favorite Linux distribution. That's now a thing.
-
Wayland 1.24 Released with Fixes and New Features
Wayland continues to move forward, while X11 slowly vanishes into the shadows, and the latest release includes plenty of improvements.
-
Bugs Found in sudo
Two critical flaws allow users to gain access to root privileges.
-
Fedora Continues 32-Bit Support
In a move that should come as a relief to some portions of the Linux community, Fedora will continue supporting 32-bit architecture.
-
Linux Kernel 6.17 Drops bcachefs
After a clash over some late fixes and disagreements between bcachefs's lead developer and Linus Torvalds, bachefs is out.
-
ONLYOFFICE v9 Embraces AI
Like nearly all office suites on the market (except LibreOffice), ONLYOFFICE has decided to go the AI route.
-
Two Local Privilege Escalation Flaws Discovered in Linux
Qualys researchers have discovered two local privilege escalation vulnerabilities that allow hackers to gain root privileges on major Linux distributions.
-
New TUXEDO InfinityBook Pro Powered by AMD Ryzen AI 300
The TUXEDO InfinityBook Pro 14 Gen10 offers serious power that is ready for your business, development, or entertainment needs.