Ssscrape 1.0 Collects Dynamic Web Data
The Ssscrape tool screen-scrapes data from RSS and Atom feeds, blogs, and podcasts. Version 1.0 of the open source software is now available.
Ssscrape monitors feeds and other collections of similar items for updates, downloads the new content, and cleans it by converting HTML to plain text. The data is stored in a MySQL database. The tool can also gather statistics about feed activity and report errors. A scheduler handles the periodic checks, and a monitor displays the running activities.
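The update-tracking and cleaning steps described above can be sketched with Python's standard library. This is not Ssscrape's code: the function names `html_to_text` and `new_items`, the use of `html.parser` in place of Beautiful Soup, and keying items by their RSS 2.0 `<guid>` (falling back to `<link>`) are illustrative assumptions. The sketch targets Python 3 rather than the Python 2.4 that Ssscrape 1.0 requires, and it leaves out MySQL storage and scheduling.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> elements."""

    SKIP = {"script", "style"}
    # Block-level tags get a separating space so adjacent paragraphs don't merge.
    BLOCK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Hypothetical helper: strip markup and collapse whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join("".join(parser.parts).split())


def new_items(rss_xml, seen_ids):
    """Hypothetical helper: return (id, title, plain_text) for unseen RSS items.

    Adds each new id to seen_ids, so a later poll reports only newer items.
    """
    root = ET.fromstring(rss_xml)
    results = []
    for item in root.iter("item"):
        item_id = item.findtext("guid") or item.findtext("link")
        if not item_id or item_id in seen_ids:
            continue
        seen_ids.add(item_id)
        title = item.findtext("title", default="")
        body = html_to_text(item.findtext("description", default=""))
        results.append((item_id, title, body))
    return results
```

Polling the same feed twice shows the update tracking at work: the first call returns the item with its HTML description reduced to plain text (`[('a1', 'First', 'Hello world')]`), and the second returns an empty list because the item's ID is already in `seen`.

```python
feed = """<rss version="2.0"><channel><title>Demo</title>
<item><guid>a1</guid><title>First</title>
<description>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</description></item>
</channel></rss>"""
seen = set()
print(new_items(feed, seen))
print(new_items(feed, seen))
```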
Ssscrape, short for Syndicated and Semi-Structured Content Retrieval and Processing Environment, is a Web crawler: a program that gathers information from the Web. The scraper is written in Python, using Twisted for network programming and Beautiful Soup, which can also handle HTML/XML that is not standards-compliant, for parsing content.
Ssscrape was developed in the Information and Language Processing Systems (ILPS) department of the University of Amsterdam and is licensed under the LGPLv3. Ssscrape 1.0 requires Python 2.4 and can be downloaded as a tarball from the project page.
