Ssscrape 1.0 Collects Dynamic Web Data
The Ssscrape tool screen-scrapes data from RSS and Atom feeds, blogs and podcasts. The open source software is now available in version 1.0.
Ssscrape tracks feeds and other collections of similar items for updates, downloads the content, and cleans it by converting HTML to plain text. It stores its data in a MySQL database. The tool can also gather statistics about feed activity and report errors. A scheduler takes care of the periodic checks, and a monitor displays the running activities.
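To give a rough idea of what the HTML-to-plain-text cleaning step involves, here is a minimal sketch that uses only Python's standard library. It is not taken from Ssscrape's code: the names `TextExtractor` and `html_to_text` are hypothetical, and the approach (dropping tags, skipping script and style content, and collapsing whitespace) is an assumption about what such a cleanup typically does.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping script/style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces and collapse runs of whitespace.
        # Simplification: a tag in the middle of a word adds a space there.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the plain text of an HTML fragment (illustrative only)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<p>Hello <b>world</b> &amp; feeds</p><script>var x = 1;</script>"
    print(html_to_text(sample))
```

A real scraper would also have to cope with character encodings and badly broken markup, which is where a more tolerant parser becomes useful.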
Ssscrape, short for Syndicated and Semi-Structured Content Retrieval and Processing Environment, is a Web crawler: a program that scrapes together information from the Web. It is written in Python, with Twisted used for network programming and Beautiful Soup used for parsing HTML/XML content that is not always standards-compliant.
Ssscrape was developed in the Information and Language Processing Systems (ILPS) department of the University of Amsterdam and is licensed under the LGPLv3. Ssscrape 1.0 requires Python 2.4 and is available for download as a tarball from the project page.