Ssscrape 1.0 Collects Dynamic Web Data

Feb 18, 2010

Mathias Huber

The Ssscrape tool screen-scrapes data from RSS and Atom feeds, blogs and podcasts. The open source software is now available in version 1.0.

Ssscrape tracks feeds and similar collections for updates, downloads new content, and cleans it by converting HTML to plain text. It stores its data in a MySQL database. The tool can also gather statistics about feed activity and report errors. A scheduler handles the periodic checks, and a monitor displays the running activities.
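
The track-and-clean step described above can be illustrated with a minimal sketch. This is not Ssscrape's own code: it uses only the standard library of a current Python 3, and the names `html_to_text`, `new_items`, and `SAMPLE_FEED` are hypothetical, chosen for illustration. It assumes an RSS 2.0 feed whose items carry a `guid` (or at least a `link`), and it keeps the set of seen identifiers in memory, whereas Ssscrape stores its data in MySQL.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(fragment):
    """Drop all tags and collapse runs of whitespace into single spaces."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def new_items(rss_xml, seen_guids):
    """Return (guid, title, text) for every item not yet in seen_guids.

    seen_guids is updated in place, so a second call with the same
    feed returns an empty list.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Fall back to the link when the feed provides no guid.
        guid = item.findtext("guid") or item.findtext("link")
        if guid is None or guid in seen_guids:
            continue
        seen_guids.add(guid)
        title = item.findtext("title", default="")
        text = html_to_text(item.findtext("description", default=""))
        fresh.append((guid, title, text))
    return fresh


# Hypothetical sample feed with HTML-escaped descriptions.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><guid>a1</guid><title>First</title>
<description>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</description></item>
<item><guid>a2</guid><title>Second</title>
<description>&lt;p&gt;More   text&lt;/p&gt;</description></item>
</channel></rss>"""

if __name__ == "__main__":
    seen = {"a1"}  # pretend item a1 was collected on an earlier run
    for guid, title, text in new_items(SAMPLE_FEED, seen):
        print(guid, title, text)
```

A scheduler would call something like `new_items` at each periodic check and hand the results on for storage. A real crawler would also need to handle Atom feeds, network errors, and malformed markup, which this sketch leaves out.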

Ssscrape, short for Syndicated and Semi-Structured Content Retrieval and Processing Environment, is a Web crawler: a program that gathers information from the Web. It is written in Python, using Twisted for network programming and Beautiful Soup for parsing HTML/XML content that does not always conform to standards.

Ssscrape was developed in the Information and Language Processing Systems (ILPS) department of the University of Amsterdam and is licensed under the LGPLv3. Ssscrape 1.0 requires Python 2.4 and is available for download as a tarball from the project page.
