Spotlight | Reviews | Current Issue | Academy | Newsletter | Subscribe | Shop |
Departments

Partner Links
Make your own website
WinWeb OnlineOffice
Comparing prices of hardware is worth it.
Price Comparison
UK Linux Jobs
What:
Where:
Country:
vacatures Netherlands njobs Linux vacatures
arbeit Deutschland njobs Linux arbeit
work United Kingdom njobs Linux jobs
Lavoro Italia njobs Linux lavoro
Emploi France njobs Linux emploi
trabajo Espana njobs Linux trabajo

user friendly

Admin Magazine

ADMIN Network & Security

Subscribe now and save!

ADMIN - Explore the new world of system administration! Special introductory offer! Order by September 30th to save 10% off the regular subscription price! Each issue delivers technical solutions to the real-world problems you face every day. Learn the latest techniques for better:

  • network security
  • system management
  • troubleshooting
  • performance tuning
  • virtualization
  • cloud computing

 

on Windows, Linux, Solaris, and popular varieties of Unix.

http://www.admin-magazine.com/

  linux-magazine.com » Online » News » Ssscrape 1.0 Collects Dynamic Web Data  

Print this page. Recommend
Share

Ssscrape 1.0 Collects Dynamic Web Data

The Ssscrape tool screen-scrapes data from RSS and Atom feeds, blogs and podcasts. The open source software is now available in version 1.0.

Ssscrape tracks feeds and other collections for similar elements on updates, and downloads and cleans content by converting HTML to plain text. The database used is MySQL. The tool can also gather statistics about feed activities and report errors. A scheduler takes care of the periodic checks and a monitor displays the running activities.

Known as a Web crawler, a program that scrapes together information off the Web, Ssscrape is short for Syndicated and Semi-Structured Content Retrieval and Processing Environment. The Web scraper is written in Python with Twisted used for network programming and the not always standards-based Beautiful Soup used for parsing HTML/XML content.

Ssscrape was developed in the Information and Language Processing Systems (ILPS) department of the University of Amsterdam and is under LGPLv3 licensing. Ssscrape 1.0 requires Python 2.4 and is available for download as a tarball from the project page.

(Mathias Huber)

Comments


Print this page. Recommend
Share
FREE Live Streaming Video from ApacheCon US 2009

Watch our free Video Archive from Apachecon US 2009. Archive provided by The Apache Foundation, COLLABNET, and Linux Pro Magazine

Drawing internationally renowned thought-leaders, contributors, and organizations in the Open Source community, ApacheCon offers insight into the culture and community that develops and shepherds industry-leading Open Source projects, including Apache HTTP Server – the world's most popular Web server software for more than 10 years.

Find out more