ADMIN - Explore the new world of system administration! Special introductory offer! Order by September 30th to save 10% off the regular subscription price! Each issue delivers technical solutions to the real-world problems you face every day. Learn the latest techniques for better:
network security
system management
troubleshooting
performance tuning
virtualization
cloud computing
on Windows, Linux, Solaris, and popular varieties of Unix.
The Ssscrape tool screen-scrapes data from RSS and Atom feeds, blogs and podcasts. The open source software is now available in version 1.0.
Ssscrape tracks feeds and other collections for similar elements on updates, and downloads and cleans content by converting HTML to plain text. The database used is MySQL. The tool can also gather statistics about feed activities and report errors. A scheduler takes care of the periodic checks and a monitor displays the running activities.
Known as a Web crawler, a program that scrapes together information off the Web, Ssscrape is short for Syndicated and Semi-Structured Content Retrieval and Processing Environment. The Web scraper is written in Python with Twisted used for network programming and the not always standards-based Beautiful Soup used for parsing HTML/XML content.
Ssscrape was developed in the Information and Language Processing Systems (ILPS) department of the University of Amsterdam and is under LGPLv3 licensing. Ssscrape 1.0 requires Python 2.4 and is available for download as a tarball from the project page.
Watch our free Video Archive from Apachecon US 2009. Archive provided by The Apache Foundation, COLLABNET, and Linux Pro Magazine
Drawing internationally renowned thought-leaders, contributors, and organizations in the Open Source community, ApacheCon offers insight into the culture and community that develops and shepherds industry-leading Open Source projects, including Apache HTTP Server – the world's most popular Web server software for more than 10 years.
Comments