Aggregating data with Portia

Itsy, Bitsy Spider

Article from Issue 169/2014

Author(s): Tim Schürmann

Are you interested in retrieving stock quotes in machine-readable form off the Internet? No problem: After a few mouse clicks, Portia weaves a command line and wraps the data in JSON format.

The Internet is a treasure trove of useful information, but much of it is locked inside colorful HTML pages from which it cannot easily be extracted and processed. If you want to automate the processing of current stock quotes or aggregate news, for example, you need to dismantle the HTML code of news portals such as CNN or Slashdot. This can be pretty ugly work.

Portia, a tool written in Python [1], promises a remedy; it shares its name with a genus of spiders, which seems fitting for the World Wide Web. The tool consists of a web application in which the user selects stock quotes, news items, and any other desired content with a simple click. Portia then extracts this data and outputs it in JSON format.

Backed by its bundled web crawler, Portia can also ransack complete websites. For example, if you need the headings of all Wikipedia articles, you show Portia exactly once where the heading resides on a Wikipedia page. The crawler then traverses the entire website and returns all matching headings in JSON format (see the "Warning" box for more information).
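To give a feel for the kind of work Portia takes off your hands, the following minimal sketch pulls headings out of HTML pages and emits them as JSON using only the Python standard library. It is not Portia's code or API; the class HeadingExtractor, the function headings_to_json, the assumption that the heading lives in an h1 element, and the example.org sample pages are all hypothetical stand-ins chosen for illustration. A real crawler would also have to fetch pages and follow links, which this sketch leaves out.

```python
import json
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collect the text of every <h1> element in a page (assumed heading tag)."""

    def __init__(self):
        super().__init__()
        self._in_heading = False
        self._parts = []
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_heading = True
            self._parts = []

    def handle_data(self, data):
        # Text inside nested tags such as <i> still arrives here.
        if self._in_heading:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_heading:
            self.headings.append("".join(self._parts).strip())
            self._in_heading = False


def headings_to_json(pages):
    """Turn a {url: html} mapping into a JSON array of url/heading records."""
    records = []
    for url, html in pages.items():
        parser = HeadingExtractor()
        parser.feed(html)
        parser.close()
        for heading in parser.headings:
            records.append({"url": url, "heading": heading})
    return json.dumps(records, indent=2)


# Hypothetical sample pages standing in for content a crawler would fetch.
SAMPLE_PAGES = {
    "https://example.org/wiki/Spider":
        "<html><body><h1 id='firstHeading'>Spider</h1><p>Text</p></body></html>",
    "https://example.org/wiki/Portia":
        "<html><body><h1>Portia <i>(genus)</i></h1></body></html>",
}

if __name__ == "__main__":
    print(headings_to_json(SAMPLE_PAGES))
```

Even for this simple case, the extraction rule is hard-coded; a page that keeps its heading in a different element would need a new parser, which is the chore a point-and-click tool aims to remove.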

[...]


