An XML, HTML, and JSON data extraction tool

Other Options and Variables

In addition to the options that pass or configure the actual commands, --silent suppresses notifications, --verbose shows all notifications, and --debug-arguments shows how command-line arguments were parsed and your queries were evaluated. For help, --help lists all the available command-line options, and --usage prints the same documentation provided in the README file.

Last but not least, I want to mention global variables. Depending on your request, Xidel makes the full input text available raw inside $raw or already parsed as JSON inside $json. If you tell Xidel to download a web page, its full address will be inside $url, and its headers, host, and path will be inside $headers, $host, and $path respectively.

Practical Examples

Now I'll put Xidel to work and show you some practical examples. The simplest possible thing you can ask Xidel is for the title of a web page (auxiliary notifications have been omitted for readability in all examples):

#> xidel https://www.linux-magazine.com --extract //title
Linux Magazine

which yields the expected result. Notice that to indicate an element, in this case the title, you must prefix it with a double slash //. You can also give Xidel multiple commands in the same call:

#> xidel https://www.linux-magazine.com -e "//title" https://edition.cnn.com --extract //title
Linux Magazine
Breaking News, Latest News and Videos | CNN

In this example, I deliberately used both versions of the extract command.

Listing 2 shows the partial output of a simple follow command.

Listing 2

follow Output

#> xidel https://www.linux-magazine.com --follow //a --extract //title > all-titles
...
Sparkhaus Shop - Linux Magazines & Online-Services
Administration » Linux Magazine
Desktop » Linux Magazine
Web%20Development » Linux Magazine
...
...

If you compare the four output lines in Listing 2 with the links highlighted in Figure 1, you can see that the command instructed Xidel to download all the pages linked from www.linux-magazine.com and then extract and print those pages' titles.

Figure 1: The titles output in Listing 2 are from the linked pages highlighted here on the Linux Magazine website.

Because XQuery is Turing complete, Xidel can use XQuery to create documents from scratch, without looking anywhere for data, as shown in the following command (also shown on the left side of Figure 2):

#> xidel --xquery '<table>{for $i in 0 to 1000 return <tr><td>{$i}</td><td>{if ($i mod 2 = 0) then "Linux Magazine" else "is great"}</td></tr>}</table>' --output-format xml > test.html

Figure 2: With Xidel, generating HTML tables or other documents is quick and simple.

This command outputs the test.html web page, which is the HTML table displayed by Firefox on the right side of Figure 2.

Figure 3 and Listing 3 show the --output-format option, which allows Xidel to pass its findings to other Linux system components. Line 1 of Listing 3, whose output, as well as links to its sources, is shown in Figure 3, sets the Bash variable $title to the page's title and makes $links into an array of all the links on the same page. The --output-format bash option tells Xidel to format its output as Bash variable assignments, which the eval in line 1 loads into the shell, with the specifics declared in the two -e options.

Listing 3

Setting Bash Environment Variables

01 #> eval "$(/home/marco/testing/xidel https://www.linux-magazine.com -e 'title:=//title' -e 'links:=//a/@href' --output-format bash)"
02
03 #>  echo $title
04 Linux Magazine
05
06 #>  printf '%s\n' "${links[@]}" | grep '^/Online/'
07
08 /Online/News
09 /Online/Features
10 /Online/Blogs
11 /Online/White-Papers
12 /Online/News/Zorin-OS-16.3-is-Now-Available
13 /Online/News/SUSECON-2023
14 /Online/News/Mageia-9-RC1-Now-Available-for-Download
15 /Online/News/Linux-Mint-21.2-Now-Available-for-Installation
16 ...

Figure 3: After extracting data from a web page, Xidel can send the data everywhere, including a Linux terminal environment.

The first -e option (note its := syntax) copies the title of the given web page (//title) into a shell variable, which is called title for clarity, but as far as the shell is concerned, it could have any other name. The second -e option does the same thing, using //a/@href to signify that all the href values (i.e., the actual URLs) of all the HTML anchor tags (a) that define HTML hyperlinks should be copied inside links. Because there are many such tags, links will automatically become an array instead of a single cell. This is why, in the printf statement in line 6, links is referenced with curly and square brackets.
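
Because Listing 3 depends on a live download, the scalar-versus-array behavior can be previewed with hypothetical stand-in values (the title and links contents below are invented, not fetched by Xidel):

```shell
# Stand-in values; in Listing 3, eval-ing Xidel's bash-formatted
# output is what fills these variables in for real.
title='Linux Magazine'
links=(/Online/News /Online/Features /Archive/2023)

echo "$title"                                    # a single scalar value
printf '%s\n' "${links[@]}" | grep '^/Online/'   # one line per array element
```

Only the /Online/ entries survive the grep, which is exactly how line 6 of Listing 3 filters the real link list.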

Cool, huh? But enough of HTML and XML, I'll turn to some JSON examples. In a previous article in Linux Magazine [6], I wrote about the Shaarli bookmark manager (Figure 4), which saves its bookmarks as one JSON array with the structure shown in Listing 4 (heavily edited for clarity).

Listing 4

Shaarli Bookmarks in JSON Format

#> jq '.' shaarli.json | more
[
  ... other records...
  {
    "id": 180,
    "url": "URL of this bookmark",
    "title": "Why AI Will Save The World",
    other name/value pairs...
    ...
  },
  {
    "id": 178,
    "url": "URL of this other bookmark",
    "title": "Let Them Eat Solar Panels",
    other names/values of this other bookmark...
    ...
  }
  ... many other records...
]

Figure 4: The Shaarli bookmark manager shows some of the bookmarks retrieved by Xidel.

Xidel will read the shaarli.json file that stores the JSON array and fetch each name/value pair of every record, as long as I know its position in the array (see Listing 5).

Listing 5

Extracting a Specific Value

#> xidel shaarli.json -e '$json(4).title' -e '$json(8).title'
Why AI Will Save The World
Open source licenses need to evolve to deal with AI

Because XQuery is Turing complete and Xidel automatically loads every JSON file you give it into an array called $json, I can use the one-liner loop in Listing 6 to scan the entire array and then filter, with the egrep shell command, all my bookmark titles about artificial intelligence (AI).

Listing 6

Extracting the Bookmark Titles

#> xidel shaarli.json  -e 'for $t in $json/title return string-join(("TITLE", $t), " ==>  ")' | egrep 'AI|ntelligence'
TITLE ==>  Why AI Will Save The World
TITLE ==>  AI Is Coming For Your Children
TITLE ==>  AI has poisoned its own well
TITLE ==>  This AI Boom Will Bust
...

The for command grabs all the title values inside $json and loads each of them, one at a time, into the auxiliary variable $t. The XQuery string-join function then attaches each title to the constant string TITLE, using the string passed as its last argument as the connector.
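
For readers more at home in the shell than in XQuery, the same join can be sketched with a plain loop (the two titles below are samples standing in for the values $t receives from $json/title):

```shell
# Join the constant "TITLE" and each title with the " ==>  " connector,
# printing one line per title, just like the string-join call does.
for t in 'Why AI Will Save The World' 'This AI Boom Will Bust'; do
  printf '%s ==>  %s\n' TITLE "$t"
done
```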

If you want to extract more than one value per bookmark (e.g., both its ID number and title), you can use the concat function, which unsurprisingly concatenates all the arguments it gets:

#> xidel -s shaarli.json -e '$json()/concat("ARTICLE|",id,"|",title)'

Alternatively, you can use the extended string syntax (note the backticks!) as follows:

#> xidel -s shaarli.json -e '$json()/`ARTICLE|{id}|{title}`'

Both commands will produce the same output, part of which is visible in Listing 7. What is really important in both cases is the $json() notation, where the empty parentheses indicate the entire array, not just one of its elements. That's what makes Xidel extract and format, as shown in the second part of the command, the ID and title values of every bookmark.

Listing 7

A Pipe-Separated, Plain Text Excerpt

ARTICLE|102|EU passes landmark AI Act to rein in high-risk tech
ARTICLE|134|AI has poisoned its own well
ARTICLE|135|Study says AI data contaminates vital human input
ARTICLE|176|Open source licenses need to evolve to deal with AI
ARTICLE|178|AI Is Coming For Your Children
ARTICLE|180|Why AI Will Save The World

Finally, if I want the id column in Listing 7 to always be four characters wide (this works because I have fewer than 10,000 bookmarks), I can tell Xidel to always pad the id value with enough white spaces – even for ID values smaller than 1,000 – by replacing it with the expression

substring("    ",1,4 - string-length(id))||id

which takes just enough leading characters from a string made of four spaces to pad the current value of id to four characters, and prepends them to id.
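
The same fixed-width alignment can be previewed in plain shell with printf's field width; this is only an equivalent of the substring trick, not what Xidel runs, and the id value 9 is invented just to show the contrast:

```shell
# %4s right-aligns its argument in a four-character field, padding
# with spaces, which matches the substring("    ",...)||id expression.
printf 'ARTICLE|%4s|%s\n' 102 'AI has poisoned its own well'
printf 'ARTICLE|%4s|%s\n' 9   'a made-up entry with a short id'
```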

As far as I am concerned, the capability illustrated in these last JSON examples shows Xidel's real power, and my main reason for using it. Formats like the one shown in Listing 7 may be too limited for sophisticated uses, but it would be very hard to find a better compromise between immediate readability, ease of conversion to any other format, and (perhaps most important to me) long-term guaranteed usability, regardless of the software available.

Patterns

Xidel's support for patterns can be very useful if you need to periodically extract dynamic data from web pages with a complex but constant structure, such as an ever-changing Top 10 list on a web forum. Although a thorough presentation of Xidel patterns is outside the scope of this article, I want to briefly describe them, because they may encourage others to try this program.

Listing 8 shows a snippet taken from the Xidel documentation of a pattern file that Xidel can use to fetch titles and links of the "recommended videos" listed on a YouTube page.

Listing 8

Fetching with Patterns

<li class="video-list-item">
  <!-- skipped -->
  Idras Sunhawk Lyrics
<li class="video-list-item">
  <a>
    <span class="title">{$title:=.}</span>

The {$title:=.} marker in the last line of Listing 8 shows the pattern syntax that tells Xidel that every time it finds that particular sequence of HTML elements – a list item (li) with CSS class video-list-item, followed by an anchor tag (<a>), followed by a span element whose class is title – the value of that last element is data that should be saved in a variable called $title.

Visually, Xidel pattern files look similar to Bash here documents or Perl templates: in all cases, you have a fixed grid, or "mask," of text in which the elements of interest occupy a fixed place. The difference is that Bash and Perl use those masks to show where already existing variables should be written, whereas Xidel patterns do just the opposite. At their core, Xidel patterns behave like regular expressions too long to fit on one line, showing where to read the values that should be saved, as well as which variables should store them.
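
To make the contrast concrete, here is a minimal Bash here document that writes an existing variable's value into a fixed mask – the reverse of what a Xidel pattern does (the title value is invented for the demo):

```shell
# The shell substitutes $title into the mask on output; a Xidel
# pattern with {$title:=.} in the same spot would instead read the
# value out of a matching page and store it.
title='Idras Sunhawk Lyrics'
cat <<EOF
<li class="video-list-item">
  <a>$title</a>
</li>
EOF
```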

Xidel patterns, however, are more powerful than regular expressions, with many options to control how they process what they find. They are also easy to produce: For instance, a Greasemonkey script [7] can create Xidel patterns by just selecting the text to scrape on the corresponding web page.
