An XML, HTML, and JSON data extraction tool

Easy Extraction

© Lead Image © Wutthichai Luemuang, 123RF.com

Article from Issue 276/2023

Author(s): Marco Fioretti

Xidel lets you easily extract and process data from XML, HTML, and JSON documents.

There are numerous ways to scrape a web page for data. In fact, the right mix of Python modules and Python logic glue could probably do the trick, but sometimes you just want a convenient tool that lets you extract data from websites. Xidel [1], a multi-platform command-line tool, offers a one-stop alternative to quickly extract, process, and save data from XML, HTML, or JSON documents.

Under the Hood

Xidel wraps XQuery, XPath, and JSON into one convenient front end. XQuery, a W3C Recommendation since 2007, lets you query XML or HTML files as if they were database servers, process the extracted data as desired, and save data to other files. As shown in the XQuery tutorial [2], XQuery-capable software can complete requests like finding all the CDs in an online catalog that cost less than $10, sorted by release date.

Xidel also fully supports the other W3C Recommendations, XPath [3] and the data-interchange language JavaScript Object Notation (JSON) [4]. XPath defines both a syntax for identifying all the elements of an XML document and a library of standard functions that make it easy to navigate through such elements and extract them. JSON data structures represent any kind of data as objects made of unordered sets of name/value pairs (I'll show some examples of this later on in this article).

[...]

Use Express-Checkout link below to read the full article (PDF).

Buy this article as PDF

Download Article PDF now with Express Checkout

Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES

Print Issues

Digital Issues

SUBSCRIPTIONS

Print Subscriptions

Digital Subscriptions

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

News

Substantial Update to IPFire Now Available

The lastest version of IPFire features a fundamental change to how the system handles DNS.
Gnome Working on Test Center App to Make Testing Easier

Gnome , Linux

It's now possible to test experimental features on the Gnome desktop without worrying that you'll break things.
New Vulnerability Discovered in Linux Kernel

Artificial Inte... , Kernel , vulnerability

Hiding out for nearly 15 years, the Ghostlock vulnerability allows a standard logged-in user to gain root privileges.
New Linux Flaw Lets Attackers Escape VMs

RHEL , Security , vulnerability

A 16-year-old vulnerability allows an attacker to escape a virtual machine, gain access to the host, and execute malicious code.
Hannah Montana Linux Is Back!

DEBIAN , Kubuntu , Plasma

Developer Noah Cagle decided the world needed the once obscure but beloved Linux distribution and gave it a decidedly pink refresh.
System76 Refreshes the Lemur Laptop

Hardware , laptop

If you're looking for a laptop with tons of power and battery, look no further than the latest iteration of the System76 Lemur Pro.
More than 43 Million Lines of Code in Linux Kernel 7.2

Kernel , Linux

Using the cloc utility, Michael Larabel of Phoronix discovered that Linux kernel 7.2 has over 43 million lines of code.
Kubuntu Focus Goes Ultra

Hardware , Kubuntu , laptop

The Kubuntu Focus team has upped the performance ante of its M2 and Zr laptops with the latest, greatest CPUs from Intel.
Linux Gamers May Soon See Less Mouse Lag in KDE Plasma

Games , KDE , Plasma

Gamers using KDE’s Plasma desktop have been suffering from a slight input delay in mouse movement that could lead to getting fragged.
Three Lines of Code Improve Linux Storage Performance

Kernel , Performance , Storage

A developer changed three lines of code, giving Linux storage performance a 5% bump.

An XML, HTML, and JSON data extraction tool

Easy Extraction

Under the Hood

Buy this article as PDF

Buy Linux Magazine

Related content

Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters

Support Our Work

News

Substantial Update to IPFire Now Available

Gnome Working on Test Center App to Make Testing Easier

New Vulnerability Discovered in Linux Kernel

New Linux Flaw Lets Attackers Escape VMs

Hannah Montana Linux Is Back!

System76 Refreshes the Lemur Laptop

More than 43 Million Lines of Code in Linux Kernel 7.2

Kubuntu Focus Goes Ultra

Linux Gamers May Soon See Less Mouse Lag in KDE Plasma

Three Lines of Code Improve Linux Storage Performance

An XML, HTML, and JSON data extraction tool

Easy Extraction

Under the Hood

Buy this article as PDF

Buy Linux Magazine

Related content

Subscribe to our Linux Newsletters Find Linux and Open Source Jobs Subscribe to our ADMIN Newsletters

Support Our Work

News

Tag Cloud

Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters