Working with the JSON data format
Data Dog

© Lead Image © Fabian Schmidt, Fotolia.com
JSON data format is a standard feature of today's Internet – and a common option for mobile and desktop apps – but many users still regard it as something of a mystery. We'll take a close look at JSON format and some of the free tools you can use for reading and manipulating JSON data.
Our world of web applications and fast, interactive mobile devices calls for the free exchange of data in easily accessible forms. Standard formats promote interoperability and minimize development time. Open formats also make it easy to import data into other applications. Over the years, several popular alternatives have emerged. CSV, XML, and YAML are well known and easy to adapt to different applications (see the box entitled "Comparing Formats" and Listings 1-3 for examples). One format that is used extensively for web applications, mobile applications, and even some conventional desktop tools is JavaScript Object Notation (JSON) [1].
Comparing Formats
Using comma-separated values (CSV) ensures the data remains manageable for the most part, but the format is neither standardized nor particularly flexible. In a file like the one in Listing 1, neither the character set nor the separator between columns is fixed. In practical applications, spaces, tabs, hyphens, or semicolons are sometimes used instead of commas. Corresponding key data either has to be agreed upon or inferred from the file itself. Furthermore, the format does not allow nested records, arrays, or binary data.
The Extensible Markup Language (XML) is far more structured and flexible than CSV. A notable feature of XML is the data field enclosed in the field name with the form <fieldname>value</fieldname>
(Listing 2). In practice, it makes sense to choose field names that let you infer the contents. The order of the fields is usually variable in a layer, and fields can be missing. One disadvantage of XML is that an XML file is significantly larger due to the need to continually repeat the field labels for each entry.
YAML is a recursive acronym for YAML Ain't Markup Language. The YAML specification describes a very compact way to serialize data. Hyphens and indentations using spaces serve as the basis for denoting fields. YAML borrows from XML, as well as from the format in which three programming languages (Python, Perl, and C) describe their data structures. Listing 3 shows the book inventory data as a YAML structure.
JSON [1] is based on JavaScript. The format is also very compact and flexible. In contrast to YAML, JSON explicitly identifies objects and their attributes, whereas in YAML, the assignment is derived from the context of the indentation depth.
Listing 1
CSV File
Stephen Fry; The Hippopotamus; 1994 Ian Rankin; Set In Darkness; 2009 Ken Follett; The Pillars of the Earth; 1989
Listing 2
XML File
<inventory> <book> <author>Stephen Fry</author> <title>The Hippopotamus</title> <publication>1994</publication> </book> <book> <author>Ian Rankin</author> <title>Set In Darkness</title> <publication>2009</publication> </book> <book> <author>Ken Follett</author> <title>The Pillars of the Earth</title> <publication>1989</publication> </book> </inventory>
Listing 3
YAML File
--- book: - author: Stephen Fry title: The Hippopotamus publication: '1994' - author: Ian Rankin title: Set In Darkness publication: '2009' - author: Ken Follett title: The Pillars of the Earth publication: '1989'
JSON is wildly popular as a tool for passing information between web apps – for instance, it is currently the de facto standard for REST services – yet for many users, the details of JSON format are shrouded in mystery. This article takes a close look at JSON and some of the tools available for reading, manipulating, and importing JSON data.
Understanding JSON
The notation of JSON is analogous to objects, records, or dictionaries – depending on what that structure is currently called in your favorite programming language. Even though JSON format is based on JavaScript, parsers exist in almost all programming languages. In addition to Awk and C/C++, you can integrate JSON with Fortran, Go, Lisp, Lua, Python, and Visual Basic.
In everyday life, you will find the format in the data-sharing Jupyter Notebook app [2], in geographical specifications like GeoJSON [3] (Listing 4), and even in databases like MongoDB.
Listing 4
GeoJSON File
{ "type": "Feature", "geometry": { "type": "Point", "coordinates": [125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } }
Taking a closer look at the JSON data structure, you will see that it is in an easy-to-read, text-based format. Parentheses, colons, and commas separate the individual elements; the data can be nested as desired. This means, for example, that you can map lists, arrays, or objects. Table 1 summarizes the elementary data types that JSON supports.
Table 1
JSON Data Types
Data Type | Description |
---|---|
Strings |
All Unicode characters except ", \, and control characters |
Numbers |
Numeric values including hexadecimal and exponential values, for example 0x42 and .2e-3 |
Boolean values |
Logic values true and false |
Arrays |
Comma-separated, unordered lists of properties, although objects without properties are also allowed |
Objects with properties |
Notation as key-value pairs |
Null values |
null, NIL, or () |
Mark the individual structure levels with brackets and indentation for better readability (Listing 5). Pairs of curly braces ({
and }
) each form a unit. Square brackets ([
and ]
) are used to indicate fields (also known as arrays). Individual field elements follow the form of an enumeration and are separated by commas. Each field element consists of a key-value pair separated by a colon (:)
.
Listing 5
JSON File
{"book": [ { "author": "Stephen Fry", "title": "The Hippopotamus", "publication": "1994" }, { "author": "Ian Rankin", "title": "Set In Darkness", "publication": "2009" }, { "author": "Ken Follett", "title": "The Pillars of the Earth", "publication": "1989" } ]}
JSON was originally created in the early 2000s to exchange data between web applications. JSON worked quite well in the web context (even though you cannot always parse it unambiguously). To structure data, JSON falls back on conventions familiar to anyone who has programmed in a C-based language (C, C++, C#, Java, JavaScript, Perl, Python, and others).
JSON is specified according to RFC 8259 [4] and ECMA-404; common extensions are JSONP (JSON with padding), JSONPP (JSON with padding and parameters), and JSONML, which combines XML and JSON together. The character set for all JSON formats is Unicode (UTF-8), which eliminates the character-set guessing game that you will be familiar with from CSV.
You can use JSON to exchange smaller volumes of data between applications in an agile way. However, if the transferred data volume increases (e.g., if you have millions of measurements from a sensor), JavaScript-based Python libraries like Ipywidgets, Bokeh, and Plotly often fail. In the face of large data volumes, binary transport mechanisms are a better option for handling the load.
Tool Overview
Several command-line tools are available for parsing, processing, and outputting JSON data. Table 2 summarizes some of the available tools. All of these tools are available as packages for Debian GNU/Linux, Ubuntu, Devuan, Linux Mint, and macOS.
Table 2
JSON Tools
Tool | Language | Application |
---|---|---|
aeson-pretty [5] |
Haskell |
Output JSON in a readable way |
jc [6] |
Python |
Convert output to JSON |
jid [7] |
Go |
Interactively filter JSON |
jo [8] |
C |
JSON output in the shell |
jq [9] |
C |
Output and filter JSON in a readable way |
Jshon [10] |
C |
Read and generate JSON |
JSONLint [11] |
PHP |
Validate JSON data |
Not all of the tools in Table 2 are intuitive, and some of them only develop their full impact in a specific context. You will find more information on these tools in the various documents and cheat sheets available online [12].
Easily Readable JSON Output
When it comes to pretty printing, aeson-pretty, jc, jo, jq, and Jshon all have something to say. Some of the tools have a command-line parameter for printing, for example, -p
for jo.
In Listing 6, cat
and aeson-pretty
work together for readable output via a pipe. Jq delivers the same results with the next call, but the output is in color (Figure 1):
$ jq . book_inventory.json
Listing 6
Printing with aeson-pretty
$ cat book-inventory.json | aeson-pretty { "book": [ { "publication": "1994", "author": "Stephen Fry", "title": "The Hippopotamus" }, { "publication": "2009", "author": "Ian Rankin", "title": "Set In Darkness" }, { "publication": "1989", "author": "Ken Follett", "title": "The Pillars of the Earth" } ] } $
The dot in the call to jq is not immediately understandable. It stands for the expression to be processed; in this case, it denotes all objects specified as parameters in the JSON file. You can define colorizing of the output using two options, -C
(--colour-output
) and -M
(--monochrome-output
).
Some users prefer compact output with as few (space) characters as possible. In Listing 7, see aeson-pretty with the -c
(short for --compact
) option. This option reduces the number of characters in the output by 45 percent, from 428 to 236 bytes. Compared to Listing 5 and Listing 6, the results still convey the same information, but with only half the amount of data.
Listing 7
Compact Output with aeson-pretty
$ cat book-inventory.json | aeson-pretty -c {"book":[{"publication":"1994","author":"Stephen Fry","title":"The Hippopotamus"}, {"publication":"2009","author":"Ian Rankin","title":"Set In Darkness"}, {"publication":"1989","author":"Ken Follett","title":"The Pillars of the Earth"}]}
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Direct Download
Read full article as PDF:
Price $2.95
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Find SysAdmin Jobs
News
-
Kubuntu Focus Announces XE Gen 2 Linux Laptop
Another Kubuntu-based laptop has arrived to be your next ultra-portable powerhouse with a Linux heart.
-
MNT Seeks Financial Backing for New Seven-Inch Linux Laptop
MNT Pocket Reform is a tiny laptop that is modular, upgradable, recyclable, reusable, and ships with Debian Linux.
-
Ubuntu Flatpak Remix Adds Flatpak Support Preinstalled
If you're looking for a version of Ubuntu that includes Flatpak support out of the box, there's one clear option.
-
Gnome 44 Release Candidate Now Available
The Gnome 44 release candidate has officially arrived and adds a few changes into the mix.
-
Flathub Vying to Become the Standard Linux App Store
If the Flathub team has any say in the matter, their product will become the default tool for installing Linux apps in 2023.
-
Debian 12 to Ship with KDE Plasma 5.27
The Debian development team has shifted to the latest version of KDE for their testing branch.
-
Planet Computers Launches ARM-based Linux Desktop PCs
The firm that originally released a line of mobile keyboards has taken a different direction and has developed a new line of out-of-the-box mini Linux desktop computers.
-
Ubuntu No Longer Shipping with Flatpak
In a move that probably won’t come as a shock to many, Ubuntu and all of its official spins will no longer ship with Flatpak installed.
-
openSUSE Leap 15.5 Beta Now Available
The final version of the Leap 15 series of openSUSE is available for beta testing and offers only new software versions.
-
Linux Kernel 6.2 Released with New Hardware Support
Find out what's new in the most recent release from Linus Torvalds and the Linux kernel team.