Graphing the pandemic with open data


© Lead Image © lucadp, 123RF

Article from Issue 240/2020

A lot of COVID-19 data is available through online REST APIs. With a little ingenuity and some open source tools, you can extract and analyze the data yourself.

Travel is broadening. You experience different cultures, see issues from different angles, and meet fun and unique people. I love living abroad, but I have to admit that I have less access to the news from home. Unfortunately, news outlets today spend more time on opinion than facts, and sometimes I just want the unvarnished truth. During the pandemic era, I am especially anxious to learn about the challenges faced by my family back home.

The good news is that a lot of open data on COVID-19 is available on the Internet via REST API calls. This data might be too dry for some, but if you want to get your own impressions of the COVID-19 crisis, without the sometimes intrusive "analysis" of newscasters and commentators, this free Internet data is a valuable resource. This article describes how to access and display freely available COVID-19 data using open source tools. And, if you've already had your fill of COVID-19 information, the techniques I'll describe in this article will also help you with other kinds of government and academic data available through REST APIs.


The CovidAPI project [1] provides COVID-19 data based on the well-respected Johns Hopkins University dataset [2]. The original Johns Hopkins data is available in CSV form. Argentine software developer Rodrigo Pomba converted the data to JSON time series format. According to the documentation [3], the goal of the project is to make the data "queryable in a manner in which it could be easily consumed to build public dashboards."

The CovidAPI data is organized by country using the list of ISO country codes [4]. Use the curl command in a terminal window to send a URL that will access the data for a specific country and date:


Calling this API, in this case with curl, returns the following JSON object:

{
  "count": 1,
  "result": {
    "2020-06-15": {
      "confirmed": 2114026,
      "deaths": 116127,
      "recovered": 576334
    }
  }
}

This command is an easy way to get the daily report for a specific country and date, but if you want to visualize and analyze the data yourself, you might prefer to request the values for all dates. If you leave off the date, you'll get the data for all available dates:


This command returns one giant JSON message containing the records for every day in the dataset. However, I ran into problems parsing out the individual days because of the dashes in the date keys. As an alternative approach, I chose to write a small Bash script that fetches the count of day records and then iterates through the days to retrieve the COVID-19 information for each one (Listing 1). Most of the steps are self-explanatory if you are familiar with Bash scripts, but see the comment lines for additional information.
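For readers who would rather parse the full reply in a single pass, jq can address keys that contain dashes if they are looked up as strings instead of with the dotted shorthand. The following is only a minimal sketch: it assumes the full reply has the same shape as the single-day reply shown above, the 2020-06-14 row is filled with placeholder zeros rather than real figures, and the names sample, series_to_rows, and deaths_on are made up for illustration.

```shell
#!/bin/bash
# Sample reply (assumed shape). The 2020-06-14 values are placeholders,
# not real data; the 2020-06-15 values come from the reply shown earlier.
sample='{"count":2,"result":{
  "2020-06-14":{"confirmed":0,"deaths":0,"recovered":0},
  "2020-06-15":{"confirmed":2114026,"deaths":116127,"recovered":576334}}}'

# Print one "date  confirmed  deaths  recovered" row per day.
# keys returns the date strings sorted, and .[$d] looks each one up
# as a string, so the dashes are never parsed as subtraction.
series_to_rows() {
  jq -r '.result | keys[] as $d
         | "\($d)  \(.[$d].confirmed)  \(.[$d].deaths)  \(.[$d].recovered)"'
}

# Print the death count for one date passed as the first argument.
deaths_on() {
  jq --arg d "$1" '.result[$d].deaths'
}

printf '%s' "$sample" | series_to_rows
printf '%s' "$sample" | deaths_on 2020-06-15
```

With the sample above, the first command prints two rows and the second prints 116127. The row format matches the columns the script writes to its data file.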

Listing 1

#!/bin/bash

# Endpoint base URLs, without a trailing slash. COUNTRY_API and STATE_API
# are placeholder names; export them with the endpoints documented by
# CovidAPI [1] and the Covid Tracking Project [5] before running.
: "${COUNTRY_API:?set COUNTRY_API to the CovidAPI country endpoint}"
: "${STATE_API:?set STATE_API to the Covid Tracking Project state endpoint}"

get_count()
{
  GATHERCOUNTRY=$1

  # get the count of days since the start
  CNT=$(curl -s "${COUNTRY_API}/${GATHERCOUNTRY}" | jq '.count')
  echo "$CNT"
}

gather_state()
{
  GATHERSTATE=$1
  cnt=$2

  echo "gather state $GATHERSTATE $cnt days"
  DATAFILE=covid19_${GATHERSTATE}.Data

  # from beginning until yesterday
  IDX=$cnt

  # write the column header the first time through
  if [ ! -f "$DATAFILE" ]
  then
    echo "date  positive  hospitalized  deaths  " > "$DATAFILE"
  fi

  while [ "$IDX" -gt 0 ]
  do
    # the state API wants YYYYMMDD; the data file uses YYYY-MM-DD
    DATE=$(date --date="12:00 today -$IDX days" +%Y%m%d)
    FILEDATE=$(date --date="12:00 today -$IDX days" +%Y-%m-%d)

    # only fetch dates that are not already in the data file
    if ! grep -q "$FILEDATE" "$DATAFILE"
    then
      SINGLE=$(curl -s "${STATE_API}/${GATHERSTATE}/${DATE}.json")
      error=$(echo "$SINGLE" | jq ".error")
      if [ "$error" = "true" ]
      then
        # no data for this date
        positive=0
        hospitalized=0
        deaths=0
      else
        positive=$(echo "$SINGLE" | jq ".positive")
        deaths=$(echo "$SINGLE" | jq ".death")
        hospitalized=$(echo "$SINGLE" | jq ".hospitalizedCurrently")

        # fields that were not reported come back as null
        if [ "$positive" = "null" ]; then positive=0; fi
        if [ "$deaths" = "null" ]; then deaths=0; fi
        if [ "$hospitalized" = "null" ]; then hospitalized=0; fi
        echo "$DATE $IDX"
      fi
      echo "$FILEDATE  $positive  $hospitalized  $deaths  " >> "$DATAFILE"
    fi

    IDX=$((IDX - 1))
  done
}

gather_data()
{
  GATHERCOUNTRY=$1
  cnt=$2

  echo "gather $GATHERCOUNTRY"
  DATAFILE=covid19_${GATHERCOUNTRY}.data

  # write the column header the first time through
  if [ ! -f "$DATAFILE" ]
  then
    echo initializing
    echo "date  confirm  deaths  recover  " > "$DATAFILE"
  fi

  # from beginning until yesterday
  IDX=$cnt

  while [ "$IDX" -gt 0 ]
  do
    DATE=$(date --date="12:00 today -$IDX days" +%Y-%m-%d)

    # we only do this if this date hasn't been retrieved
    if ! grep -q "$DATE" "$DATAFILE"
    then
      SINGLE=$(curl -s "${COUNTRY_API}/${GATHERCOUNTRY}/${DATE}")
      ERR=$(echo "$SINGLE" | grep -c "404 Not Found")

      # only if date found
      if [ "$ERR" -eq 0 ]
      then
        # the reply holds a single date key, so .result[] selects it
        deaths=$(echo "$SINGLE" | jq '.result[].deaths')
        confirm=$(echo "$SINGLE" | jq '.result[].confirmed')
        recover=$(echo "$SINGLE" | jq '.result[].recovered')

        echo "$DATE $IDX"
        echo "$DATE  $confirm  $deaths  $recover  " >> "$DATAFILE"
      fi
    fi

    IDX=$((IDX - 1))
  done
}

CNT=$(get_count USA)
echo "$CNT days"
gather_data USA "$CNT"

# just use the two-letter state code (e.g., ny for New York)
gather_state mn "$CNT"
gather_state ca "$CNT"
gather_state ia "$CNT"
gather_state mo "$CNT"
gather_state mt "$CNT"

CNT=$(get_count DEU)
gather_data DEU "$CNT"

CNT=$(get_count ESP)
gather_data ESP "$CNT"

CNT=$(get_count GBR)
gather_data GBR "$CNT"

gnuplot

One part of the script that might not be obvious is how I calculate the date.

DATE=`date --date="12:00 today -$IDX days" +%Y-%m-%d`

The date command subtracts a given number of days from the current date and formats the output as a YYYY-MM-DD string.
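The date arithmetic can be tried out on its own. The sketch below wraps the same expression in a helper; the name days_ago is made up for illustration, and the --date option assumes GNU date (the BSD and macOS versions of date use different options). Anchoring the calculation at 12:00 keeps a daylight saving shift from pushing the result onto the neighboring calendar day.

```shell
#!/bin/bash
# Print the date N days before today as YYYY-MM-DD.
# Requires GNU date; the calculation starts from noon today.
days_ago() {
  date --date="12:00 today -$1 days" +%Y-%m-%d
}

# Walk backward from three days ago to yesterday,
# the same direction the script's loop counts in.
for i in 3 2 1
do
  days_ago "$i"
done
```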

Of course, it would be inefficient to download hundreds of days' worth of data each time if I just want yesterday's data. Because of this, the script checks whether a date has already been retrieved before making the REST API call for it. The first time you run the script, you get all the data, and on subsequent runs, you only get the new data.
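The check-before-fetch idea can be shown in isolation. This is a minimal sketch rather than the script's exact code: fetch_day is a hypothetical stand-in for the curl call, and update_file is likewise a made-up name.

```shell
#!/bin/bash
# Hypothetical stand-in for the REST call: prints one data row.
fetch_day() {
  echo "$1  0  0  0"
}

# Append the row for a date only if that date is not in the file yet.
update_file() {
  local datafile=$1 day=$2
  # grep -q exits with 0 when a row starting with the date exists
  if grep -q "^$day" "$datafile" 2>/dev/null
  then
    return 0
  fi
  fetch_day "$day" >> "$datafile"
}

tmpfile=$(mktemp)
update_file "$tmpfile" 2020-06-15   # fetches and appends
update_file "$tmpfile" 2020-06-15   # already present, does nothing
wc -l < "$tmpfile"
rm -f "$tmpfile"
```

Because the data file doubles as the record of what has been fetched, no separate state needs to be kept between runs.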

Data by State

Retrieving COVID-19 figures for a whole country is useful for comparing one country against another, but it is less than helpful if you want to know what is really happening locally. The Covid Tracking Project [5] provides COVID-19 data by US state. (Similar projects track pandemic data for other countries – consult your local health resources.)

Just like at the national level, it is possible to retrieve all COVID-19 information by US state for a given date with REST API calls. For instance, to obtain data on the state of Minnesota for August 21, 2020:

curl | jq "."

The state data, unlike the national data, contains an amazing number of statistics submitted by the health authorities. The sheer number of values provided can perhaps only truly be appreciated by an epidemiologist or a statistician.

You can see how many people were hospitalized on a given day or the total number of hospitalizations up until that day (Listing 2). Also included are the incremental changes in positive as well as negative test results. Using this information, I could have graphed how quickly COVID-19 is spreading by graphing positiveCasesViral vs. totalTestsViral or by graphing hospitalizedCurrently over time.

Listing 2

Hospitalizations and Deaths by State

{
  "date": 20200415,
  "state": "MN",
  "positive": 2321,
  "negative": 41245,
  "pending": null,
  "hospitalizedCurrently": 197,
  "hospitalizedCumulative": 445,
  "inIcuCurrently": 93,
  "inIcuCumulative": 175,
  "onVentilatorCurrently": null,
  "onVentilatorCumulative": null,
  "recovered": 853,
  "dataQualityGrade": "A",
  "lastUpdateEt": "4/14/2020 17:00",
  "dateModified": "2020-04-14T17:00:00Z",
  "checkTimeEt": "04/14 13:00",
  "death": 87,
  "hospitalized": 445,
  "dateChecked": "2020-04-14T17:00:00Z",
  "totalTestsViral": 43566,
  "positiveTestsViral": null,
  "negativeTestsViral": null,
  "positiveCasesViral": null,
  "fips": "27",
  "positiveIncrease": 156,
  "negativeIncrease": 1540,
  "total": 43566,
  "totalTestResults": 43566,
  "totalTestResultsIncrease": 1696,
  "posNeg": 43566,
  "deathIncrease": 8,
  "hospitalizedIncrease": 40,
  "hash": "9521e0ce1f2b1ef5aaf1a81bec48961d85170d78",
  "commercialScore": 0,
  "negativeRegularScore": 0,
  "negativeScore": 0,
  "positiveScore": 0,
  "score": 0,
  "grade": ""
}

I settled on gathering positives, numbers of people hospitalized, and deaths at a state level. I didn't try to verify that all state totals added up at the national level, as I suspect there can be delays in the reporting chain from the local to the national level.

I can imagine the massive effort it took to come up with a common structure and to get all the participants to gather all of these types of data. Despite these efforts, the data returned sometimes contained fields that were blank, zero, or simply null.
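One way to keep such gaps from breaking later arithmetic or plotting is to normalize each value before writing it to the data file. The helper below is a minimal sketch of that idea; the name or_zero is made up for illustration, and mapping missing values to 0 is an assumption that suits cumulative plots but hides the difference between "none" and "not reported."

```shell
#!/bin/bash
# Print the value unchanged, or 0 if it is empty or the literal null.
or_zero() {
  case "$1" in
    ""|null) echo 0 ;;
    *)       echo "$1" ;;
  esac
}

# Values as they appear in Listing 2
hospitalized=$(or_zero 197)
ventilator=$(or_zero null)
echo "$hospitalized $ventilator"
```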

Comparing Countries

My COVID-19 gathering script collects the information for four different countries (Great Britain, USA, Spain, and Germany), as well as statistics for a few US states. This data is temporarily stored in text files, and the information I am gathering looks similar to Table 1.

Table 1

Sample of Downloaded Data

























