Using Bash one-liners for stats

Add Statistics to a Plot

Gnuplot offers the ability to include stats on a plot using

gnuplot> plot 'filename' using column options, stat1 options, stat2 options, stat3 options ...

Figure 6 shows how to put stats on a jitter plot. Here, multiple gnuplot statements are passed with the execute (-e) option, and a semicolon (;) separates each statement in the text string.

Figure 6: Show stats on a jitter plot.

To get the required presentation, jitter plots may need some tweaking of xrange, overlap, pt (point type; 7 = circle), and ps (point size). The jitter plot in Figure 6 uses a setting of (0):1, which positions the first column of data around 0 on the x-axis.

In Figure 6, stats were given a prefix name of JULY. The mean (JULY_mean), the first quartile (JULY_lo_quartile), and the third quartile (JULY_up_quartile) were added to the plot with a line width (lw) of 3.
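The statement behind a plot like Figure 6 might look roughly like the following sketch. It assumes a file named july.csv with one temperature per line and no header, and a gnuplot version that supports set jitter (5.2 or later). The overlap, xrange, and point size values are illustrative guesses, not the settings used for the figure.

```shell
# Assumed input: july.csv with one temperature reading per line, no header.
datafile="july.csv"

# Build the semicolon-separated statements that gnuplot receives through -e.
# The numeric values (overlap, xrange, ps) are illustrative starting points.
script="set jitter overlap 0.5; set xrange [-1:1]"
script="$script; stats '$datafile' using 1 name 'JULY' nooutput"
script="$script; plot '$datafile' using (0):1 with points pt 7 ps 0.5 title 'July'"
script="$script, JULY_mean lw 3 title 'mean'"
script="$script, JULY_lo_quartile lw 3 title 'first quartile'"
script="$script, JULY_up_quartile lw 3 title 'third quartile'"

# Draw the plot when gnuplot and the data file are present; otherwise
# print the statements so they can be inspected.
if command -v gnuplot >/dev/null 2>&1 && [ -r "$datafile" ]; then
  gnuplot -p -e "$script"
else
  printf '%s\n' "$script"
fi
```

Building the string in pieces keeps each statement readable while still handing gnuplot a single -e argument.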

Linear Regression

While linear regression is straightforward in languages like R or Python, it can be a challenge in SQL. Gnuplot stats will do a linear regression calculation when two columns are defined. If there is only a single column of data, a linear regression can be done against the row index by referencing column 0, the pseudo-column that gnuplot fills with the row number (Listing 4).

Listing 4

Linear Regression with gnuplot

gnuplot> # Do a linear regression based on a row index
gnuplot> stats 'july.csv' using 0:1
...
  Linear Model:       y = 0.001489 x + 22.74
  Slope:              0.001489 +- 0.0002746
  Intercept:          22.74 +- 0.2062
  Correlation:        r = 0.1488
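As a cross-check on the gnuplot numbers, the same least-squares fit against a zero-based row index can be sketched in AWK. The function name linreg_by_row is hypothetical, and the sketch assumes one numeric value per line with no header row.

```shell
# Least-squares fit of y against a zero-based row index, in the spirit of
# gnuplot's "stats ... using 0:1". Reads one numeric value per line from
# the named files, or from standard input when no file is given.
linreg_by_row() {
  awk '
    NF == 0 { next }                  # skip blank lines
    {
      x = n++                         # zero-based index of this data row
      y = $1
      sx += x; sy += y; sxx += x * x; sxy += x * y
    }
    END {
      if (n < 2) exit 1               # a line needs at least two points
      slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
      intercept = (sy - slope * sx) / n
      printf "slope=%.4f intercept=%.4f\n", slope, intercept
    }' "$@"
}
```

Calling linreg_by_row july.csv should print a slope and intercept that can be compared with the Linear Model line reported by gnuplot.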

Similar to what was done in the earlier jitter plot, linear regression stats can be included on a line chart. Figure 7 shows a combination of raw data, a curve fit, and the Y-mean. A gnuplot linear function of

y = m*x + b

is created that uses the stat variable JULY_slope for m and JULY_intercept for b. The Y-average (JULY_mean_y) and the f(x) function are added to the line plot with line widths (lw) of 5. Using this one rather long statement, I was able to see that there was a slight increase in July temperatures over the past 15 years.

Figure 7: Create a curve fit in gnuplot.
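A statement along the lines of the one used for Figure 7 could be sketched as follows. It again assumes a july.csv file with one temperature per line and no header. The titles are illustrative, and the exact options used for the figure are not reproduced here.

```shell
# Assumed input: july.csv with one temperature reading per line, no header.
datafile="july.csv"

# Two-column stats (row index against value) defines JULY_slope,
# JULY_intercept, and JULY_mean_y, which the plot statement then reuses.
script="stats '$datafile' using 0:1 name 'JULY' nooutput"
script="$script; f(x) = JULY_slope*x + JULY_intercept"
script="$script; plot '$datafile' using 0:1 with lines title 'raw data'"
script="$script, f(x) lw 5 title 'linear fit'"
script="$script, JULY_mean_y lw 5 title 'mean'"

# Draw the plot when gnuplot and the data file are present; otherwise
# print the statements so they can be inspected.
if command -v gnuplot >/dev/null 2>&1 && [ -r "$datafile" ]; then
  gnuplot -p -e "$script"
else
  printf '%s\n' "$script"
fi
```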

Summary

AWK can be used to extract and format data similar to an SQL SELECT query statement. The filtered results from AWK can be piped to different command-line tools such as sort, sta, or gnuplot to do statistical calculations.

Gnuplot is an extremely powerful tool for data analysis and visualization, making it ideal for small Bash scripts. Typically gnuplot is used with a CSV data file, but it can also be used with SQL query output.

For quick stats and visual checks of CSV data, an AWK/gnuplot solution is a good addition to your data toolkit.
