Using Bash one-liners for stats
Bash Out Some Stats

© Photo by Carlos Muza on Unsplash
With just one line of Bash you can use tools like AWK and gnuplot to quickly analyze and plot your data.
Typically when I'm looking to do some data analysis, I'll import the data files into Pandas DataFrames or an SQL database. During a recent project, I was happily surprised to learn that I could do a lot of basic statistics with only one line of Bash code.
For simple applications, Bash tools such as sort and bc (the arbitrary precision calculator) can be used to find maximums, minimums, averages, and sums from arrays or columns of data (Listing 1).
Listing 1
Basic Stats Using sort and bc
$ # Basic stats using sort and bc
$ data=(3 4 18 7 2 19 15)
$
$ # Find the Max value in an array
$ printf "%s\n" "${data[@]}" | sort -n | tail -n 1
19
$ # Find the Min value in an array
$ printf "%s\n" "${data[@]}" | sort -n | head -n 1
2
$ # Sum up an array
$ IFS="+" ; bc<<<"${data[*]}"
68
$ # Average from an array, with 2 decimals
$ sum=$(IFS="+" ; bc<<<"${data[*]}")
$ bc <<<"scale=2; ${sum}/${#data[@]}"
9.71
For CSV data files, a single line of Bash that combines AWK [1] and gnuplot [2] can be used to view statistics or graph a column of data (Figure 1).
In this article, I will cover using AWK to filter and extract data from CSV files and then turn to gnuplot to gather statistics and present charts.
Mimicking SQL SELECT Statements
Both a Linux command-line tool and a programming language, AWK can be used for data extraction and reporting. AWK can work directly on CSV files and output results based on both column and row filtering conditions. The syntax for creating an SQL SELECT-style statement in AWK is:
awk -F, 'condition {print column_numbers}' filename
Figure 2 shows an example comparing an SQL SELECT statement with an equivalent AWK statement. The first parameter in the AWK line is -F followed by a comma, which sets the field separator to a comma. In AWK, the condition (the equivalent of the WHERE clause) comes first, followed by print to output the required columns.
Unlike SQL, AWK uses column numbers instead of column names: $1 for the first column, $2 for the second column, and so on.
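As a minimal sketch of this pattern, assume a hypothetical file named cities.csv with three columns (city, country, temperature). The file, its values, and the threshold of 20 are invented for illustration only. The AWK line plays the role of an SQL statement that selects two columns with a WHERE condition on the third:

```shell
# Hypothetical sample data: city,country,temp (no header row)
workdir=$(mktemp -d)
cat > "$workdir/cities.csv" <<'EOF'
Oslo,Norway,12.5
Cairo,Egypt,31.0
Lima,Peru,19.2
Delhi,India,35.4
EOF

# SQL equivalent: SELECT city, temp FROM cities WHERE temp > 20;
# -F, splits input on commas, $3 > 20 is the row condition,
# and OFS=, makes print join the output columns with a comma
result=$(awk -F, -v OFS=, '$3 > 20 {print $1, $3}' "$workdir/cities.csv")
echo "$result"
rm -rf "$workdir"
```

Leaving out the condition prints the chosen columns for every row, which corresponds to a SELECT without a WHERE clause.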
While this wouldn't be my first choice, AWK can be used to do stats on a column of data. Listing 2 shows an example of how to get some basic stats on the second ($2) column of a CSV file, as well as some additional AWK features.
Listing 2
Basic Stats Using AWK
$ # Use AWK to get stats on a CSV file
$ cat numbers.csv
Monday, 1.1
Tuesday, -3.6
Wednesday, 9.81
Thursday, 6.0
$ # find a min, use a large starting value
$ awk -F, -v min=9999 '{if ($2<min) min=$2} END {print min}' numbers.csv
-3.6
$ # find a max, use a small starting value
$ awk -F, -v max=-9999 '{if ($2>max) max=$2} END {print max}' numbers.csv
9.81
$ # find the sum of column 2
$ awk -F, '{ sum += $2 } END {print sum }' numbers.csv
13.31
$ # find the average of column 2
$ awk -F, '{ sum += $2 } END {print sum/NR }' numbers.csv
3.3275
In the min and max calculations in Listing 2, variables are predefined and given starting values with the -v option. An if statement checks and updates the variable on a row-by-row basis. The average calculation works in two steps. The first step accumulates column $2 into a variable called sum as each row is read. The END keyword marks the block that runs after the last row, where the second step divides sum by the built-in row count NR and prints the result. For complex AWK scripts, setup and wrap-up steps can be defined within BEGIN and END blocks.
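The BEGIN, per-row, and END blocks can be combined in a single AWK call. The sketch below is one possible arrangement, not the only one: it reuses the values from numbers.csv in Listing 2 (written to a temporary file here), and it seeds min and max from the first row instead of relying on the 9999 and -9999 starting values.

```shell
# Same values as numbers.csv in Listing 2, written to a temp file
csv=$(mktemp)
printf '%s\n' 'Monday, 1.1' 'Tuesday, -3.6' 'Wednesday, 9.81' 'Thursday, 6.0' > "$csv"

# BEGIN runs before the first row, the unlabeled block runs for every row,
# and END runs after the last row. Adding 0 forces a numeric value.
stats=$(awk -F, '
  BEGIN   { print "min max avg" }
  NR == 1 { min = max = $2 + 0 }
  { v = $2 + 0; sum += v; if (v < min) min = v; if (v > max) max = v }
  END     { if (NR > 0) print min, max, sum / NR }
' "$csv")
echo "$stats"
rm -f "$csv"
```

Seeding from the first row avoids wrong answers when every value in the column is larger than 9999 or smaller than -9999.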
The beauty of AWK is that it can filter or preprocess the data for other Bash commands. For example, AWK can be used to extract column $2 data from a CSV file, and then the results can be piped to sort and tail to find the maximum value:
$ awk -F, '{print $2}' numbers.csv | sort -n | tail -n1
9.81
Several other command-line statistics tools are also available. The sta [3] tool is an excellent utility for finding basic stats on a column of data. Listing 3 uses AWK to send column $5 data to sta.
Listing 3
Using AWK with sta
$ # Use AWK with the sta utility
$ awk -F, '{print $5}' london_weather.csv | sta
N      min   max   sum     mean     sd      sderr
15336  -6.2  37.9  235987  15.3878  6.5555  0.0529
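If sta is not installed, a rough approximation of its summary can be put together in AWK alone. The sketch below is not how sta itself works. It computes the population standard deviation, which is an assumption on my part and may differ from the figure sta reports, and the input values are made up so the result is easy to check by hand.

```shell
# Made-up sample values, one per line (mean 5, population sd 2)
summary=$(printf '%s\n' 2 4 4 4 5 5 7 9 | awk '
  { v = $1 + 0 }                      # force a numeric value
  NR == 1 { min = max = v }           # seed min and max from the first row
  { sum += v; sumsq += v * v
    if (v < min) min = v
    if (v > max) max = v }
  END {
    if (NR == 0) exit                 # nothing to report for empty input
    mean = sum / NR
    var = sumsq / NR - mean * mean    # population variance
    if (var < 0) var = 0              # guard against rounding error
    printf "N=%d min=%g max=%g sum=%g mean=%g sd=%.4f\n", NR, min, max, sum, mean, sqrt(var)
  }')
echo "$summary"
```

To use it on a CSV column, replace the printf with an AWK extraction such as the one in Listing 3.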
Now that you know how to filter and extract data from a CSV file, the next step is to use gnuplot to do some advanced statistics and charting.
gnuplot
Gnuplot's statistical option can be used as a standalone tool or integrated with Bash commands. To use gnuplot with CSV files, the data separator will need to be set before the stats can be calculated:
$ gnuplot
gnuplot> set datafile separator ','
gnuplot> # Get stats on column 3 in a file
gnuplot> stats 'filename.csv' using 3
Gnuplot natively supports data filtering by rows and columns. However, the filtering syntax is not as user friendly or as complete as AWK. To pipe AWK results to gnuplot, you can use
awk -F, 'condition {print column}' filename | gnuplot -e 'stats "<cat" '
The gnuplot -e option executes a string of statements, and the "<cat" parameter tells gnuplot to read its input data from the pipe.
Figure 3 shows a statistical example that compares similar AWK/gnuplot commands and results with an SQL statement. The gnuplot stats command returns a fairly complete list of calculations. To extract a specific value, give the output a variable name prefix; each result is then available as prefix_stat. For example, to get the median value of a column, you would use
gnuplot -e 'stats "<cat" name "TEMPS" nooutput; print TEMPS_median'
If two columns are passed to the stats command, calculations such as slope, intercept, and correlation will be returned.
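A two-column call might look like the following sketch. The x,y pairs and the XY prefix are invented for illustration, and the call is skipped if gnuplot is not installed. No output is shown because the exact formatting depends on the gnuplot version.

```shell
# Made-up x,y pairs lying close to the line y = 2x
pairs=$(mktemp)
printf '%s\n' '1,2.1' '2,3.9' '3,6.2' '4,7.8' > "$pairs"

# gnuplot statements: read two columns from the pipe, then print selected results.
# The XY prefix is arbitrary; it replaces the default STATS prefix.
gp_cmds='stats "<cat" using 1:2 name "XY" nooutput; print XY_slope, XY_intercept, XY_correlation'

# awk prints the two columns separated by a space, so no
# datafile separator needs to be set on the gnuplot side
if command -v gnuplot >/dev/null 2>&1; then
  awk -F, '{print $1, $2}' "$pairs" | gnuplot -e "$gp_cmds"
fi
rm -f "$pairs"
```

Note that gnuplot's print command writes to standard error by default, so redirect with 2>&1 if you want to capture the values in a variable.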
Visualizing with gnuplot
Like the earlier stats example, one-line statements can be created that pipe AWK output to a gnuplot chart. The syntax for an AWK/gnuplot line chart is
awk -F, 'condition {print column}' csvfile | gnuplot -p -e 'plot "<cat" w l'
The gnuplot persist option, -p, keeps the plot open after the statement is executed, and w l is short for with lines, which draws a line chart.
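On a machine without a graphical display, the same pipe can draw a rough text chart directly in the terminal. This sketch assumes your gnuplot build includes the dumb terminal and accepts its size option; the data reuses the values from numbers.csv, and the call is skipped if gnuplot is not installed.

```shell
# Same values as numbers.csv in Listing 2, written to a temp file
csv=$(mktemp)
printf '%s\n' 'Monday, 1.1' 'Tuesday, -3.6' 'Wednesday, 9.81' 'Thursday, 6.0' > "$csv"

# "set terminal dumb" asks gnuplot for a text-mode chart; the size is in characters
plot_cmds='set terminal dumb size 60,15; plot "<cat" with lines notitle'

# The persist option is not needed here because nothing opens in a window
if command -v gnuplot >/dev/null 2>&1; then
  awk -F, '{print $2}' "$csv" | gnuplot -e "$plot_cmds"
fi
rm -f "$csv"
```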
Figure 4 shows an example of an AWK/gnuplot call that creates a line chart. For comparison, the equivalent SQL SELECT statement with a DB Browser line plot is also shown.

Gnuplot offers a good variety of chart types. For example, Figure 5 shows a box plot, which can help identify outlier data. In my project, I could see that July had some skewing of high temperature values.