Prepare calculations and chart results with Bash Math, Shell Style
Database Backup Example
MySQL or MariaDB relational databases are very common inside Linux servers or desktops. Their content can be saved in plain text files, whose content can then be used for any calculation necessary. Of course, the first question that comes to mind is "why not perform those calculations inside the database? Aren't database engines optimized to do just that?"
The answer to this question is yes, relational databases are much faster than Bash, but they can only process their own content! If you need to do anything more than that with your files (from comparing the numbers inside the database with those inside some other file to using the same numbers to decide what to do next), you must do it outside the database. Bash is an ideal environment for such tasks. There are two other advantages of working on plain text backups instead of the actual tables in the live database. One is that you can compare at any moment backups made at different times. Another is that, starting from the same database backup file, you can generate with Bash many partial dumps to combine their content in Bash in ways that would not be possible with any direct database queries. One application of this approach in small business would be finding all customers with unpaid balances over some threshold inside the database, in order to automatically send them reminders by email, each with the corresponding invoice attached. In the same way, a teacher could generate custom exercises for each student using research or financial databases, depending on previous results.
To produce output that is immediately usable for this kind of work, use:
mysqldump -u DBUSER --password=PASSWORD --skip-extended-insert > database-backup.sql
This is the simplest approach, because the --skip-extended-insert
option dumps each database record on a separate line. If the database contained only one table called purchaseorders
, composed of four columns that corresponded to the unique identifier, date, description, and amount of each purchase, the file database-backup.sql
would contain lots of lines with this format:
INSERT INTO `purchaseorders` VALUES (44,'2004-10-20','stationery',1855.0000);
Knowing that the database has the above format, you may save in a separate file all the records that are about stationery purchased between 2004 and 2005 with this command:
egrep "'2004\-|2005\-" stationery-expenses-2004-2005.sql
Then extract the amounts only, as follows:
cut '-d,' -f2,4 stationery-expenses-2004-2005.sql >expense-amounts.txt
Here, I have used the cut
command, setting the column delimiter to the comma ('-d,'
) and selecting only the second and fourth column (-f2,4
) to save only dates and amounts inside the file expense-amounts.txt
. Now, for example, if I wanted to calculate the average import of those stationery expenses, I would feed the second column of expense-amount
to the awk
program:
cat expense-amounts.txt | awk '-F,' '{sum+=$2} END { print "Average = ",sum/NR}'
The -F
switch tells awk
to use commas as column separators instead of spaces, which are its default. The commands inside braces mean "read the file one line at a time, adding to the sum
variable the value in the second column; when the file is finished, print the string Average
, followed by the value of sum
divided by the number of records (NR
).
While the above mix of grep
, cut
, and awk
is not the most efficient, I wanted to show you all three tools working together to give you a good idea of their potential for massive number crunching in shell scripts. Combining these programs, you can extract, reformat, and preprocess, with very little effort on your part, lots of numeric data of any nature. The awk
trick above works the same way, for example, when you want to calculate the average size of all the files in the current directory, including all its subfolders. Try it for yourself:
find . -type f -exec ls -l {} \; | awk '{sum+=$5} END { print "Average = ",sum/NR}'
Charts
While I am not sure that a picture is always worth a thousand words, a chart is surely worth much more than a huge, unreadable table full of numbers. The tool I recommend to handle such tasks from shell scripts today is the same one I used in the 90s: the venerable Gnuplot [14]. The graphs it creates are not the fanciest, but Gnuplot is very well documented and available on almost any operating system, including Linux distributions optimized for tiny servers. Besides, using Gnuplot is really simple (see Figure 1):
- 1 Save the numbers to plot in a plain text data file.
- 2 Save the plotting commands for Gnuplot in an instruction file.
- 3 Tell Gnuplot to execute the instruction file on the data file.
-
Creating the data file on the fly consists of running awk
commands similar to the one used in the database example or applying the techniques I presented in this series' previous installments. Concerning plotting commands, unless you have really uncommon needs, 90 percent of the time a quick online search will return commands ready to copy and paste into your script with little or no changes. The result will be spartan, but it will deliver clear charts like Figure 2 (to see how I generated this, see [15]) that your script can then embed in web pages, send by email, or use in any other way desired.
The Big Picture
If all you needed to know about "math" in Bash were its arithmetic operators, a tiny cheatsheet would suffice. Since this is not the case, this installment is different from the previous ones: This time, presenting one, relatively complex script that solves one specific real-world problem would have been too reductive. I wanted to introduce, instead, with the minimum amount of working code, approaches and tools suitable for any kind of number-related processing and show how they fit together (see Figure 3).
There are two take-away lessons from this installment. First, it is easy to insert single formulas in Bash scripts, using bc
if necessary. Second, very often a Bash script is the most efficient way to run or prepare calculations on large quantities of numbers of any nature, from any source (including the Internet). The real power of Gnuplot, awk
, bc
, and so on is in how you combine them in the same shell script. Give it a try!
Infos
- "Tutorials – Custom Shell Scripts" by Marco Fioretti, Linux Magazine, issue 219, February 2019, pp. 84-88
- "Tutorials – Complex Containers" by Marco Fioretti, Linux Magazine, issue 220, March 2019, pp. 84-89
- "Tutorials – Shell Flow Control" by Marco Fioretti, Linux Magazine, issue 221, April 2019, pp. 86-91
- "Tutorials – Shell Test Conditions and Exit Codes" by Marco Fioretti, Linux Magazine, issue 222, May 2019, pp. 84-88
- Genius: http://www.jirka.org/genius.html
- Graph: http://www.padowan.dk
- bc man page: https://linux.die.net/man/1/bc
- Handling time differences: https://stackoverflow.com/questions/4946785/how-to-find-the-difference-in-days-between-two-dates
- "Bash Scripting – Arithmetic, Logical, Relational and Bitwise Operators": http://www.yourownlinux.com/2016/12/bash-arithmetic-logical-relational-bitwise-operators.html
- "Geographical Distance Between Long and Lat in Bash": http://www.unix.com/shell-programming-and-scripting/250356-geographical-distance-between-long-lat-bash.html
- "Calculate Distance and Azimuth": http://www.unix.com/shell-programming-and-scripting/129989-calculate-distance-azimuth.html
- "Work the Shell – Calculating the Distance between Two Latitude/Longitude Points" by Dave Taylor, Linux Journal, December 1, 2009: http://www.linuxjournal.com/magazine/work-shell-calculating-distance-between-two-latitudelongitude-points
- R: http://www.r-project.org/
- Gnuplot: http://www.gnuplot.info
- "How to Create Stacked Area Graphs with Gnuplot": http://freesoftware.zona-m.net/how-to-create-stacked-area-graphs-with-gnuplot/
« Previous 1 2 3
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
News
-
So Long Neofetch and Thanks for the Info
Today is a day that every Linux user who enjoys bragging about their system(s) will mourn, as Neofetch has come to an end.
-
Ubuntu 24.04 Comes with a “Flaw"
If you're thinking you might want to upgrade from your current Ubuntu release to the latest, there's something you might want to consider before doing so.
-
Canonical Releases Ubuntu 24.04
After a brief pause because of the XZ vulnerability, Ubuntu 24.04 is now available for install.
-
Linux Servers Targeted by Akira Ransomware
A group of bad actors who have already extorted $42 million have their sights set on the Linux platform.
-
TUXEDO Computers Unveils Linux Laptop Featuring AMD Ryzen CPU
This latest release is the first laptop to include the new CPU from Ryzen and Linux preinstalled.
-
XZ Gets the All-Clear
The back door xz vulnerability has been officially reverted for Fedora 40 and versions 38 and 39 were never affected.
-
Canonical Collaborates with Qualcomm on New Venture
This new joint effort is geared toward bringing Ubuntu and Ubuntu Core to Qualcomm-powered devices.
-
Kodi 21.0 Open-Source Entertainment Hub Released
After a year of development, the award-winning Kodi cross-platform, media center software is now available with many new additions and improvements.
-
Linux Usage Increases in Two Key Areas
If market share is your thing, you'll be happy to know that Linux is on the rise in two areas that, if they keep climbing, could have serious meaning for Linux's future.
-
Vulnerability Discovered in xz Libraries
An urgent alert for Fedora 40 has been posted and users should pay attention.