Organizing and reusing Bash code


Article from Issue 224/2019

Learn how to make your Bash code more readable, robust, and reusable by managing the code within your Bash scripts.

A common problem with every scripting or programming language is that the more complex you make your code, the more difficult it is to understand, debug, and extend that code – unless you organize it correctly from the beginning.

In this installment of my shell scripting series, I show how Bash can help you organize your code correctly along with some related best practices. With a focus on how and why to use Bash functions and variable scopes, I'll present a shell script that can load code from other files, plus a few practical examples.

The Basics

Bash functions allow you to wrap up, as a single command, more or less complex chunks of code that perform any kind of task. You can then invoke these functions as many times as you like inside your script. Bash provides two equivalent syntaxes for defining functions. The first syntax consists of using the function keyword, followed by the function name, and finally curly braces that enclose all the function's code:

function geotag_photo
{
 # code of function is here
 # (full code is shown later)
}

The other syntax type only consists of the function name followed by parentheses:

geotag_photo()
{
 # code of function is here
}

Don't let this second syntax fool you. It does not mean that you can pass arguments to a Bash function inside these parentheses. Instead, you must write those parameters, in the correct order, right after the function name whenever you call it:

geotag_photo $NEWNAME $PLACE
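Inside the function, those arguments arrive as the positional parameters $1, $2, and so on. Here is a minimal, self-contained sketch (the function name and values are invented for illustration):

```shell
# Hypothetical example: positional parameters inside a function
greet () {
 local name=$1    # first argument passed to greet
 local place=$2   # second argument passed to greet
 echo "Hello $name from $place"
}

greet Marco Rome   # prints: Hello Marco from Rome
```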

Inside a function, the arguments passed to it are referenced just like those of a full shell script: The first is $1, the second $2, and so on. You can use a function to perform some fixed, self-contained task (e.g., remove all backup files from your home directory) or to assign a new value to some global variable of your script. You may do that inside the function itself:

function currentmonth()
{
MONTH=`date +%B`
}
currentmonth
echo $MONTH

However, this is not the best (or at least the most readable or reusable) way to do it. It is much better to call the function on the right side of an assignment, in either of the two equivalent formats shown here (the $( ) form is the modern one and can be nested; backticks are older but still common):

function currentmonth()
{
date +%B
}
MONTH=$(currentmonth)
MONTH2=`currentmonth`

Alternatively, you may pass the name of the global variable to modify to your function:

function currentmonth()
{
local  __globalvar=$1
local  functionresult=`date +%B`
eval $__globalvar="'$functionresult'"
}
currentmonth THISMONTH
echo $THISMONTH

Personally, I find this last format much too verbose. In my experience, it has been the best solution for only one, very specific and rare situation: When I needed to set, with one invocation of the same function, the value of two or more distinct global variables. Your mileage, of course, may vary.
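On Bash 4.3 or later, a less verbose alternative to eval for that rare situation is a nameref created with declare -n. This sketch (the variable names are my own) sets two distinct global variables with one invocation:

```shell
# Bash 4.3+ sketch: set two caller-named globals via namerefs
current_month_and_year () {
 declare -n __month=$1   # nameref to the caller's first variable
 declare -n __year=$2    # nameref to the caller's second variable
 __month=$(date +%B)
 __year=$(date +%Y)
}

current_month_and_year THISMONTH THISYEAR
echo "$THISMONTH $THISYEAR"
```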

For completeness, if the value a function must "pass" to the code that calls it is numeric, there is one other way to do it. I explained this technique in a previous installment of this series [1]:

current_day_of_the_month () {
 day=`date +%e`
 return $day
}
current_day_of_the_month
today=$?

You can explicitly set the function's exit status, which Bash captures in the special variable $?, to the value you must pass to the calling code. Keep in mind, however, that exit statuses are limited to the integers 0 through 255, so this technique only works for small, non-negative numbers.

However you invoke a function, remember to write it in the right place! Bash functions must be defined before they are called in a script! I will show the best way to do this later.

Also, it is important to always give your functions descriptive, unique names. In Bash, you may give a function the same name as a standard command-line program, for example grep. If you do that in a script, any subsequent invocation of grep will execute your function instead of the standard grep program. Should you also want to use the grep program in the same script, you must prepend the command keyword to its name as follows:

command grep Marco addressbook.txt

Why go to all this effort? Instead, just call your function mygrep or something similar and avoid all the hassle.

Why Use Functions?

In my experience, there are four reasons to use functions in Bash scripts: reusability, readability, robustness, and efficiency. First, in terms of reusability, functions make it much easier to share and reuse code among many different scripts in a way that minimizes errors.

Next comes readability. As previously discussed, every single "bundle" of code, be it a function or an entire script, should have a descriptive name. In addition, all code longer than 20 or 30 lines should be self-documenting (i.e., include clear descriptions of what it does and why). It turns out that the function-based way to apply this second rule is, in many cases, more effective than comments. For a better understanding, please compare the pseudocode in Listings 1 and 2, which both describe a hypothetical script managing the Linux accounts of an organization's employees.

Listing 1, which includes three distinct bundles of code, explains what each does with comments before each section. Listing 2, instead, packages the same three bundles of code as three named functions. Listings 1 and 2 work in the same way, producing exactly the same results when given the same inputs. When it comes to readability and self-documentation, however, Listing 2 is much more effective just because of functions.

Listing 1

Comment-Based Approach

 # 1: set disk quota for new employee
 (ALL the code to calculate disk quota here)
 # 2: assign user to proper group(s)
 (ALL the code to figure out all the groups to which this user should be added)
 # 3: inform colleagues that a new employee joined the team
  (ALL the code that sends email, updates the company's online directory etc...)

Listing 2

Function-Based Approach

set_disk_quota_for $employee_name
assign_to_groups   $employee_name
inform_team_of     $employee_name

The comments in Listing 1 are clear and work like an instruction manual's chapter titles, neatly partitioning the entire script into several sections. In real life, however, comments like these may very well be separated by tens, or even hundreds, of lines. This would make them useless as aids to quickly understand a script's high-level flow. Listing 2's function-based approach, instead, makes the function names themselves work like a table of contents for the entire script: The first thing you see when you open the file is a description of the entire flow in the most compact way possible.
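Bearing in mind that Bash functions must be defined before they are called, Listing 2's style maps onto a complete script roughly like this (the one-line function bodies here are placeholders of my own):

```shell
#!/bin/bash

# Definitions first (placeholder bodies)...
set_disk_quota_for () { echo "quota set for $1"; }
assign_to_groups   () { echo "groups assigned to $1"; }
inform_team_of     () { echo "team informed about $1"; }

# ...then the high-level flow, which reads like a table of contents
employee_name='jdoe'
set_disk_quota_for "$employee_name"
assign_to_groups   "$employee_name"
inform_team_of     "$employee_name"
```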

Many (but not all!) functions should also be more or less black boxes, as independent as possible from each other and from the rest of your code. Doing so makes the functions, and by extension the entire script, more robust, which is the third reason to adopt this coding practice.

Furthermore, if it is difficult to make your functions independent, this may be a sign that your high-level flow diagram and algorithms are not designed in the most efficient way, regardless of how you translate them into code!

If your algorithm's building blocks are as isolated as possible from each other and interact only by exchanging arguments and return values, then they will be more robust: While coding errors will not disappear, they will likely be fewer, and above all more confined, making them easier to track. The set command can make this tracking even easier as follows:

set -x
somefunction
set +x

The two set instructions enable (-x) and disable (+x) debugging messages. Adding them before and after a specific call to some function is a great way to check if that piece of code works without distracting from the rest of the script! For other uses of set, see the "Other set Uses" box.

Other set Uses

The set command has several switches to make debugging and performance optimization of your scripts easier, regardless of whether they contain functions or not. Besides the previously mentioned -x and +x, the two switches I find most useful are:

  • set -o errexit, which makes the script exit as soon as a command fails
  • set -o pipefail, which makes a pipeline return the (nonzero) exit status of the rightmost command that failed, rather than the exit status of the pipeline's last command
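A quick demonstration of what pipefail changes (this fragment is my own):

```shell
# Without pipefail, a pipeline's exit status is that of its last
# command, so the failure of 'false' here is hidden:
false | true
echo "default status: $?"    # prints: default status: 0

# With pipefail, the failing stage determines the pipeline's status:
set -o pipefail
false | true
echo "pipefail status: $?"   # prints: pipefail status: 1
```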

Global and Local Variables

By default, all Bash variables have a global scope: They are visible and modifiable from any part of the script in which they appear. This is exactly what you want to happen with variables, or constant parameters, that are actually needed by all of your code.

The less a function sees of the script outside itself, however, the less damage it can do. For the same reason, any variable that is only needed inside one function should only exist, and be visible, inside that function. Luckily, all you have to do to achieve this effect is to explicitly declare the variable as local:

local DAYOFWEEK='Monday'

Thanks to that declaration, if there is another $DAYOFWEEK anywhere else in the script, it will not be affected by what happens to the local version of the variable, and vice versa.

In general, making variables local avoids accidental modifications by other code in ways that could be very difficult to debug – especially if that other code is added much later, when you may have forgotten what you had written into the function!
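A short sketch of this shielding effect (the names and values are illustrative):

```shell
DAYOFWEEK='Friday'            # global variable

report () {
 local DAYOFWEEK='Monday'     # local: shadows the global inside report
 echo "inside: $DAYOFWEEK"
}

report                         # prints: inside: Monday
echo "outside: $DAYOFWEEK"     # prints: outside: Friday - unchanged
```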

More Efficient Code Handling

Functions not only let you execute and reuse big blocks of commands as if they were a single line of code, they also let you manage those blocks in the same way. In other words, functions make your code more efficient. I already showed one example of this fourth reason to use functions with the set debugging example. The following snippet shows two more efficiency applications:

nice -n 10 geotag_photo

geotag_photo ()
{
 # function code here
} >> geotag.log

The nice program forces whatever command is passed to it to run with a lower priority than the one the operating system would otherwise assign by default. It guarantees that long (but not urgent) tasks continue to run without subtracting too many CPU cycles from the most important tasks. Be aware, however, that nice launches an external process, which cannot see your shell functions directly: To apply it to a function such as geotag_photo, you must first export the function with export -f and then invoke it in a child shell.

Placing all of the "low-priority" code inside one function is exactly what lets you slow down only that code, and only when you need it: Just prepend nice to each invocation of the function that needs "niceness."
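Because nice starts an external process, it cannot call a shell function by name; a common workaround (the function name and body below are placeholders) is to export the function and run it in a child bash:

```shell
geotag_photo () {
 echo "geotagging $1"          # placeholder body
}
export -f geotag_photo          # make the function visible to child shells

# run the function at lower priority inside a child bash process
nice -n 10 bash -c 'geotag_photo holiday.jpg'
```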

The second part of the code snippet is less intuitive, but sometimes equally convenient. By appending a redirection to the function's definition, all of the function's output is automatically appended to that logfile, without the need to repeat the redirection every time you call the function.
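A runnable sketch of this pattern (the function name and logfile name are my own):

```shell
# The redirection on the definition applies to every call:
log_step () {
 echo "step: $1"
} >> steps.log

log_step "resize photos"
log_step "geotag photos"
# steps.log now contains one "step: ..." line per call
```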
