Custom solutions for system monitoring and control
Just Right

Off-the-rack monitoring tools often offer too many functions or fail to offer precisely what you need, but shell scripts let you create individual monitoring routines.
Trust is good, but keeping the thumb screws on is better: This is the principle by which IT services and functions are monitored. Although you can find many tools to accomplish this job, tailor-made monitoring doesn't actually need these giants. Simple shell scripts will take you where you need to go just as well.
Whether you need to monitor and control a web server, database system, network connections, users, fans, or computer temperatures, simple shell routines are typically reliable and fast. Once created, scripts can be modified for different distributions and scenarios.
Monitoring needs to be considered carefully, however: In the case of monitoring a web server, it is not just a question of checking that the service is running – the question lacks precision. Is the hardware running? To determine this, all you need is a simple ping. A positive response, however, by no means signifies that the web server daemon is working. To discover this, you need to query the process status locally on the server; that is, ps -C <service> or possibly service <service> status.
However, you still don't know whether users can retrieve data from the web server. You would need to test this regularly in a browser, preferably in an automated process using a command-line tool and ideally from somewhere outside of your own protected network infrastructure. Otherwise, you risk being lulled into a false sense of security – for example, when a router has failed but checks from inside the network still succeed.
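The three questions above (is the host up, is the daemon running, can the page be retrieved) can be sketched as a small helper that names the first layer that fails. This is only an illustration: first_failing_layer is a hypothetical function name, and the three exit codes are assumed to have been collected beforehand, the first and third from outside the network, the second locally on the server.

```shell
#!/bin/sh
# Sketch: report the first failing layer from three exit codes.
# $1 = exit code of ping
# $2 = exit code of ps -C <service> (gathered on the server itself)
# $3 = exit code of an HTTP retrieval test (gathered from outside)
first_failing_layer() {
    if [ "$1" -ne 0 ]; then
        echo "host unreachable"
    elif [ "$2" -ne 0 ]; then
        echo "daemon not running"
    elif [ "$3" -ne 0 ]; then
        echo "page not retrievable"
    else
        echo "all layers ok"
    fi
}
```

A call such as first_failing_layer 0 1 1 would point at the daemon rather than at the network, which is the distinction the paragraph above is after.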
Sensors
When you are monitoring program execution, the task is to check the exit codes that terminal applications and commands typically return after terminating – gracefully or not. A value of 0 typically signals a successful program run, whereas other codes indicate more or less serious errors. Table 1 provides a brief selection of popular tools for system monitoring.
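As a minimal sketch of how a script can pick up such an exit code, the following helper runs any command quietly and prints whether it succeeded. The name report_exit is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Sketch: run a command quietly and report its exit code.
# report_exit is a hypothetical helper name.
report_exit() {
    if "$@" >/dev/null 2>&1; then
        echo "OK: $1"
    else
        rc=$?                      # exit code of the command just tested
        echo "FAILED ($rc): $1"
    fi
}
```

For example, report_exit ping -c 1 -q www.example.com prints a single line that a monitoring script can log or evaluate further.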
Table 1
Test Tools
Test Objective | Tool |
---|---|
Accessibility of websites | httping (1) |
Database shell client for PostgreSQL-RDBMS | psql |
Accessibility of computers | ping |
Name resolution | host |
Logged on users | |
Service status (SysVinit) | service |
Service status (Systemd) | systemctl |
Disk space | |
Temperature | sensors (2) |
Fan activity | sensors (2) |
Port access | |
Packages: (1) httping, (2) lm-sensors, (3) netrw
As the example of monitoring a web server shows, monitoring involves a little overhead in some cases (Figure 1). In this case, monitoring would ideally not be operated in-house but from outside of your own IT infrastructure so that failures would not also take down the monitoring system. In this way, you can cover almost all failure cases: web lockouts, overload attacks, general network overload, and even cases of physical network disconnection – think backhoes.
In response, you could (automatically) fire up a redundant system at some other location or with a different Internet connection. Listing 1 shows an approach that also answers other questions as a first step toward narrowing down an error (DNS problem, network connection, and more). This script can be extended easily if needed, but watch out for pitfalls caused by some Internet providers when you attempt to access an unreachable Internet site. In some cases, the provider delivers a helpful navigation aid instead, and you will not want to evaluate the HTTP status of that page.
Listing 1
Web Server Monitoring
#!/bin/sh
HOST=www.example.com
IP=93.184.216.34

while true; do

  # Access website, output to variable
  B=$(httping -G -g "$HOST" -c 1 -s -m)
  # Store exit code in variable
  A=$?

  # Break down httping output (first field, second field)
  C=$(echo "$B" | cut -d ' ' -f1)
  D=$(echo "$B" | cut -d ' ' -f2)

  # Output variables
  echo "Exit-Code: $A"
  echo "STATUS: $C"

  # Check name resolution
  if [ "$C" = "-1" ]; then
    host "$HOST"
    # Store exit code ...
    NA=$?
    # ... and evaluate
    if [ "$NA" -eq 0 ]; then
      echo "Name resolution ok"
    else
      echo "Name resolution error"
      # Availability via IP address?
      ping -c 1 -q "$IP"
      # Store exit code ...
      E=$?
      # ... and evaluate
      if [ "$E" -eq 0 ]; then
        echo "Computer accessible on network"
      else
        echo "Computer not accessible on network"
      fi
    fi
  fi

  # Report if the page cannot be retrieved
  if [ "$D" != "200" ]; then
    echo "Page error $D"
  fi

  sleep 15

done
The httping command executed in the script (typically from the httping package) calls the stated website and displays additional information, such as latency (see the box "Pinging Web Servers"). This means you can quite easily monitor a web server in terms of functionality. The system monitoring script shown here provides the sensor system for monitoring; the response side is typically outsourced into a second script.
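How such a second script might look depends entirely on your environment. As a rough sketch under stated assumptions, the following helper counts consecutive failures and only reacts once a threshold is reached, so that a single lost request does not trigger a failover. STATE_FILE, THRESHOLD, handle_result, and start_failover are hypothetical names, and start_failover is a placeholder for whatever reaction you choose.

```shell
#!/bin/sh
# Sketch of a response script: react only after several consecutive failures.
# STATE_FILE, THRESHOLD, and start_failover are hypothetical names.
STATE_FILE=${STATE_FILE:-/tmp/webmon.failures}
THRESHOLD=${THRESHOLD:-3}

# Placeholder for the real reaction, e.g., starting a redundant system.
start_failover() {
    echo "failover triggered"
}

# $1 = exit code delivered by the sensor script
handle_result() {
    if [ "$1" -eq 0 ]; then
        echo 0 > "$STATE_FILE"        # a success resets the counter
        return 0
    fi
    count=0
    if [ -f "$STATE_FILE" ]; then
        count=$(cat "$STATE_FILE")
    fi
    count=$(( ${count:-0} + 1 ))      # one more consecutive failure
    echo "$count" > "$STATE_FILE"
    if [ "$count" -ge "$THRESHOLD" ]; then
        start_failover
    fi
}
```

The sensor script would then call handle_result with the exit code it has just stored, for example handle_result "$A".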
Pinging Web Servers
The httping program checks access to a web server; it can optionally also determine the response behavior, assuming the connection is not routed via a proxy server or does not transfer the complete page content using the -G option, which would falsify response times. The basic call uses the syntax httping -g <URL>, and you can use the -p <port> option to stipulate a port other than the typical port 80.
If so desired, httping will generate helpful information on top of the exit codes (0 = functioning, 127 = error), including the response time, which assumes a value of -1 for an error. By storing these results in a variable, you can trigger alarms or responses based on them. For a better understanding of the function, launch the small sample script from Listing 2. Listing 3 shows the matching output.
The first call targets a working website. Httping shows the response time and the HTTP status code 200. If you point httping at a working domain, but a non-existent page, the test tool will output the classical 404 error with a response time of -1. If the domain doesn't exist, then the Internet provider in this example redirects the script to its own navigation aid with an integrated search function; therefore, httping does not report Resolving exshample.com failed but outputs 302 – the status code for redirection.
Listing 2
Website Test Script
#!/bin/sh
echo "This website works:"
httping -g http://www.example.com -c 1 -s -m
echo "-------------------------------------------------"
echo "Domain exists, but invalid page:"
httping -g http://example.com/page-not-there.html -c 1 -s -m
echo "-------------------------------------------------"
echo "Domain does not exist, redirected by provider:"
httping -g http://exshample.com -c 1 -s -m
Listing 3
Script Output
$ ./listing2.sh
This website works:
206,761122 200
-------------------------------------------------
Domain exists, but invalid page:
-1 404
-------------------------------------------------
Domain does not exist, redirected by provider:
-1 302
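To evaluate output lines like those in Listing 3, a script can split each line into response time and status code and branch on the result. The following is a minimal sketch that assumes the two-field format shown in Listing 3; classify_httping is a hypothetical name, and the wording of the messages is illustrative.

```shell
#!/bin/sh
# Sketch: classify one output line in the format shown in Listing 3,
# i.e., "<response time> <HTTP status>".
# classify_httping is a hypothetical helper name.
classify_httping() {
    latency=${1%% *}     # text before the first space
    status=${1##* }      # text after the last space
    case "$status" in
        200)     echo "ok latency=$latency" ;;
        3??)     echo "redirect status=$status" ;;
        4??|5??) echo "page error status=$status" ;;
        *)       echo "no usable answer" ;;
    esac
}
```

Treating redirects as a separate case makes it possible to spot the provider's navigation aid instead of mistaking it for a working site.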
Monitoring Databases
Databases are another important building block in any IT infrastructure, and it is obviously important to monitor them. The possibilities include MySQL, MariaDB, or PostgreSQL databases; I focus on PostgreSQL in this example. To monitor the service, you need to create a separate user account and a database with a table for this account. With this in place, shell scripts can automate the query process. In this example, the database is named watchmen, and it contains the guards table with a number column and a single record (Figure 2).
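One possible way to create such a setup is sketched below with the standard PostgreSQL command-line tools. Several details are assumptions rather than part of the original setup: the role name monitor, the integer type of the number column, and the value of the single record. The function is meant to be run by a PostgreSQL superuser, such as the postgres account.

```shell
#!/bin/sh
# Sketch: create a monitoring role, the watchmen database, and the guards table.
# Assumptions: run as a PostgreSQL superuser; the role name "monitor", the
# integer column type, and the inserted value are illustrative only.
setup_watchmen() {
    createuser monitor &&
    createdb -O monitor watchmen &&
    psql -q -d watchmen -c "CREATE TABLE guards (number integer);
INSERT INTO guards VALUES (1);
GRANT SELECT ON guards TO monitor;"
}
```

The explicit GRANT is there because the table is created by the superuser, so the monitoring role would otherwise not be allowed to read it.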
The psql shell client uses classical exit codes: 0 for okay and 1 for failed. The shell script in Listing 4, which additionally evaluates exit code 2 for a database that cannot be reached, then decides whether the data is simply inaccessible for some reason or whether the service is not working at all. Figure 3 shows the procedure. For test purposes, I deleted the data on one occasion and stopped the service on another. Assuming that the script is running on the same computer as the relational database management system, you can also perform other actions.
Listing 4
Checking PostgreSQL Server
#!/bin/sh
while true; do
  # Write date and time to variable
  TIME=$(date +%d.%m.%Y:%H:%M:%S)
  # Database query to extract the exit code
  M=$(psql -q -d watchmen -c "select * from guards;")
  # Store and evaluate exit code
  A=$?
  if [ "$A" -eq 0 ]; then
    echo "$TIME Database working"
  elif [ "$A" -eq 1 ]; then
    echo "$TIME Data not found"
  elif [ "$A" -eq 2 ]; then
    echo "$TIME Database inactive"
  fi
  sleep 60
done
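If the check runs on the database host itself, the exit codes evaluated in Listing 4 could be mapped to local reactions. The sketch below only prints a suggested reaction; db_reaction is a hypothetical name, and the restart command assumes a systemd unit called postgresql, which differs between distributions.

```shell
#!/bin/sh
# Sketch: suggest a local reaction for the psql exit codes used in Listing 4.
# The reactions and the unit name "postgresql" are illustrative assumptions;
# the function only prints text and changes nothing on the system.
db_reaction() {
    case "$1" in
        0) echo "none" ;;
        1) echo "alert: data missing, check table guards" ;;
        2) echo "restart: sudo systemctl restart postgresql" ;;
        *) echo "alert: unexpected psql exit code $1" ;;
    esac
}
```

Printing the reaction first, rather than executing it, keeps the sensor and response sides separate and makes the mapping easy to review.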
Monitoring Services
Many services simply work away in the background, and you are unable to talk to them directly through a web or database server; thus, it is impossible to check the availability of this kind of service with a simple query. In these cases, you need to rely on the service fulfilling its task if its process is active.
Status queries can be made based on the examples in Table 2. The simplest tool for this task is the ps command. The -C <process> option lets you restrict the search for the process in question to the stipulated name (Listing 5).
Table 2
Service Check
Method | Call | Exit Code |
---|---|---|
Process status, stating the service | ps -C <service> | 0 (active), 1 (inactive) |
Init script with option | service <service> status | None; outputs individual messages instead |
Query with systemctl | systemctl status <service> | 0 (active), 3 (inactive) |
Communication with the service | – | See examples of web and database servers |
Listing 5
Status Query for NTP Daemon
$ ps -C ntpd
  PID TTY          TIME CMD
 1054 ?        00:00:01 ntpd
$ echo $?
0
# Stop service, with systemd in this case
$ sudo systemctl stop ntpd.service
$ ps -C ntpd
  PID TTY          TIME CMD
$ echo $?
1
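Wrapped in a function, the exit code from Listing 5 can be turned into a state that other scripts evaluate. This is a minimal sketch; process_state is a hypothetical name, and the -C option assumes the ps implementation commonly found on Linux.

```shell
#!/bin/sh
# Sketch: print "running" or "stopped" for a process name,
# based solely on the exit code of ps -C.
process_state() {
    if ps -C "$1" >/dev/null 2>&1; then
        echo "running"
    else
        echo "stopped"
    fi
}
```

A call such as process_state ntpd then yields one word that is easy to compare in a test or write to a logfile.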
The exit codes returned by ps can be processed easily in scripts further downstream. Alternatively, you can pick up the output from the init scripts called by the service command (Listing 6) or, for distributions with systemd, by the systemctl command (Listing 7).
Listing 6
Service (SysVinit)
$ service ntp status
 * NTP server is running
$ sudo service ntp stop
 * Stopping NTP server ntpd                    [ OK ]
$ service ntp status
 * NTP server is not running
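Because the init script in Listing 6 reports its state as text, a script has to match that text. The following sketch does so with patterns taken from the messages in Listing 6; other init scripts word their status differently, so the patterns are assumptions you would need to adapt, and sysv_state is a hypothetical name.

```shell
#!/bin/sh
# Sketch: derive a state from the text an init script prints.
# The patterns match the messages in Listing 6 and must be adapted
# for init scripts that use different wording.
sysv_state() {
    # Only the text is evaluated here, so the exit code is ignored.
    out=$(service "$1" status 2>&1) || true
    case "$out" in
        *"is not running"*) echo "stopped" ;;
        *"is running"*)     echo "running" ;;
        *)                  echo "unknown" ;;
    esac
}
```

The "is not running" pattern is checked first so that a looser match on "running" cannot misreport a stopped service.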
Listing 7
Systemctl (systemd)
$ systemctl status ntp
|- ntp.service - LSB: Start NTP daemon
   Loaded: loaded (/etc/init.d/ntp)
   Active: active (running) since Mon 2015-10-26 19:22:03 CET; 43s ago
[...]
$ echo $?
0
$ sudo systemctl stop ntp
$ systemctl status ntp
|- ntp.service - LSB: Start NTP daemon
   Loaded: loaded (/etc/init.d/ntp)
   Active: inactive (dead) since Mon 2015-10-26 19:23:04 CET; 4s ago
[...]
$ echo $?
3
Whereas the legacy SysVinit forces you to evaluate the output from the init scripts that you call, which involves considerable overhead, systemd returns more useful exit codes. Depending on the task, requirements, and the system, you can use one of the three methods introduced here to achieve your objectives when monitoring a service.
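With systemd, the exit codes from Listing 7 can be evaluated directly. The sketch below handles the two values shown there (0 and 3) and reports any other code unchanged; systemd_state is a hypothetical name.

```shell
#!/bin/sh
# Sketch: translate the exit code of "systemctl status <unit>" into a state.
# 0 and 3 are the values shown in Listing 7; anything else is passed through.
systemd_state() {
    if systemctl status "$1" >/dev/null 2>&1; then
        echo "active"
    else
        rc=$?                      # exit code of the systemctl call
        case "$rc" in
            3) echo "inactive" ;;
            *) echo "other (exit code $rc)" ;;
        esac
    fi
}
```

Calling systemd_state ntp in a loop gives the same kind of one-word result as the ps variant, without parsing any output.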