Custom solutions for system monitoring and control
Just Right
Off-the-rack monitoring tools often offer too many functions or fail to offer precisely what you need, but shell scripts let you create individual monitoring routines.
Trust is good, but keeping the thumb screws on is better: This is the principle by which IT services and functions are monitored. Although you can find many tools to accomplish this job, tailor-made monitoring doesn't actually need these giants. Simple shell scripts will take you where you need to go just as well.
Whether you need to monitor and control a web server, database system, network connections, users, fans, or computer temperatures, simple shell routines are typically reliable and fast. Once created, scripts can be modified for different distributions and scenarios.
Monitoring needs to be considered carefully, however: In the case of monitoring a web server, it is not just a question of checking that the service is running – the question lacks precision. Is the hardware running? To determine this, all you need is a simple ping. A positive response, however, by no means signifies that the web server daemon is working. To discover this, you need to query the process status locally on the server; that is, ps -C <service> or possibly service <service> status.
However, you still don't know whether users can retrieve data from the web server. You would need to test this regularly in a browser, preferably in an automated process using a command-line tool and ideally from somewhere outside of your own protected network infrastructure. Otherwise, you risk being lulled into a false sense of security – for example, even when a router no longer works.
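The three questions above (is the host up, is the daemon running, can a page be retrieved) can be chained so that the first failing check names the layer at fault. The following is only a minimal sketch: the function name layered_check is made up for illustration, and the commands in the commented example (including the daemon name apache2) are assumptions you would replace with your own.

```shell
#!/bin/sh
# layered_check HOST_CMD PROC_CMD HTTP_CMD
# Each argument is a command string. The checks run in order of
# increasing depth; the first one that fails names the broken layer.
layered_check() {
    if ! sh -c "$1" >/dev/null 2>&1; then
        echo "host unreachable"
        return 1
    fi
    if ! sh -c "$2" >/dev/null 2>&1; then
        echo "host up, daemon not running"
        return 2
    fi
    if ! sh -c "$3" >/dev/null 2>&1; then
        echo "daemon running, page not retrievable"
        return 3
    fi
    echo "all layers ok"
    return 0
}

# Example call (hypothetical host and daemon name; the process check
# only makes sense when the script runs on the web server itself):
# layered_check "ping -c 1 -q www.example.com" \
#               "ps -C apache2" \
#               "httping -g http://www.example.com -c 1"
```

Passing the checks in as strings keeps the ordering logic separate from the tools, so the same function works whether the page test uses httping or some other command-line client.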
Sensors
When you are monitoring program execution, the task is to check the exit codes that terminal applications and commands typically return after terminating – gracefully or not. A value of 0 typically signals a successful program run, whereas other codes indicate more or less serious errors. Table 1 provides a brief selection of popular tools for system monitoring.
Table 1
Test Tools
Test Objective | Tool |
---|---|
Accessibility of websites | httping (1) |
Database shell client for PostgreSQL-RDBMS | psql |
Accessibility of computers | ping |
Name resolution | host |
Logged on users | |
Service status (SysVinit) | service |
Service status (systemd) | systemctl |
Disk space | |
Temperature | sensors (2) |
Fan activity | sensors (2) |
Port access | (3) |
Packages: (1) httping, (2) lm-sensors, (3) netrw |
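Because all of these tools report success or failure through their exit codes, a single small wrapper can turn any of them into a sensor. This is a sketch only; the name run_check and the labels in the commented calls are invented for illustration.

```shell
#!/bin/sh
# run_check LABEL COMMAND [ARGS...]
# Runs the command, discards its output, and reports on the exit code.
run_check() {
    label=$1
    shift
    "$@" >/dev/null 2>&1
    code=$?
    if [ "$code" -eq 0 ]; then
        echo "OK   $label"
    else
        echo "FAIL $label (exit code $code)"
    fi
    # Hand the original exit code back to the caller
    return "$code"
}

# Example calls (hypothetical targets, commented out to avoid side effects):
# run_check "host reachable" ping -c 1 -q www.example.com
# run_check "name resolution" host www.example.com
```

The wrapper returns the original exit code, so a calling script can still branch on the precise value instead of only on success or failure.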
As the example of monitoring a web server shows, monitoring involves a little overhead in some cases (Figure 1). In this case, monitoring would ideally not be operated in-house but from outside of your own IT infrastructure so that failures would not also take down the monitoring system. In this way, you can cover almost all failure cases: web lockouts, overload attacks, general network overload, and even cases of physical network disconnection – think backhoes.
In response, you could (automatically) fire up a redundant system at some other location or with a different Internet connection. Listing 1 shows an approach that also takes the first steps toward narrowing down an error (DNS problem, network connection, and more). This script can be extended easily if needed, but watch out for pitfalls caused by some Internet providers when you attempt to access an unreachable Internet site. In some cases, the provider serves its own navigation aid instead, and you will not want to evaluate the HTTP status returned by that page.
Listing 1
Web Server Monitoring
    #!/bin/sh
    HOST=www.example.com
    IP=93.184.216.34

    while true; do

      # Access website, output to variable
      B=$(httping -G -g "$HOST" -c 1 -s -m)
      # Store exit code in variable
      A=$?

      # Break down httping output (response time, HTTP status);
      # $B is left unquoted so that whitespace collapses to single spaces
      C=$(echo $B | cut -d ' ' -f1)
      D=$(echo $B | cut -d ' ' -f2)

      # Output variables
      echo "Exit-Code: $A"
      echo "STATUS: $C"

      # Check name resolution
      if [ "$C" = "-1" ]; then
        host "$HOST"
        # Store exit code ...
        NA=$?
        # ... and evaluate
        if [ "$NA" -eq 0 ]; then
          echo "Name resolution ok"
        else
          echo "Name resolution error"
          # Availability via IP address?
          ping -c 1 -q "$IP"
          # Store exit code ...
          E=$?
          # ... and evaluate
          if [ "$E" -eq 0 ]; then
            echo "Computer accessible on network"
          else
            echo "Computer not accessible on network"
          fi
        fi
      fi

      # Note, if page cannot be retrieved
      if [ "$D" != "200" ]; then
        echo "Page error $D"
      fi

      sleep 15

    done
The httping command executed in the script (typically from the httping package) calls the stated website and displays additional information, such as latency (see the box "Pinging Web Servers"). This means you can quite easily monitor a web server in terms of functionality. The system monitoring script shown here provides the sensor system for monitoring; the response side is typically outsourced into a second script.
Pinging Web Servers
The httping program checks access to a web server; it can optionally also determine the response behavior, provided the connection is not routed via a proxy server and you are not transferring the complete page content with the -G option, either of which would distort the response times. The basic call uses the syntax httping -g <URL>, and you can use the -p <port> option to stipulate a port other than the typical port 80.
If so desired, httping will generate helpful information on top of the exit codes (0 = functioning, 127 = error), including the response time, which assumes a value of -1 for an error. By storing these results in a variable, you can trigger alarms or responses based on them. For a better understanding of the function, launch the small sample script from Listing 2. Listing 3 shows the matching output.
The first call targets a working website. Httping shows the response time and the HTTP status code 200. If you point httping at a working domain, but a non-existent page, the test tool will output the classical 404 error with a response time of -1. If the domain doesn't exist, then the Internet provider in this example redirects the request to its own navigation aid with an integrated search function; therefore, httping does not report Resolving exshample.com failed but outputs 302 – the status code for redirection.
Listing 2
Website Test Script
    #!/bin/sh
    echo "This website works:"
    httping -g http://www.example.com -c 1 -s -m
    echo "-------------------------------------------------"
    echo "Domain exists, but invalid page:"
    httping -g http://example.com/page-not-there.html -c 1 -s -m
    echo "-------------------------------------------------"
    echo "Domain does not exist, redirected by provider:"
    httping -g http://exshample.com -c 1 -s -m
Listing 3
Script Output
    $ ./listing2.sh
    This website works:
    206,761122 200
    -------------------------------------------------
    Domain exists, but invalid page:
    -1 404
    -------------------------------------------------
    Domain does not exist, redirected by provider:
    -1 302
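A script that reacts to this output needs to treat the provider redirect differently from a genuine success or failure. One possible way to classify a single line of the form shown in Listing 3 is sketched below; the function name evaluate_line and the wording of the messages are assumptions, and the sketch expects exactly the "time status" layout seen above.

```shell
#!/bin/sh
# evaluate_line "TIME STATUS"
# Classifies one line of machine-readable httping output.
evaluate_line() {
    # Split the line on whitespace: first field time, second field status
    set -- $1
    time=$1
    status=$2
    case $status in
        200) echo "ok: response time $time" ;;
        3??) echo "warning: redirected ($status), possibly by the provider" ;;
        "")  echo "alarm: no status received" ;;
        *)   echo "alarm: HTTP status $status" ;;
    esac
}

# Possible use with a live check (commented out; needs httping and a network):
# evaluate_line "$(httping -g http://www.example.com -c 1 -s -m)"
```

Treating every 3xx status as a warning rather than as success keeps a provider's navigation page from masking a DNS failure.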
Monitoring Databases
Databases are another important building block in any IT infrastructure, and it is obviously important to monitor them. The possibilities include MySQL, MariaDB, or PostgreSQL databases; I focus on PostgreSQL in this example. To monitor the service, you need to create a separate user account and a database with a table for this account. This setup lets the shell scripts automate the query process. In this example, the database is named watchmen, and it contains the guards table with a number column and a single record (Figure 2).
The psql shell client uses classical exit codes: 0 for okay and 1 for failed; if the server cannot be reached at all, it returns 2. The shell script in Listing 4 uses these codes to decide whether the data is simply inaccessible for some reason or whether the service is not working at all. Figure 3 shows the procedure. For test purposes, I deleted the data on one occasion and stopped the service on another. Assuming that the script is running on the same computer as the relational database management system, you can also perform other actions.
Listing 4
Checking PostgreSQL Server
    #!/bin/sh
    while true; do
      # Write date and time to variable
      TIME=$(date +%d.%m.%Y:%H:%M:%S)
      # Database query to extract the exit code
      M=$(psql -q -d watchmen -c "select * from guards;")
      # Store and evaluate exit code
      A=$?
      if [ "$A" -eq 0 ]; then
        echo "$TIME Database working"
      elif [ "$A" -eq 1 ]; then
        echo "$TIME Data not found"
      elif [ "$A" -eq 2 ]; then
        echo "$TIME Database inactive"
      fi
      sleep 60
    done
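The mapping from exit code to message can also be pulled out of the loop into a function, which makes it easy to add a catch-all for codes the script does not expect. This is a sketch under the same assumptions as Listing 4 (database watchmen, table guards); the function name describe_psql_exit is made up for illustration.

```shell
#!/bin/sh
# describe_psql_exit CODE
# Translates a psql exit code into a short status message.
describe_psql_exit() {
    case $1 in
        0) echo "Database working" ;;
        1) echo "Data not found" ;;
        2) echo "Database inactive" ;;
        *) echo "Unexpected exit code $1" ;;
    esac
}

# Possible use (commented out; needs a running PostgreSQL setup):
# psql -q -d watchmen -c "select * from guards;" >/dev/null 2>&1
# code=$?
# echo "$(date +%d.%m.%Y:%H:%M:%S) $(describe_psql_exit "$code")"
```

Saving the exit code in a variable right after the psql call matters here, because any later command would overwrite $?.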
Monitoring Services
Many services simply work away in the background, and you cannot talk to them directly the way you can with a web or database server; thus, it is impossible to check the availability of this kind of service with a simple query. In these cases, you need to rely on the service fulfilling its task if its process is active.
Status queries can be made based on the examples in Table 2. The simplest tool for this task is the ps command. The -C <process> option lets you restrict the search to processes with the stipulated name (Listing 5).
Table 2
Service Check
Method | Call | Exit Code |
---|---|---|
Process status, stating the service | ps -C <service> | 0 (running), 1 (not running) |
Init script with option | service <service> status | None; outputs individual messages instead |
Query with systemctl | systemctl status <service> | 0 (active), 3 (inactive) |
Communication with the service | – | See examples of web and database servers |
Listing 5
Status Query for NTP Daemon
    $ ps -C ntpd
      PID TTY          TIME CMD
     1054 ?        00:00:01 ntpd
    $ echo $?
    0
    # Stop service, with systemd in this case
    $ sudo systemctl stop ntpd.service
    $ ps -C ntpd
      PID TTY          TIME CMD
    $ echo $?
    1

The exit codes returned by ps can be processed easily in scripts further downstream. Alternatively, you can pick up the output from the init scripts called by the service command (Listing 6) or, for distributions with systemd, by the systemctl command (Listing 7).
Listing 6
Service (SysVinit)
    $ service ntp status
     * NTP server is running
    $ sudo service ntp stop
     * Stopping NTP server ntpd                [ OK ]
    $ service ntp status
     * NTP server is not running
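Evaluating messages like these in a script means matching on their text. A minimal sketch follows, assuming the exact wording shown in Listing 6; other init scripts phrase their status messages differently, so the pattern has to be adapted for each service. The function name sysv_running is invented for illustration.

```shell
#!/bin/sh
# sysv_running
# Reads init script status output on stdin and succeeds only if it
# contains "is running". The phrase "is not running" does not match.
sysv_running() {
    grep -q "is running"
}

# Possible use (commented out; needs a SysVinit-style ntp service):
# if service ntp status | sysv_running; then
#     echo "NTP daemon active"
# else
#     echo "NTP daemon down"
# fi
```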
Listing 7
Systemctl (systemd)
    $ systemctl status ntp
    ● ntp.service - LSB: Start NTP daemon
       Loaded: loaded (/etc/init.d/ntp)
       Active: active (running) since Mon 2015-10-26 19:22:03 CET; 43s ago
    [...]
    $ echo $?
    0
    $ sudo systemctl stop ntp
    $ systemctl status ntp
    ● ntp.service - LSB: Start NTP daemon
       Loaded: loaded (/etc/init.d/ntp)
       Active: inactive (dead) since Mon 2015-10-26 19:23:04 CET; 4s ago
    [...]
    $ echo $?
    3
Whereas the legacy SysVinit forces you to evaluate the output from the init scripts that you call, which involves considerable overhead, systemd returns more useful exit codes. Depending on the task, requirements, and the system, you can use one of the three methods introduced here to achieve your objectives when monitoring a service.
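With systemd, a script only has to branch on the exit code. The sketch below maps the two codes seen in Listing 7; the value 4 for an unknown unit follows the LSB convention described in the systemctl man page and is an addition not shown in the listings. The function name unit_state is made up for illustration.

```shell
#!/bin/sh
# unit_state CODE
# Translates the exit code of "systemctl status <unit>" into a word.
unit_state() {
    case $1 in
        0) echo "active" ;;
        3) echo "inactive" ;;
        4) echo "unknown unit" ;;   # assumption: LSB code per systemctl man page
        *) echo "other (exit code $1)" ;;
    esac
}

# Possible use (commented out; needs a systemd system with an ntp unit):
# systemctl status ntp >/dev/null 2>&1
# code=$?
# echo "ntp is $(unit_state "$code")"
```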