Custom solutions for system monitoring and control
Just Right

Off-the-rack monitoring tools often offer too many functions or fail to offer precisely what you need, but shell scripts let you create individual monitoring routines.
Trust is good, but keeping the thumb screws on is better: This is the principle by which IT services and functions are monitored. Although you can find many tools to accomplish this job, tailor-made monitoring doesn't actually need these giants. Simple shell scripts will take you where you need to go just as well.
Whether you need to monitor and control a web server, database system, network connections, users, fans, or computer temperatures, simple shell routines are typically reliable and fast. Once created, scripts can be modified for different distributions and scenarios.
Monitoring needs to be considered carefully, however: In the case of a web server, simply asking whether the service is running is not precise enough. Is the hardware running? To determine this, all you need is a simple ping. A positive response, however, by no means signifies that the web server daemon is working. To discover this, you need to query the process status locally on the server; that is, ps -C <service> or possibly service <service> status.
However, you still don't know whether users can retrieve data from the web server. You would need to test this regularly in a browser, preferably in an automated process using a command-line tool and ideally from somewhere outside of your own protected network infrastructure. Otherwise, you risk being lulled into a false sense of security – for example, even when a router no longer works.
Sensors
When you are monitoring program execution, the task is to check the exit codes that terminal applications and commands typically return when they terminate – gracefully or not. A value of 0 typically signals a successful program run, whereas other codes indicate more or less serious errors. Table 1 provides a brief selection of popular tools for system monitoring.
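A minimal sketch of this principle follows. The helper name run_check is hypothetical and used for illustration only; it runs any command, discards its output, and reports based on the exit code alone.

```shell
#!/bin/sh
# run_check LABEL COMMAND [ARGS...]
# Runs COMMAND silently and reports based on its exit code.
# run_check is a hypothetical helper name, not taken from the article.
run_check() {
    label=$1
    shift
    "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "OK $label"
    else
        echo "FAIL $label (exit code $rc)"
    fi
    # Pass the original exit code on so callers can react to it.
    return "$rc"
}
```

A call such as run_check "root filesystem" test -d / would print OK root filesystem, and the function's own return value can drive a follow-up action in the calling script.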
Table 1
Test Tools
Test Objective | Tool |
---|---|
Accessibility of websites | httping (1) |
Database shell client for PostgreSQL RDBMS | psql |
Accessibility of computers | ping |
Name resolution | |
Logged-on users | |
Service status (SysVinit) | service |
Service status (systemd) | systemctl |
Disk space | |
Temperature | |
Fan activity | |
Port access | |
Packages: (1) httping, (2) lm-sensors, (3) netrw
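As an example of a simple sensor, the following sketch checks disk usage against a limit. It assumes df as the tool for the disk space test; the function names and any limit you pass in are illustrative choices, not taken from the article.

```shell
#!/bin/sh
# disk_usage_percent PATH
# Prints the usage of the filesystem holding PATH as a bare number.
# df -P forces the POSIX format with one line per filesystem, where
# column 5 holds the capacity (for example, 42%).
disk_usage_percent() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_check PATH LIMIT
# Prints OK and returns 0 if usage is at or below LIMIT percent,
# prints ALERT and returns 1 if it is above, and returns 2 if the
# usage could not be determined.
disk_check() {
    used=$(disk_usage_percent "$1")
    [ -n "$used" ] || return 2
    if [ "$used" -gt "$2" ]; then
        echo "ALERT $1 at ${used}%"
        return 1
    fi
    echo "OK $1 at ${used}%"
}
```

Calling disk_check /var 90 from a cron job would then be enough to raise an alert once the filesystem holding /var is more than 90 percent full.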
As the example of monitoring a web server shows, monitoring involves a little overhead in some cases (Figure 1). In this case, monitoring would ideally not be operated in-house but from outside of your own IT infrastructure so that failures would not also take down the monitoring system. In this way, you can cover almost all failure cases: web lockouts, overload attacks, general network overload, and even cases of physical network disconnection – think backhoes.
In response, you could (automatically) fire up a redundant system at some other location or with a different Internet connection. Listing 1 shows an approach that, as an initial response, also helps narrow down the cause of an error (DNS problem, network connection, and more). This script can be extended easily if needed, but watch out for pitfalls caused by some Internet providers when you attempt to access an unreachable Internet site. In some cases, the provider will serve a helpful navigation aid instead, and you will not want to evaluate the HTTP status of that page.
Listing 1
Web Server Monitoring
The httping command executed in the script (typically from the package of the same name) calls the stated website and displays additional information, such as latency (see the box "Pinging Web Servers"). This means you can quite easily monitor a web server in terms of functionality. The system monitoring script shown here provides the sensor system for monitoring; the response side is typically outsourced into a second script.
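The idea of narrowing down an error layer by layer can be sketched as follows. This is not the original Listing 1 but a minimal illustration under stated assumptions: the function names are hypothetical, getent and ping stand in for the DNS and reachability tests, and the timeouts are arbitrary. Bear in mind that some hosts drop ICMP packets, so a failed ping is not always conclusive.

```shell
#!/bin/sh
# Probe functions; each returns 0 on success and nonzero on failure.
# Assumes getent, ping (iputils), and httping are installed.
check_dns()  { getent hosts "$1" >/dev/null 2>&1; }
check_ping() { ping -c 1 -W 2 "$1" >/dev/null 2>&1; }
check_http() { httping -c 1 -t 5 -g "$1" >/dev/null 2>&1; }

# diagnose_site HOST URL
# Works from the lowest layer upward and names the first layer that
# fails. The return code identifies the failing layer as well.
diagnose_site() {
    if ! check_dns "$1"; then
        echo "DNS_FAILURE"
        return 1
    fi
    if ! check_ping "$1"; then
        echo "HOST_UNREACHABLE"
        return 2
    fi
    if ! check_http "$2"; then
        echo "HTTP_FAILURE"
        return 3
    fi
    echo "OK"
}
```

A call might look like diagnose_site www.example.com http://www.example.com/, where example.com is a placeholder. Keeping each probe in its own function makes it easy to swap one tool for another without touching the decision logic.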
Pinging Web Servers
The httping program checks access to a web server; it can optionally also determine the response behavior, assuming the connection is not routed via a proxy server and you do not transfer the complete page content using the -G option, either of which would falsify response times. The basic call uses the syntax httping -g <URL>, and you can use the -p <port> option to stipulate a port other than the typical port 80.
If so desired, httping will generate helpful information on top of the exit codes (0 = functioning, 127 = error), including the response time, which assumes a value of -1 for an error. By passing in a variable, you can trigger alarms or responses based on these results. For a better understanding of the function, launch the small sample script from Listing 2. Listing 3 shows the matching output.
The first call targets a working website. Httping shows the response time and the HTTP status code 200. If you point httping at a working domain but a nonexistent page, the test tool will output the classical 404 error with a response time of -1. If the domain doesn't exist, then the Internet provider in this example redirects the script to its own navigation aid with an integrated search function; therefore, httping does not report Resolving exshample.com failed but outputs 302 – the status code for redirection.
Listing 2
Website Test Script
Listing 3
Script Output
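Separately from the listings, the status codes discussed in the box lend themselves to a small evaluation routine. The sketch below is an assumption-laden illustration: classify_status is a hypothetical name, and the code is expected to have been extracted beforehand. The comment shows one way to do that with curl, which is swapped in here because it prints the bare status code.

```shell
#!/bin/sh
# classify_status CODE
# Maps an HTTP status code to a coarse monitoring result.
# One way to obtain CODE (curl is an assumption, not the article's tool):
#   code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
# curl prints 000 if no HTTP response was received at all.
classify_status() {
    case $1 in
        2??)     echo "OK" ;;
        3??)     echo "REDIRECT" ;;   # may be a provider's navigation aid
        404)     echo "NOT_FOUND" ;;
        4??|5??) echo "ERROR" ;;
        *)       echo "NO_RESPONSE" ;;
    esac
}
```

Treating a redirect as its own category, rather than as success, guards against the provider redirect described in the box being mistaken for a working site.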
Monitoring Databases
Databases are another important building block in any IT infrastructure, and it is obviously important to monitor them. The possibilities include MySQL, MariaDB, or PostgreSQL databases; I focus on PostgreSQL in this example. To monitor the service, you need to create a separate user account and a database with a table for this account. Shell scripts can then automate the query process. In this example, the database is named watchmen, and it contains the guards table with a number column and a single record (Figure 2).
The psql shell client uses classical exit codes: 0 for okay and a nonzero value for failed (1 for a failed command, 2 for a failed server connection). The shell script in Listing 4 then decides whether the data is simply inaccessible for some reason or whether the service is not working at all. Figure 3 shows the procedure. For test purposes, I deleted the data on one occasion and stopped the service on another. Assuming that the script is running on the same computer as the relational database management system, you can also perform other actions.
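A minimal sketch of this decision logic might look as follows; it is not the original Listing 4. The database, table, and column names follow the example above, whereas the account name watcher, the function names, and the process name postgres are assumptions (older installations may call the server process postmaster).

```shell
#!/bin/sh
# Hypothetical monitoring account; override via the environment.
PGUSER_MON=${PGUSER_MON:-watcher}

# Fetch the test record. -A and -t strip alignment and headers so
# that only the bare value is printed.
pg_query() {
    psql -U "$PGUSER_MON" -d watchmen -At \
         -c 'SELECT number FROM guards LIMIT 1'
}

# True if a server process exists on this machine (assumed name).
pg_process_running() {
    ps -C postgres >/dev/null 2>&1
}

# check_postgres
# Distinguishes missing data from a failed query and a stopped service.
check_postgres() {
    if result=$(pg_query 2>/dev/null); then
        if [ -n "$result" ]; then
            echo "OK"
            return 0
        fi
        echo "NO_DATA"
        return 1
    fi
    if pg_process_running; then
        echo "QUERY_FAILED"
        return 2
    fi
    echo "SERVICE_DOWN"
    return 3
}
```

The process test only makes sense when the script runs on the database host itself; from a remote machine, only the psql exit code is available.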
Listing 4
Checking PostgreSQL Server
Monitoring Services
Many services simply work away in the background, and you are unable to talk to them directly through a web or database server; thus, it is impossible to check the availability of this kind of service with a simple query. In these cases, you need to rely on the service fulfilling its task if its process is active.
Status queries can be made based on the examples in Table 2. The simplest tool for this task is the ps command. The -C <process> option lets you restrict the search for the process in question to the stipulated name (Listing 5).
Table 2
Service Check
Method | Call | Exit Code |
---|---|---|
Process status, stating the service | ps -C <service> | |
Init script with option | service <service> status | None; outputs individual messages instead |
Query with systemctl | | |
Communication with the service | – | See examples of web and database servers |
Listing 5
Status Query for NTP Daemon
The exit codes returned by ps can be processed easily in scripts further downstream. Alternatively, you can pick up the output from the init scripts called by the service command (Listing 6) or, for distributions with systemd, by the systemctl command (Listing 7).
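How the exit code of ps can be processed downstream is sketched below. The function names are hypothetical, and the sketch relies on the procps behavior of ps -C, which exits with 0 if at least one process matches the name and with a nonzero value otherwise.

```shell
#!/bin/sh
# process_running NAME
# Returns 0 if a process with this command name exists.
process_running() {
    ps -C "$1" >/dev/null 2>&1
}

# report_processes NAME...
# Prints one status line per name and returns the number of
# processes that were not found (0 means everything is running).
report_processes() {
    missing=0
    for name in "$@"; do
        if process_running "$name"; then
            echo "$name running"
        else
            echo "$name MISSING"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

A call such as report_processes ntpd sshd could then feed its return value into whatever response script handles restarts or notifications.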
Listing 6
Service (SysVinit)
Listing 7
Systemctl (systemd)
Whereas the legacy SysVinit forces you to evaluate the output from the init scripts that you call, which involves considerable overhead, systemd returns more useful exit codes. Depending on the task, requirements, and the system, you can use one of the three methods introduced here to achieve your objectives when monitoring a service.
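On systemd systems, the exit code can be used directly, as the following sketch suggests. It assumes the is-active subcommand of systemctl, which exits with 0 only if the unit is active; the function name unit_state and its return values are illustrative.

```shell
#!/bin/sh
# unit_state UNIT
# Prints active, not active, or unknown (if systemctl is missing).
# --quiet suppresses the state text so only the exit code matters.
unit_state() {
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "unknown"
        return 2
    fi
    if systemctl is-active --quiet "$1" 2>/dev/null; then
        echo "active"
        return 0
    fi
    echo "not active"
    return 1
}
```

Because no output needs to be parsed, a one-liner such as unit_state ntpd.service || echo "ntpd needs attention" is all a cron job requires; the unit name here is only an example.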