Custom solutions for system monitoring and control

Just Right

Article from Issue 182/2016

Off-the-rack monitoring tools often offer too many functions or fail to offer precisely what you need, but shell scripts let you create individual monitoring routines.

Trust is good, but keeping the thumb screws on is better: This is the principle by which IT services and functions are monitored. Although you can find many tools to accomplish this job, tailor-made monitoring doesn't actually need these giants. Simple shell scripts will take you where you need to go just as well.

Whether you need to monitor and control a web server, database system, network connections, users, fans, or computer temperatures, simple shell routines are typically reliable and fast. Once created, scripts can be modified for different distributions and scenarios.

Monitoring needs to be considered carefully, however: In the case of monitoring a web server, it is not just a question of checking that the service is running – the question lacks precision. Is the hardware running? To determine this, all you need is a simple ping. A positive response, however, by no means signifies that the web server daemon is working. To discover this, you need to query the process status locally on the server; that is,

ps -C <service>

or possibly

service <service> status

However, you still don't know whether users can retrieve data from the web server. You would need to test this regularly in a browser, preferably in an automated process using a command-line tool and ideally from somewhere outside of your own protected network infrastructure. Otherwise, you risk being lulled into a false sense of security – for example, even when a router no longer works.

Sensors

When you are monitoring program execution, the task is to check the exit codes that terminal applications and commands typically output after terminating – gracefully or not. A value of 0 typically signals a successful program run, whereas other codes indicate more or less serious errors. Table 1 provides a brief selection of popular tools for system monitoring.

Table 1

Test Tools

Test Objective                              Tool
Accessibility of websites                   httping (1)
Database shell client for PostgreSQL-RDBMS  psql
Accessibility of computers                  ping
Name resolution                             host
Logged on users                             users
Service status (SysVinit)                   /etc/init.d/<Service> status
Service status (Systemd)                    systemctl status <Service>
Disk space                                  df
Temperature                                 sensors
Fan activity                                sensors (2)
Port access                                 netread (3)

Packages: (1) httping, (2) lm-sensors, (3) netrw
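
The exit-code convention described above can be wrapped in a small helper. The following is only a minimal sketch and not part of the original listings: The function name run_check and its label argument are assumptions chosen for illustration. It runs any command, such as one of the tools from Table 1, discards the output, and reports the exit code.

```shell
#!/bin/sh
# run_check LABEL COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, hides its output, and reports
# whether it exited with 0 (success) or a non-zero code (failure).
run_check() {
  label=$1
  shift
  "$@" >/dev/null 2>&1
  code=$?
  if [ "$code" -eq 0 ]; then
    echo "OK: $label"
  else
    echo "FAILED: $label (exit code $code)"
  fi
  # Pass the original exit code on to the caller
  return "$code"
}

# Possible calls (host name and address are placeholders):
#   run_check "name resolution" host www.example.com
#   run_check "reachability" ping -c 1 -q 93.184.216.34
```

Because the helper returns the original exit code, a calling script can still branch on the result after the message has been printed.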

As the example of monitoring a web server shows, monitoring involves a little overhead in some cases (Figure 1). In this case, monitoring would ideally not be operated in-house but from outside of your own IT infrastructure so that failures would not also take down the monitoring system. In this way, you can cover almost all failure cases: web lockouts, overload attacks, general network overload, and even cases of physical network disconnection (think backhoes).

Figure 1: Schematic sequence of web server functional monitoring.

In response, you could (automatically) fire up a redundant system at some other location or with a different Internet connection. Listing 1 shows an approach that, as an initial response, also helps narrow down the cause of an error (DNS problem, network connection, and more). This script can be extended easily if needed, but watch out for a pitfall caused by some Internet providers when you attempt to access an unreachable Internet site: In some cases, the provider delivers a helpful navigation aid instead of an error, and you will not want to evaluate the HTTP status of that page.

Listing 1

Web Server Monitoring

#!/bin/sh
HOST=www.example.com
IP=93.184.216.34

while true; do

  # Access website, store output in variable
  B=$(httping -G -g "$HOST" -c 1 -s -m)
  # Store exit code in variable
  A=$?

  # Break down httping output:
  # first field = response time (-1 on error), second field = HTTP status
  C=$(echo "$B" | cut -d ' ' -f1)
  D=$(echo "$B" | cut -d ' ' -f2)

  # Output variables
  echo "Exit-Code: $A"
  echo "STATUS: $C"

  # Check name resolution
  if [ "$C" = "-1" ]; then
    host "$HOST"
    # Store exit code ...
    NA=$?
    # ... and evaluate
    if [ "$NA" -eq 0 ]; then
      echo "Name resolution ok"
    else
      echo "Name resolution error"
      # Availability via IP address?
      ping -c 1 -q "$IP"
      # Store exit code ...
      E=$?
      # ... and evaluate
      if [ "$E" -eq 0 ]; then
        echo "Computer accessible on network"
      else
        echo "Computer not accessible on network"
      fi
    fi
  fi

  # Report if the page cannot be retrieved
  # (string comparison, so an empty status does not break the test)
  if [ "$D" != "200" ]; then
    echo "Page error $D"
  fi

  sleep 15

done

The httping command executed in the script (typically from the httping package) calls the stated website and displays additional information, such as latency (see the box "Pinging Web Servers"). This means you can quite easily monitor a web server in terms of functionality. The system monitoring script shown here provides the sensor system for monitoring; the response side is typically outsourced into a second script.

Pinging Web Servers

The httping program checks access to a web server; it can optionally also measure the response behavior. The response times are only meaningful if the connection is not routed through a proxy server and you do not transfer the complete page content with the -G option; either would distort the measurement. The basic call uses the syntax

httping -g <URL>

and you can use the -p <port> option to stipulate a port other than the typical port 80.

If so desired, httping will generate helpful information on top of the exit codes (0 = functioning, 127 = error), including the response time, which assumes a value of -1 for an error. By storing this output in a variable, you can trigger alarms or responses based on these results. For a better understanding of the function, launch the small sample script from Listing 2. Listing 3 shows the matching output.

The first call targets a working website. Httping shows the response time and the HTTP status code 200. If you point httping at a working domain but a non-existent page, the test tool will output the classic 404 error with a response time of -1. If the domain doesn't exist, then the Internet provider in this example redirects the request to its own navigation aid with an integrated search function; therefore, httping does not report Resolving exshample.com failed but outputs 302, the status code for redirection.

Listing 2

Website Test Script

#!/bin/sh
echo "This website works:"
httping -g http://www.example.com -c 1 -s -m
echo "-------------------------------------------------"
echo "Domain exists, but invalid page:"
httping -g http://example.com/page-not-there.html -c 1 -s -m
echo "-------------------------------------------------"
echo "Domain does not exist, redirected by provider:"
httping -g http://exshample.com -c 1 -s -m

Listing 3

Script Output

$ ./listing2.sh
This website works:
206,761122 200
-------------------------------------------------
Domain exists, but invalid page:
-1 404
-------------------------------------------------
Domain does not exist, redirected by provider:
-1 302
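
The two-field output shown in Listing 3 lends itself to automatic evaluation. The following is a minimal sketch under stated assumptions: The function name classify_httping, its result strings, and its return codes are hypothetical, and it expects exactly the "response time, space, status code" format from Listing 3. Treating every 3xx status as a warning reflects the provider redirect described above; on a site that redirects legitimately, you would want to handle this case differently.

```shell
#!/bin/sh
# classify_httping "LATENCY STATUS"
# Hypothetical helper: interprets one line of "httping -s -m" output
# in the two-field form shown in Listing 3.
classify_httping() {
  case "$1" in
    *" "*)
      latency=${1%% *}   # text before the first space
      status=${1##* }    # text after the last space
      ;;
    *)
      # No second field: httping delivered no usable result
      echo "no-response"
      return 2
      ;;
  esac
  case "$status" in
    200) echo "ok latency=$latency"; return 0 ;;
    3??) echo "redirect status=$status"; return 1 ;;
    *)   echo "error status=$status"; return 1 ;;
  esac
}

# Possible call (URL is a placeholder):
#   classify_httping "$(httping -g http://www.example.com -c 1 -s -m)"
```

A response script can then branch on the return code instead of parsing the httping output a second time.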

Monitoring Databases

Databases are another important building block in any IT infrastructure, and it is obviously important to monitor them. The possibilities include MySQL, MariaDB, or PostgreSQL databases; I focus on PostgreSQL in this example. To monitor the service, you need to create a separate user account and a database with a table for this account. The shell scripts can then automate the query process. In this example, the database is named watchmen, and it contains the guards table with a number column and a single record (Figure 2).

Figure 2: A simple database to check the functionality of a PostgreSQL server.

The psql shell client uses classical exit codes: 0 for okay, 1 for a failed query, and 2 if the connection to the server fails. The shell script in Listing 4 uses these codes to decide whether the data is simply inaccessible for some reason or whether the service is not working at all. Figure 3 shows the procedure. For test purposes, I deleted the data on one occasion and stopped the service on another. Assuming that the script is running on the same computer as the relational database management system, you can also perform other actions.

Listing 4

Checking PostgreSQL Server

#!/bin/sh
while true; do
  # Write date and time to variable
  TIME=$(date +%d.%m.%Y:%H:%M:%S)
  # Database query to obtain the exit code
  M=$(psql -q -d watchmen -c "select * from guards;")
  # Store and evaluate exit code
  A=$?
  if [ "$A" -eq 0 ]; then
    echo "$TIME Database working"
  elif [ "$A" -eq 1 ]; then
    echo "$TIME Data not found"
  elif [ "$A" -eq 2 ]; then
    echo "$TIME Database inactive"
  fi
  sleep 60
done

Figure 3: Checking the functionality of a PostgreSQL server.

Monitoring Services

Many services simply work away in the background. Unlike a web or database server, they offer no interface that you can talk to directly; thus, it is impossible to check the availability of this kind of service with a simple query. In these cases, you need to rely on the assumption that the service is fulfilling its task if its process is active.

Status queries can be made based on the examples in Table 2. The simplest tool for this task is the ps command. The -C <process> option lets you restrict the search for the process in question to the stipulated name (Listing 5).

Table 2

Service Check

Method                               Call                           Exit Code
Process status, stating the service  ps -C <process>                0 = exists; 1 = does not exist
Init script with option status       service <Start script> status  None; outputs individual messages instead
Query with systemctl                 systemctl status <service>     0 = active; 3 = inactive
Communication with the service       See examples of web and database servers

Listing 5

Status Query for NTP Daemon

$ ps -C ntpd
  PID TTY          TIME CMD
 1054 ?        00:00:01 ntpd
$ echo $?
0
# Stop service, with systemd in this case
$ sudo systemctl stop ntpd.service
$ ps -C ntpd
  PID TTY          TIME CMD
$ echo $?
1

The exit codes returned by ps can be processed easily in scripts further downstream. Alternatively, you can pick up the output from the init scripts called by the service command (Listing 6) or, for distributions with systemd, by the systemctl command (Listing 7).
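
For the ps variant, the evaluation could look something like the following minimal sketch. The function names process_running and report_process are assumptions for illustration, and the sketch presumes a ps implementation that supports the -C option, as used in Listing 5.

```shell
#!/bin/sh
# process_running NAME
# Returns 0 if ps finds a process with that command name, 1 otherwise.
# Assumes a ps that supports the -C option, as in Listing 5.
process_running() {
  ps -C "$1" >/dev/null 2>&1
}

# report_process NAME
# Hypothetical wrapper that turns the exit code into a log line.
report_process() {
  if process_running "$1"; then
    echo "$1: running"
  else
    echo "$1: not running"
    return 1
  fi
}

# Possible call:
#   report_process ntpd
```

Keeping the test and the report in separate functions lets a response script call process_running directly, for example as the condition of a loop.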

Listing 6

Service (SysVinit)

$ service ntp status
 * NTP server is running
$ sudo service ntp stop
 * Stopping NTP server ntpd             [ OK ]
$ service ntp status
 * NTP server is not running

Listing 7

Systemctl (systemd)

$ systemctl status ntp
|- ntp.service - LSB: Start NTP daemon
   Loaded: loaded (/etc/init.d/ntp)
   Active: active (running) since Mon 2015-10-26 19:22:03 CET; 43s ago
[...]
$ echo $?
0
$ sudo systemctl stop ntp
$ systemctl status ntp
|- ntp.service - LSB: Start NTP daemon
   Loaded: loaded (/etc/init.d/ntp)
   Active: inactive (dead) since Mon 2015-10-26 19:23:04 CET; 4s ago
[...]
$ echo $?
3

Whereas the legacy SysVinit forces you to evaluate the output from the init scripts that you call, which involves considerable overhead, systemd returns more useful exit codes. Depending on the task, requirements, and the system, you can use one of the three methods introduced here to achieve your objectives when monitoring a service.
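
The systemd exit codes can be evaluated in much the same way. The following is a minimal sketch, not a complete solution: The function names describe_systemctl_code and check_service are assumptions, and only the two codes that appear in Table 2 and Listing 7 (0 and 3) are distinguished; any other value is reported as unknown.

```shell
#!/bin/sh
# describe_systemctl_code CODE
# Translates the exit code of "systemctl status <service>" into text.
# Only 0 and 3 are distinguished; everything else is lumped together.
describe_systemctl_code() {
  case "$1" in
    0) echo "active" ;;
    3) echo "inactive" ;;
    *) echo "unknown (exit code $1)" ;;
  esac
}

# check_service SERVICE
# Hypothetical wrapper: queries systemd, prints "<service>: <state>",
# and passes the systemctl exit code on to the caller.
check_service() {
  systemctl status "$1" >/dev/null 2>&1
  code=$?
  echo "$1: $(describe_systemctl_code "$code")"
  return "$code"
}

# Possible call:
#   check_service ntp
```

Because check_service returns the original exit code, a surrounding script can decide for itself whether an inactive service should trigger a restart or only a log entry.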
