Sensu — A powerful and scalable monitoring solution
Client Installation
The client systems only communicate with RabbitMQ, which makes it very easy to install the Sensu client on the computers you want to monitor. We use the same Sensu package and the same config.json as on the server but only enable the sensu-client service.
When the client first starts, it automatically registers with the server. From this moment on, the Sensu server expects at least regular signs of life in the form of keepalive messages. In the default configuration, Sensu raises an alarm if a client has not phoned home within the past three minutes.
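The keepalive rule can be illustrated with a few lines of Ruby. This is only a sketch of the idea, not Sensu's actual implementation; the constant and the method name are hypothetical, and only the three-minute threshold comes from the text.

```ruby
# Default from the text: alarm after three minutes without a keepalive.
# The constant and method names are hypothetical, chosen for this sketch.
KEEPALIVE_THRESHOLD = 180 # seconds

# Returns true if the last keepalive is older than the threshold.
# last_seen and now are Unix timestamps in seconds.
def keepalive_overdue?(last_seen, now = Time.now.to_i, threshold = KEEPALIVE_THRESHOLD)
  (now - last_seen) > threshold
end

now = 1_365_270_842
puts keepalive_overdue?(now - 60, now)  # client seen one minute ago
puts keepalive_overdue?(now - 200, now) # client silent for 200 seconds
```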
Like any monitoring system, Sensu performs checks to verify the status of certain system components. Unlike Nagios, Sensu does not support host-based checks. Checks are always performed by a Sensu client. The client fields the check output and then dumps it on the central message bus for processing by the server. You can develop Sensu checks in any programming language that can output text to stdout.
If you are starting out with Sensu, you will probably want to begin with status checks that reflect the current state of the system. Sensu distinguishes among the following:
- Passive checks requested by the Sensu server
- Active checks that the Sensu client performs without a request
- External events, which separate applications transmit to the Sensu client
Sensu expects the results of status checks in Nagios format. Sensu's support for Nagios format makes it extremely simple for users who are familiar with Nagios to start writing their own checks. Also, Nagios support means a huge number of ready-to-use checks are available from the outset. In fact, I was able to relieve my overtaxed Nagios system by passing many critical checks to Sensu, thus finally enjoying up-to-date monitoring results once again; #monitoringlove flooded the team.
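In the Nagios plugin convention, a check prints one line of text and signals its result through the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The following minimal sketch shows that pattern; the method name and the thresholds are assumptions made for illustration and do not reproduce check_disk.

```ruby
# Exit codes of the Nagios plugin convention.
STATUS = { ok: 0, warning: 1, critical: 2, unknown: 3 }.freeze

# Maps a free-space percentage to a status symbol and a one-line message.
# Method name and default thresholds are hypothetical.
def disk_status(free_percent, warn: 10, crit: 5)
  return [:unknown, 'DISK UNKNOWN - no data'] if free_percent.nil?

  level = if free_percent < crit
            :critical
          elsif free_percent < warn
            :warning
          else
            :ok
          end
  [level, "DISK #{level.to_s.upcase} - #{free_percent}% free"]
end

level, message = disk_status(7)
puts message # prints "DISK WARNING - 7% free"
# A real check would end with: exit STATUS[level]
```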
The Sensu server initiates most of the checks. Sensu always addresses the prompt for a check to a group of subscribers. Listing 2 shows a simple configuration that connects the client to the all and test groups. Thanks to this publish/subscribe process, a single request to the server is all it takes to perform a routine task on a massive scale, such as querying the free disk space on several hundred clients.
Listing 2
/etc/sensu/conf.d/client.json
```json
{
  "client": {
    "name": "client1.example.com",
    "address": "10.0.10.1",
    "subscriptions": [ "all", "test" ],
    "disk_warn": "10%",
    "disk_crit": "5%"
  }
}
```
Listing 3 shows the configuration for a typical check. Each client that has subscribed to receive all group messages will, when prompted by the server, perform the check defined in command (at 60-second intervals) and return its output to the server via RabbitMQ.
Listing 3
Disk Check
~~~json
{
  "checks": {
    "disk_free": {
      "type": "status",
      "subscribers": [ "all" ],
      "handlers": [ "default" ],
      "command": "/usr/lib/nagios/plugins/check_disk -w :::disk_warn::: -c :::disk_crit::: -A -x /dev/shm -X nfs -i /boot",
      "interval": 60
    }
  }
}
~~~
The sample check uses variables whose values can differ from client to client. A name enclosed in three colons on each side serves as a placeholder for a variable. The Sensu client takes the local value from its client.json file (Listing 2).
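How a check request finds its clients and how the placeholders are filled in can be sketched as follows. The hashes are shortened from Listings 2 and 3; the method names and the error on an unknown placeholder are assumptions for this example, not Sensu's internal code.

```ruby
# Client attributes as in Listing 2 (shortened).
client = {
  'name'          => 'client1.example.com',
  'subscriptions' => %w[all test],
  'disk_warn'     => '10%',
  'disk_crit'     => '5%'
}

# Check definition shortened from Listing 3.
check = {
  'subscribers' => ['all'],
  'command'     => 'check_disk -w :::disk_warn::: -c :::disk_crit:::'
}

# True if the client subscribes to at least one group the check is sent to.
def subscribed?(client, check)
  !(client['subscriptions'] & check['subscribers']).empty?
end

# Replaces every :::name::: placeholder with the client attribute of that name.
# Raising KeyError for unknown names is a choice made for this sketch.
def substitute_tokens(command, client)
  command.gsub(/:::([^:]+):::/) { client.fetch(Regexp.last_match(1)) }
end

puts substitute_tokens(check['command'], client) if subscribed?(client, check)
# prints "check_disk -w 10% -c 5%"
```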
In addition to interval, Sensu also supports other options for managing checks. For example, you might want to configure the system to send a notice to the server only after several failed checks (occurrences) in a row. Sensu also has a feature for handling rapid state changes (flapping).
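The idea behind occurrences can be sketched in a few lines: a notification is only triggered once a check has failed a given number of times in a row. The event layout used here (an occurrences counter next to the check status) and the method name are assumptions for illustration.

```ruby
# Returns true once a check has failed at least min_occurrences times in a row.
# The event layout (:occurrences, :check => :status) is assumed for this sketch.
def notify?(event, min_occurrences = 3)
  failing = event[:check][:status] != 0
  failing && event[:occurrences] >= min_occurrences
end

events = [
  { occurrences: 1, check: { name: 'disk_free', status: 2 } },
  { occurrences: 3, check: { name: 'disk_free', status: 2 } }
]

events.each do |event|
  puts "#{event[:check][:name]} occurrence #{event[:occurrences]}: notify=#{notify?(event)}"
end
```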
The standalone check is used if the client actively needs to initiate a check (i.e., independent of the server). Listing 4 shows an example of a locally controlled MySQL check that the client executes every 30 seconds. Active checks are simpler than passive checks because they do not require configuration and management on the server. A JSON file created manually on the client is all it takes to enable an active check.
Listing 4
Active Check
```json
{
  "checks": {
    "mysql_server": {
      "standalone": true,
      "interval": 30,
      "handlers": [
        "default"
      ],
      "command": "/usr/lib/nagios/plugins/check_mysql -u 'monitoring' -p 'db1ch3ck'"
    }
  }
}
```
Active checks are useful for monitoring short-lived servers that do not justify the overhead of centralized configuration. You can use your configuration management tool to set up active checks (see the "Cooking with Chef" box). Active checks are also useful if you need checks to run at specific times; the publish/subscribe process used with passive checks cannot guarantee a specific time.
Cooking with Chef
The Sensu cookbook for setting up active checks defines a simple Chef resource named sensu_check. Listing 5 contains a recipe fragment that sets up the check through Chef.
Listing 5
Chef Resource for Active Checks
~~~ruby
sensu_check 'mysql_server' do
  command "/usr/lib/nagios/plugins/check_mysql " + \
          "-u 'monitoring' " + \
          "-p '#{node['mysql']['server_mon_password']}'"
  handlers ['default']
  standalone true
  interval 30
end
~~~
You do not need to develop special checks if you want Sensu to monitor events from an external application in addition to processing status information from the system. The application can transmit its data directly to the local Sensu client via port 3030. Listing 6 shows how easy this is with a sample shell script. In practice, the Sensu shell helper [5] has proved its worth, because Sensu expects external events in JSON format, which can be difficult to create with shell commands.
Besides status information, the Sensu client can also collect run-time metrics. Listing 7 shows the definition of a check that runs a Ruby script to measure the system load. As with status checks, the output format for run-time metrics is kept deliberately simple. As you can see from Listing 8, Sensu expects one measuring point per line, consisting of a hierarchical metric ID, the measured value, and a time stamp.
Listing 6
Transferring External Events
~~~bash
# Send an external event to the local Sensu client socket
echo '{ "name": "my_check", "output": "{ ... }", "status": 0 }' > /dev/tcp/localhost/3030
~~~
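Because quoting JSON by hand in the shell is error-prone, an application written in a scripting language can let a JSON library build the event instead. The following Ruby sketch mirrors Listing 6 under that assumption; the method names and the sample output text are hypothetical, and the actual send is left commented out because it needs a running Sensu client on port 3030.

```ruby
require 'json'
require 'socket'

# Builds the JSON document for an external event; the library takes care
# of escaping quotes inside the output text.
def build_event(name, output, status)
  JSON.generate(name: name, output: output, status: status)
end

# Writes the event to the local Sensu client socket (port 3030, as in the text).
def send_event(payload, host = 'localhost', port = 3030)
  TCPSocket.open(host, port) { |socket| socket.write(payload) }
end

payload = build_event('my_check', 'backup "nightly" finished', 0)
puts payload
# send_event(payload) # uncomment on a host with a running Sensu client
```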
Listing 7
Check for Run-Time Metrics
~~~json
{
  "checks": {
    "load_metrics": {
      "type": "metric",
      "command": "load-metrics.rb",
      "subscribers": [
        "production"
      ],
      "interval": 10
    }
  }
}
~~~
Listing 8
Metric Check
~~~
$ ruby load-metrics.rb
srv3.local.load_avg.one 0.89 1365270842
srv3.local.load_avg.five 1.01 1365270842
srv3.local.load_avg.fifteen 1.06 1365270842
$ echo $?
0
~~~
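Producing output in the format of Listing 8 takes very little code. The sketch below is not the load-metrics.rb script itself; the method name is hypothetical and the values are copied from Listing 8, whereas a real script would read them from the system.

```ruby
# Formats measurements as "<metric ID> <value> <time stamp>" lines.
def metric_lines(host, loads, timestamp)
  loads.map { |period, value| "#{host}.load_avg.#{period} #{value} #{timestamp}" }
end

# Values taken from Listing 8 for illustration.
loads = { one: 0.89, five: 1.01, fifteen: 1.06 }
puts metric_lines('srv3.local', loads, 1_365_270_842)
```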
The event handlers on the server evaluate the event once the Sensu client has run the check and returned the results on the message bus. As soon as a new event arrives on the bus, Sensu passes it on (as usual in JSON format) to the relevant event handler.
Sensu distinguishes the following types of event handlers:
- Pipe: A system command executes this type of routine and passes the event data to it via stdin.
- TCP, UDP: These two types write event data to a TCP or UDP socket.
- Transport: This type internally publishes event data on a transport channel in Sensu, typically RabbitMQ.
- Group: An event handler group sends the event data to a group of event handlers. Adding a single event handler to a group thereby effectively defines an alias name.
Sensu can associate a wide range of actions with an event. Possible actions include:
- Notification via email or text message
- Messages on chat channels
- Alerting via PagerDuty
- Forwarding of run-time metrics to Graphite
- Generating log entries for evaluation in Logstash
Listing 9 shows how easy it is to process a monitoring event in an event handler. This simple Ruby script is stored in /etc/sensu/handlers/file.rb and receives events in JSON format, which it writes to files in a human-readable format. The new event handler is configured in /etc/sensu/conf.d/handlers/default.json as a Pipe handler (Listing 10). It might be easy to build your own event handler, but you can save yourself the trouble in most cases. The Sensu community has collected an extensive repository of ready-to-use plugins on GitHub [6]. The repository contains more than 600 checks, event handlers, and other Sensu extensions.
Listing 9
Event Handler
~~~ruby
#!/usr/bin/env ruby

require 'rubygems'
require 'json'

# Read event data
event = JSON.parse(STDIN.read, :symbolize_names => true)
# Write the event data to a file
file_name = "/tmp/sensu_#{event[:client][:name]}_" + \
            "#{event[:check][:name]}"
File.open(file_name, 'w') do |file|
  file.write(JSON.pretty_generate(event))
end
~~~
Listing 10
Integrating the Event Handler
~~~json
{
  "handlers": {
    "file": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/file.rb"
    }
  }
}
~~~
Automatic Remedies
Would it not be cool if your monitoring system could fix errors as well as detect and report them? Writing an event handler that initiates appropriate measures is not too difficult. However, because the event handler runs on the Sensu server and the error occurs on a client, you need a mechanism to bridge this gap.
At freistil IT, we experimented with the remote execution tool Serf for freistilbox.com. However, smart Sensu users realized that it was not necessary to use two different applications that both ultimately use their own messaging systems to transport actions and events. This realization led to the Sensu Remediator plugin.
Using this plugin, I could assign a three-stage repair strategy to a check. At each stage, a suitable command is executed on the client; to do so, the plugin cleverly "misappropriates" Sensu checks. In the example (Listing 11), the plugin first triggers a reload when the check enters a WARNING status. If the status is still unchanged after two occurrences, the plugin tries a restart instead. If a CRITICAL status occurs, the system responds by rebooting.
Listing 11
Self-Healing Infrastructure
```json
{
  "checks": {
    "check_foo": {
      "command": "check-procs.rb ...",
      "interval": 30,
      "subscribers": ["application_server"],
      "handlers": ["debug", "slack", "remediator"],
      "remediation": {
        "light_remediation": {
          "occurrences": [1, 2],
          "severities": [1]
        },
        "medium_remediation": {
          "occurrences": ["3-5"],
          "severities": [1]
        },
        "heavy_remediation": {
          "occurrences": ["1+"],
          "severities": [2]
        }
      }
    },
    "light_remediation": {
      "command": "service foo reload",
      "subscribers": [],
      "handlers": ["debug"],
      "publish": false
    },
    "medium_remediation": {
      "command": "service foo restart",
      "subscribers": [],
      "handlers": ["debug", "slack"],
      "publish": false
    },
    "heavy_remediation": {
      "command": "sudo reboot",
      "subscribers": [],
      "handlers": ["debug", "slack"],
      "publish": false
    }
  }
}
```
The three repair "checks" are deliberately defined without subscribers; the plugin always prompts the affected client to run them. For this approach to work, the client must have a subscription under its own hostname (Listing 12).
Listing 12
Self-Subscription
```json
{
  "client": {
    "name": "client1.example.com",
    "address": "10.0.10.1",
    "subscriptions": [
      "all",
      "test",
      "client1.example.com"
    ]
  }
}
```
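The way the occurrences and severities entries in Listing 11 select a repair stage can be sketched as follows. This is an illustration of the matching rules as they appear in the listing (single numbers, ranges such as "3-5", and open-ended values such as "1+"), not the Remediator plugin's source code; the method names are hypothetical.

```ruby
# True if the occurrence count matches one rule: an integer (2),
# a range string ("3-5"), or an open-ended string ("1+").
def occurrence_match?(rule, count)
  case rule
  when Integer then rule == count
  when /\A(\d+)-(\d+)\z/ then count.between?($1.to_i, $2.to_i)
  when /\A(\d+)\+\z/ then count >= $1.to_i
  else false
  end
end

# Returns the names of all remediation checks whose rules match the event.
def remediations_for(remediation, occurrences, severity)
  remediation.select do |_name, rule|
    rule['severities'].include?(severity) &&
      rule['occurrences'].any? { |r| occurrence_match?(r, occurrences) }
  end.keys
end

# The remediation section of check_foo from Listing 11.
remediation = {
  'light_remediation'  => { 'occurrences' => [1, 2],  'severities' => [1] },
  'medium_remediation' => { 'occurrences' => ['3-5'], 'severities' => [1] },
  'heavy_remediation'  => { 'occurrences' => ['1+'],  'severities' => [2] }
}

p remediations_for(remediation, 1, 1) # prints ["light_remediation"]
p remediations_for(remediation, 4, 1) # prints ["medium_remediation"]
p remediations_for(remediation, 1, 2) # prints ["heavy_remediation"]
```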
Sensu is very unobtrusive in day-to-day operations, at least as long as no errors occur. As a sys admin, you hardly ever interact directly with Sensu, especially if you use external services such as PagerDuty for alerting. If you do need to interact with the monitoring system, doing so via the web dashboard is simple and efficient. You can acknowledge alerts or even shut them off for a while using the silence function.
Anyone who prefers to use Sensu without a mouse should try sensu-cli [7]. This command-line application can acknowledge alerts:
sensu-cli resolve server3 apache_http
or temporarily silence them:
sensu-cli silence server3 --reason "Shut up already" --expire 3600
Because every new Sensu client registers with the server automatically, you must delete this registration when a client no longer exists:
sensu-cli client delete server3
This step avoids unnecessary alerts and is easy to do.
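If you want to script this cleanup yourself, the same deletion can be requested from the Sensu API over HTTP. The following sketch assumes that the API is reachable at localhost on port 4567 without authentication; the base URL and the method names are assumptions, so adjust them to your setup. Only the request is built and printed here; the call that actually sends it is defined but not executed.

```ruby
require 'net/http'
require 'uri'

# Assumed location of the Sensu API for this sketch.
API_BASE = 'http://localhost:4567'

# Builds the DELETE request for a client registration.
def delete_client_request(client_name)
  Net::HTTP::Delete.new(URI.join(API_BASE, "/clients/#{client_name}"))
end

# Sends the request; needs a reachable Sensu API.
def delete_client(client_name)
  request = delete_client_request(client_name)
  Net::HTTP.start(request.uri.hostname, request.uri.port) do |http|
    http.request(request)
  end
end

request = delete_client_request('server3')
puts "#{request.method} #{request.path}" # prints "DELETE /clients/server3"
```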
ChatOps
Many companies, especially if their employees are geographically dispersed, use chat for team communication. The chat system becomes the central source of information if you enrich team messages with system messages. In this way, everyone finds out about new Git commits or changes in the wiki without delay, and team members can exchange information on the spot. Sensu comes with event handlers for several common chat systems (IRC, Slack, Campfire, etc.).
The quantum leap from the central source of information to ChatOps is achieved by implementing a back channel in the form of a chatbot. This bot is tasked with receiving instructions from the chat and interacting with various ops services.
GitHub's Hubot [8] is the classic chatbot; on freistilbox, the team had great fun with Lita [9]. Team members could acknowledge an alert with a simple pagerbot ack 1234 or quickly take over standby duties for a colleague with pagerbot put me on firstlevel for 1 hour. Because these commands were issued in the chat, the rest of the team learned about the actions instantly.
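At its core, such a bot maps chat messages to actions. The following plain-Ruby sketch parses the two example commands from the text; it does not use the Hubot or Lita APIs, and the method name and the result hash are hypothetical.

```ruby
# Parses the two example commands into a small hash; returns nil for
# anything else. Hash keys are hypothetical names chosen for this sketch.
def parse_command(message)
  case message
  when /\Apagerbot ack (\d+)\z/
    { action: :ack, incident: $1.to_i }
  when /\Apagerbot put me on (\w+) for (\d+) (hour|minute)s?\z/
    { action: :override, schedule: $1, amount: $2.to_i, unit: $3.to_sym }
  end
end

p parse_command('pagerbot ack 1234')
p parse_command('pagerbot put me on firstlevel for 1 hour')
```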