Filtering log messages with Splunk

Search with Remote Control

The post() method in line 57 sends a search command to the Splunk server. In contrast to the web GUI, searches submitted via the API need to start with the search command. Besides the NOT eventtype=chatter filter that I already described, the query sets the restriction earliest=-24h; that is, it only asks for events from the past 24 hours. The output_mode parameter tells Splunk to return the results in JSON format. If you prefer to avoid lengthy emails, you will also want to restrict the number of hits to 50 using limit=50.
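As a rough sketch of the request described above, the following snippet only assembles and encodes the form fields; it does not contact a server. The field names come from the text. Passing limit as a separate form field, the search_params() helper, and the use of the core HTTP::Tiny module for the encoding are assumptions made for illustration and not a copy of Listing 1.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Build the form fields for a Splunk search request.
# Assumption: "limit" travels as its own form field; Listing 1 may differ.
sub search_params {
    my ($filter, $window, $limit) = @_;
    return [
        # Searches sent via the API must begin with the "search" command
        search      => "search $filter earliest=$window",
        output_mode => 'json',
        limit       => $limit,
    ];
}

my $params = search_params('NOT eventtype=chatter', '-24h', 50);

# Encode the fields the way an HTTP form POST would carry them
my $body = HTTP::Tiny->new->www_form_urlencode($params);
print "$body\n";
```

Keeping the parameters in one place makes it easy to widen the time window or raise the hit limit later without touching the code that talks to the server.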

The from_json() function from the JSON CPAN module then converts the results to Perl data structures one line at a time in line 82. Three fields are crucial for the mail to be sent: the timestamp of the log entry in the _time key, the logfile in source, and the original log line in _raw.
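The line-by-line conversion might look roughly like the sketch below. To keep it runnable without CPAN installs, it uses decode_json() from the core JSON::PP module in place of from_json() from the JSON module. The sample lines are made up, and whether the three fields sit at the top level of each JSON object or under a result key is an assumption, so the hypothetical parse_results() helper accepts both.

```perl
use strict;
use warnings;
use JSON::PP;   # core module; decode_json() stands in for from_json()

# Turn newline-delimited JSON into a list of [time, source, raw] rows.
sub parse_results {
    my ($text) = @_;
    my @rows;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;    # skip blank lines
        my $data = decode_json($line);
        # Assumption: the event sits under a "result" key or at top level
        my $event = ref($data->{result}) eq 'HASH' ? $data->{result} : $data;
        push @rows, [ @{$event}{qw(_time source _raw)} ];
    }
    return @rows;
}

# Made-up sample lines, only for illustration
my $sample = join "\n",
    '{"result":{"_time":"2013-07-01T12:00:00","source":"/var/log/messages","_raw":"disk almost full"}}',
    '{"_time":"2013-07-01T12:05:00","source":"/var/log/auth.log","_raw":"login failed"}';

printf "%s %s %s\n", @$_ for parse_results($sample);
```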

The Net::SMTP module from CPAN sends the mail with the results of the search to the target defined in $to_email. The SMTP server $smtp_server was set previously in line 23.
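A minimal sketch of the mail step, assuming a server that accepts unauthenticated SMTP, could be split into one function that assembles the message and one that hands it to Net::SMTP. The function names, addresses, and subject line are hypothetical. Running the snippet only prints the message, and nothing is sent unless send_mail() is called.

```perl
use strict;
use warnings;
use Net::SMTP;

# Assemble a minimal plain-text message: headers, blank line, body.
sub build_message {
    my ($from, $to, $subject, $body) = @_;
    return join "\n",
        "From: $from", "To: $to", "Subject: $subject", '', $body, '';
}

# Hand the message to an SMTP server; dies if the connection fails.
sub send_mail {
    my ($smtp_server, $from, $to, $message) = @_;
    my $smtp = Net::SMTP->new($smtp_server)
        or die "Cannot connect to $smtp_server\n";
    $smtp->mail($from);          # envelope sender
    $smtp->to($to);              # envelope recipient
    $smtp->data();
    $smtp->datasend($message);
    $smtp->dataend();
    $smtp->quit;
}

# Hypothetical addresses; this only builds and prints the message
my $message = build_message('splunk@example.com', 'admin@example.com',
    'Splunk daily report', '3 events found');
print $message;
```

To deliver the report, you would call send_mail() with the values of $smtp_server and $to_email and a sender address of your choice.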

For Dinosaurs and Hipsters

Complex HTML messages annoy old codgers like me who use text-based email readers such as Pine. Conversely, a plain text email is too old school for young, dynamic Outlook and Thunderbird mouse pushers. To mediate between the two worlds, Listing 1 formats the tabular results of the query in ASCII using the CPAN Text::ASCIITable module. To keep the timestamp column from growing too wide, line 76 limits its width to a maximum of 10 characters, so longer values wrap instead. The same applies to the column with the log entry, which wraps at a line length of 34, keeping messages readable even on mobile phones.
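To illustrate the wrapping idea without installing Text::ASCIITable, the following core-only sketch cuts each cell into fixed-width pieces and prints them side by side. It is a simplified stand-in and not the module's own algorithm: it breaks at a fixed character count, not at word boundaries, and draws no header or border lines. The widths of 10 and 34 come from the text; the width of 20 for the source column and the helper names are assumptions.

```perl
use strict;
use warnings;

# Cut a string into pieces of at most $width characters.
sub chunk {
    my ($text, $width) = @_;
    my @parts = unpack "(A$width)*", $text;
    return @parts ? @parts : ('');
}

# Lay out rows as a table whose cells wrap at the given column widths.
sub format_rows {
    my ($widths, @rows) = @_;
    my @out;
    for my $row (@rows) {
        my @cells  = map { [ chunk($row->[$_], $widths->[$_]) ] } 0 .. $#$widths;
        my $height = 0;
        for my $cell (@cells) { $height = @$cell if @$cell > $height }
        # Emit one output line per wrapped piece, padding short cells
        for my $i (0 .. $height - 1) {
            my @padded = map {
                sprintf '%-*s', $widths->[$_], $cells[$_][$i] // ''
            } 0 .. $#$widths;
            push @out, '| ' . join(' | ', @padded) . ' |';
        }
    }
    return join "\n", @out;
}

# Made-up row: timestamp, source, log line
print format_rows([10, 20, 34],
    ['2013-07-01T12:00:00', '/var/log/messages', 'x' x 40]), "\n";
```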

Some modern mail readers prefer HTML, and to satisfy them, line 96 calls the CPAN Email::MIME module. It wraps the existing ASCII text in inline HTML, surrounded by simple pre tags. Thus, the results are acceptable both in Alpine (Figure 7) and in Gmail (Figure 8).
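The wrapping step itself can be sketched in a few lines of plain Perl. The ascii_to_html() helper below is hypothetical, and escaping the characters &, <, and > before wrapping is my own precaution and not something the text attributes to Listing 1. Attaching the result as an HTML part with Email::MIME is left out here.

```perl
use strict;
use warnings;

# Escape the characters HTML treats specially, then wrap the text in <pre>.
sub ascii_to_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    (my $escaped = $text) =~ s/([&<>])/$entity{$1}/g;
    return "<html><body><pre>\n$escaped\n</pre></body></html>\n";
}

print ascii_to_html("| time | log line <with> markup & more |");
```

Because pre tags preserve whitespace and use a fixed-width font, the column alignment of the ASCII table survives in an HTML mail reader.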

Figure 7: Text-based formatting for a venerable mailer like Alpine.
Figure 8: HTML formatting for modern mail readers; Gmail shown here.

The script can easily be extended to include tests that compare the values found with previously set limits and then only send messages when a limit is exceeded. The extended script could then run once per day as a summary or at five-minute intervals for rapid-alert email messages.
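One possible shape for such a test, sketched under the assumption that the limit is a maximum number of events per logfile, is shown below. The over_limit() helper, the limits, and the sample rows are all hypothetical. Each row follows the timestamp, source, log line layout used for the mail.

```perl
use strict;
use warnings;

# Count events per source and return the sources that exceed their limit.
# Sources without a configured limit are ignored.
sub over_limit {
    my ($limits, @rows) = @_;
    my %count;
    $count{ $_->[1] }++ for @rows;    # column 1 holds the source
    return grep { exists $limits->{$_} && $count{$_} > $limits->{$_} }
        sort keys %count;
}

# Made-up limits and rows, only for illustration
my %limits = ('/var/log/auth.log' => 1, '/var/log/messages' => 5);
my @rows = (
    ['2013-07-01T12:00:00', '/var/log/auth.log', 'login failed'],
    ['2013-07-01T12:01:00', '/var/log/auth.log', 'login failed'],
    ['2013-07-01T12:02:00', '/var/log/messages', 'disk almost full'],
);

my @alerts = over_limit(\%limits, @rows);
print "Limit exceeded for: @alerts\n" if @alerts;
```

The mail would then only go out when the returned list is not empty.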

Some open source alternatives to Splunk, such as Logstash [6] [7] and Graylog2 [8], are also available, but so far they do not come close to Splunk in terms of ease of use and scalability.

Infos

  1. Splunk: http://www.splunk.com
  2. Hadoop: http://hadoop.apache.org
  3. "Giant Data: MapReduce and Hadoop" by Thomas Hornung, Martin Przyjaciel-Zablocki, and Alexander Schätzle: http://www.admin-magazine.com/HPC/Articles/MapReduce-and-Hadoop
  4. Listings for this article: ftp://ftp.linux-magazin.de/pub/listings/magazine/155
  5. "Splunk: Intro REST API tutorial": http://dev.splunk.com/view/SP-CAAADQT
  6. Logstash: http://logstash.net
  7. "Centralized Log Archiving with Logstash" by Martin Loschwitz, Linux Magazine, June 2013, p. 60: http://www.linux-magazine.com/Issues/2013/151/Logstash/(language)/eng-US
  8. Graylog2: http://graylog2.org

The Author

Mike Schilli works as a software engineer with Yahoo! in Sunnyvale, California. He can be contacted at mschilli@perlmeister.com. Mike's homepage can be found at http://perlmeister.com.
