Delving the depths of Linux with sysdig

Big Dig

© Lead Image © Stanislav Komogorov, 123RF.com

Article from Issue 168/2014
Author(s):

Many Linux diagnostic tools each require knowledge of their own special syntax, which makes them harder to use and their output harder to interpret. Sysdig combines several important tools in a single interface.

On a modern Linux system, numerous processes often run simultaneously. Several applications might be running at once, and each application opens files, writes data, reads data, closes files, and so on. All this activity stresses the CPU, which can lead to bottlenecks that can slow down the entire system.

System administrators use tools such as top, ps, vmstat, strace, and lsof to find and fix these bottlenecks. The output of one tool often serves as input for another, which can quickly lead to complex and confusing command chains.

Sysdig [1] cleans up some of that confusion. Its developers combined the functions of the commands they used most frequently into one tool and equipped it with a programmable interface. Sysdig understands a large number of options that control its behavior (run sysdig --help for a list).

Sysdig's line-by-line output consists of several parts, or fields. The first two fields, evt.num and evt.time, identify each event by a sequence number and the time at which sysdig captured it. The third field, evt.cpu, shows the CPU on which the event occurred, which matters on systems with multiple CPUs.

The proc.name field shows the process name and thread.tid the thread ID. The evt.dir field gives the event's direction: > marks an enter event (the start of a system call) and < an exit event (its return). The evt.type field names the event itself, for example, read or open. Last, but not least, the evt.args field lists the event's arguments.
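The following minimal Python sketch splits one line of sysdig's default output into the fields just described. It assumes the column order seen in this article's listings: evt.num, evt.time, evt.cpu, proc.name, thread.tid in parentheses, evt.dir, evt.type, and evt.args. The function name parse_line and the regular expression are illustrative and not part of sysdig. If you need machine-readable output, defining your own output format with sysdig's -p option is likely more robust than parsing the default layout.

```python
import re
from typing import Any, Dict, Optional

# Assumed default layout (from the listings in this article):
# evt.num evt.time evt.cpu proc.name (thread.tid) evt.dir evt.type [evt.args]
LINE_RE = re.compile(
    r"^(?P<num>\d+)\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<cpu>\d+)\s+"
    r"(?P<proc>.+?)\s+"          # process name, lazily matched up to "(tid)"
    r"\((?P<tid>\d+)\)\s+"
    r"(?P<dir>[<>])\s+"          # ">" = enter event, "<" = exit event
    r"(?P<type>\S+)"
    r"(?:\s+(?P<args>.*))?$"     # arguments are optional (e.g. "> mmap")
)


def parse_line(line: str) -> Optional[Dict[str, Any]]:
    """Split one line of default sysdig output into named fields."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # e.g. the "<...>" elision markers in the listings
    return {
        "evt.num": int(m.group("num")),
        "evt.time": m.group("time"),
        "evt.cpu": int(m.group("cpu")),
        "proc.name": m.group("proc"),
        "thread.tid": int(m.group("tid")),
        "evt.dir": m.group("dir"),
        "evt.type": m.group("type"),
        "evt.args": m.group("args") or "",
    }


if __name__ == "__main__":
    ev = parse_line("232 11:03:20.524714772 1 plugin-containe (5081) < futex res=-110(ETIMEDOUT)")
    print(ev["proc.name"], ev["evt.dir"], ev["evt.type"])  # plugin-containe < futex
```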

Installation

Under Arch Linux, you can install the current 1.82 version of the program from the repository. For other distributions, the developers offer a somewhat unorthodox setup method: You download a script with curl and pipe it straight into Bash:

curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash

The script automatically detects the operating system and sets up the appropriate version. It currently supports Debian from version 6.0, Ubuntu from 10.04, CentOS from v6, RHEL from v6, Fedora from v13, and Linux Mint from v9. If the script does not work on an older Ubuntu release, you can set up the repository manually with the following commands:

curl -s https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public | sudo apt-key add -
sudo curl -s -o /etc/apt/sources.list.d/draios.list http://download.draios.com/stable/deb/draios.list
sudo apt-get update

The first line downloads the repository's public key and installs it. Line 2 registers the repository in a file in /etc/apt/sources.list.d/ (writing there requires root privileges, hence the sudo). After the obligatory update (line 3), the packages are ready for selection in Synaptic. Sysdig also relies on a special kernel module that has to be built for your running kernel, which requires the matching kernel headers. You can install them with the following command:

apt-get -y install

For more details on installation, refer to the online documentation [2]. Matching versions for Windows and Mac OS X also are available to download.

Hands-On

If you start sysdig as root without options, you will instantly see output like that shown in Listing 1. To exit, press Ctrl+C.

Listing 1

sysdig Output

# sysdig
<...>
3 11:03:20.522466433 2 <NA> (0) > switch next=158(systemd-journal)
<...>
232 11:03:20.524714772 1 plugin-containe (5081) < futex res=-110(ETIMEDOUT)
<...>
286 11:03:20.533770002 0 firefox (4901) > mmap
<...>
338 11:03:20.536870106 0 firefox (4901) > poll fds=5:e1 4:u1 8:p3 10:u3 11:u1 23:p1 25:u1 timeout=0
<...>
387 11:03:20.537960783 2 Timer (5750) < futex res=-110(ETIMEDOUT)
<...>
2249 11:03:20.548869168 2 java (29547) < futex res=-110(ETIMEDOUT)
<...>
2266 11:03:20.551182910 0 emacs (16938) > writev fd=4(<u>) size=112
<...>
197129 11:03:21.241713273 3 Xorg (1757) < read res=72 data=.?.S. <...>

The first event shown, number 3, is a context switch (switch) on CPU 2 to the systemd-journal process (PID 158). Next comes one of several processes named plugin-containe, which is Firefox's plugin host and runs the embedded Flash player. The kernel limits process names to 15 characters, which cuts off the final "r" of plugin-container. After that comes Firefox itself, which generates a whole series of events. Note, among other things, that Firefox runs on a different CPU (0) than plugin-container (1). In the last line of the listing (with the data= string), you can see a number of dots: Sysdig writes them in place of non-printable characters. If necessary, you can change this behavior with options such as -A, which tells the program to output only ASCII characters.
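The dot notation is easy to reproduce. The following Python sketch assumes a simple rule: it keeps printable ASCII bytes (0x20 to 0x7E) and replaces every other byte with a dot. Sysdig's own rendering may treat some bytes, such as line breaks, differently. The function name render_printable is hypothetical. The sample bytes are made up, because the listing does not reveal what Xorg actually read.

```python
def render_printable(data: bytes) -> str:
    """Show printable ASCII bytes as-is and replace every other byte with '.'.

    Assumption: this approximates sysdig's default display of data buffers;
    it is not sysdig's actual code.
    """
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


if __name__ == "__main__":
    # Made-up bytes that would display like the start of Listing 1's last line.
    print(render_printable(b"\x01?\x02S\x03"))  # .?.S.
```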

Sysdig registers a huge number of events: in Listing 1, almost 200,000 in less than a second. Meaningful use of the software therefore requires a powerful filter that restricts output to the events you are interested in. You append the filter expression to the command line. Listing 2 shows how to reduce the output to read events.

Listing 2

Filtering Events

# sysdig evt.type=read
152839 13:02:03.673561027 3 pulseaudio (4360) < read res=2 data=WW
152840 13:02:03.673561173 2 threaded-ml (3223) > read fd=23(<p>pipe:[1593199]) size=10

Chisels

Raw events do not directly reveal more complex information, such as which processes perform the most input or output. You have to derive it by aggregating the data and applying statistical methods. This, and much more, is the job of chisels: small Lua scripts of about 2KB that you run with sysdig's -c <chisel name> option.
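The kind of aggregation a chisel performs can be sketched in a few lines. The Python example below is not sysdig's Lua code. It is a hypothetical illustration that sums the return values of read- and write-type exit events per process, roughly answering "which processes move the most data?" The IoEvent type, the IO_TYPES set, and the sample events are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class IoEvent(NamedTuple):
    proc_name: str  # proc.name
    evt_dir: str    # evt.dir: ">" = enter, "<" = exit
    evt_type: str   # evt.type
    res: int        # return value; for read/write exits, the bytes transferred


# Hypothetical selection of call types to count; adjust as needed.
IO_TYPES = {"read", "write", "readv", "writev"}


def top_io_procs(events: Iterable[IoEvent], n: int = 3) -> List[Tuple[str, int]]:
    """Sum the bytes moved by read/write-style calls per process, busiest first."""
    totals: Counter = Counter()
    for ev in events:
        # Only exit events carry the result; negative values are error codes.
        if ev.evt_dir == "<" and ev.evt_type in IO_TYPES and ev.res > 0:
            totals[ev.proc_name] += ev.res
    return totals.most_common(n)
```

Keeping a running total per process name while the events stream past, then sorting once at the end, is the same shape a real chisel would plausibly need, since sysdig only prints results after the capture stops.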

The bottlenecks chisel in Listing 3, for example, looks for the slowest system calls. Sysdig collects data until you stop the program; the chisel then analyzes the collected data and outputs the results. Again, the Flash player stands out: Along with the Java program, its threads account for the slowest calls in the list, futex waits that time out after about one second (ETIMEDOUT).

Listing 3

Finding Bottlenecks

# sysdig -c bottlenecks
89898) 0.000000000 plugin-containe (5080) > futex addr=7FAB265B1DA4 op=137
170611) 1.000095651 plugin-containe (5080) < futex res=-110(ETIMEDOUT)
170597) 1.000069882 plugin-containe (16961) < futex res=-110(ETIMEDOUT)
89454) 0.000000000 java (29540) > futex addr=7F400C252F54 op=393
135393) 1.000024874 java (29720) < futex res=-110(ETIMEDOUT)
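Listing 3 appears to show durations. Each system call produces an enter event (>) and an exit event (<), and the exit lines carry the time the call took. The Python sketch below illustrates that pairing; it is not the bottlenecks chisel's implementation. Events are keyed by thread ID, which works because a thread can be inside only one system call at a time. The SysEvent type, the function name, and the sample timestamps are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class SysEvent(NamedTuple):
    ts_ns: int      # timestamp in nanoseconds (cf. evt.rawtime)
    tid: int        # thread.tid
    proc_name: str  # proc.name
    evt_dir: str    # evt.dir: ">" = enter, "<" = exit
    evt_type: str   # evt.type


def slowest_calls(events: Iterable[SysEvent], n: int = 5) -> List[Tuple[int, str, int, str]]:
    """Match each exit event with its thread's pending enter event; rank by duration.

    Returns (duration_ns, proc_name, tid, evt_type) tuples, slowest first.
    """
    pending: Dict[int, SysEvent] = {}  # tid -> enter event still waiting for its exit
    durations: List[Tuple[int, str, int, str]] = []
    for ev in sorted(events, key=lambda e: e.ts_ns):
        if ev.evt_dir == ">":
            pending[ev.tid] = ev
            continue
        start = pending.pop(ev.tid, None)
        # Skip exits whose enter event was not captured (e.g. the call began
        # before the capture started) or that belong to a different call type.
        if start is not None and start.evt_type == ev.evt_type:
            durations.append((ev.ts_ns - start.ts_ns, ev.proc_name, ev.tid, ev.evt_type))
    durations.sort(reverse=True)
    return durations[:n]
```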

Many chisels require additional arguments, such as an IP address or a port to monitor, which you specify directly after the chisel name, for example:

sysdig -c spy_ip <IP address>

During installation, sysdig copies the chisels into the /usr/share/sysdig/chisels/ directory. Thanks to their relatively simple structure, they are suitable as templates for your own chisels. The sysdig -cl command lists the existing chisels, organized in six categories that allow you to:

  • examine CPU and network workload,
  • determine throughput,
  • analyze performance of the entire system,
  • make security checks, and
  • perform error analysis.

Most categories contain several chisels, each tailored to a specific kind of analysis.

Filters

As mentioned, sysdig lets you restrict the output to the most relevant information. To do this, specify a filter for the events you are interested in on the command line, for example:

evt.type=open

Out of the many available fields, Table 1 summarizes a few of the most common. Typing

sysdig -l

lists all supported fields; sysdig -L lists the supported events.

Table 1

Key Fields

Name                 Function
fd.num               File descriptor number
fd.type              Type of the file descriptor (e.g., file, socket, or pipe)
fd.name              Full path or, for sockets, the connection
fd.directory         Directory containing the file
fd.filename          File name without path
proc.pid             PID of the generating process
proc.exe             Name and path of the generating process
proc.cmdline         Command line of the generating process
thread.tid           Thread ID of the generating thread
thread.totexectime   Total CPU time of the thread since the start of the capture
evt.num              Event number
evt.time             Event timestamp
evt.rawtime          Event timestamp (absolute, in nanoseconds)
evt.type             Type of event (e.g., open or read)
evt.args             All event arguments in one string
evt.arg[]            A single event argument
evt.buffer           Binary data buffer of the event
evt.res              Return value of the event
evt.is_io_<name>     Flags for various kinds of I/O events
user.uid             ID of the user to whom the generating process belongs
user.name            Name of the user to whom the generating process belongs
user.homedir         Home directory of that user
user.shell           Shell of that user
evt.latency<value>   Various latency values

You can combine multiple conditions with the logical operator and, in which case an event must satisfy all of them to appear in the output. For text fields, the contains <pattern> operator restricts the output to events in which the specified pattern occurs in the field's value. Especially with numeric data, you are often interested only in a certain range of values, which you can narrow down with the comparison operators in Table 2.

Table 2

Operators

Operator     Meaning
=            Equal to
!=           Not equal to
<threshold   Less than the specified threshold
>threshold   Greater than the specified threshold
<=threshold  Less than or equal to the specified threshold
>=threshold  Greater than or equal to the specified threshold

You can also group expressions using parentheses, negate them using not, and combine them with a logical or. Enclose such expressions in double quotes so that the shell does not interpret characters such as the parentheses itself. An example from the documentation shows this:

sysdig "not (fd.name contains /proc or fd.name contains /dev)"
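The documentation example translates directly into a Python predicate, which shows how not and or combine here. An event passes only if its fd.name contains neither /proc nor /dev. The sketch is illustrative: the function name is hypothetical, and letting an event without an fd.name pass is an assumption about sysdig's behavior.

```python
from typing import Any, Mapping


def not_proc_or_dev(event: Mapping[str, Any]) -> bool:
    """Model of the filter: not (fd.name contains /proc or fd.name contains /dev)."""
    # Assumption: a missing fd.name is treated as an empty string.
    name = str(event.get("fd.name", ""))
    return not ("/proc" in name or "/dev" in name)


if __name__ == "__main__":
    paths = ["/etc/passwd", "/proc/self/stat", "/dev/null", "/home/user/notes.txt"]
    kept = [p for p in paths if not_proc_or_dev({"fd.name": p})]
    print(kept)  # ['/etc/passwd', '/home/user/notes.txt']
```

By De Morgan's law, the same filter could also be written as two negated contains tests joined by and.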

It is not always necessary, or sensible, to analyze the data collected by sysdig immediately. Sometimes it makes sense to save the captured data first and analyze it later in different ways.

The -w <file> option saves the captured events, unfiltered, to a file; the developers recommend the .scap suffix for these files. Running sysdig -r <file> reads the saved data back in, and you can append the desired filter expressions to the command line as usual.
