Article from Issue 209/2018

In this issue, sys admin columnist and tool veterinarian Charly Kühnast invites Sysdig, the jack-of-all-trades among system diagnostic tools, into his surgery for a quick checkup. The project promises to unite the functionality of lsof, iftop, netstat, tcpdump, and others.

When an alpha animal claims to replace an entire herd, the bar is naturally set high. Then again, the people behind the Sysdig [1] project are also the Wireshark authors, so they are no beginners. The software works properly only with root privileges; otherwise, it can't access all the required system areas. If you launch the tool without parameters, a steady stream of events scrolls by, because it meticulously logs every single syscall. To thin out the thicket, Sysdig uses what it calls chisels, small Lua scripts that analyze the event stream. You can find out which chisels exist with the sysdig -cl command.
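The full chisel list is long, so it can help to narrow it to one category. The following is a minimal sketch, not part of Sysdig itself: chisels_in_category is a hypothetical helper, and it assumes that sysdig -cl prints each group under a "Category: <name>" heading with one chisel per line, which may differ between versions. The -i option prints the description and arguments of a single chisel.

```shell
#!/bin/sh
# chisels_in_category (hypothetical helper): read the output of "sysdig -cl"
# on stdin and print only the chisel names listed under one category.
# Assumption: the listing uses "Category: <name>" headings.
chisels_in_category() {
    awk -v want="Category: $1" '
        /^Category: / { in_cat = ($0 == want); next }   # a new category starts
        in_cat && /^[A-Za-z0-9_]/ { print $1 }          # chisel name is the first word
    '
}

# Only call sysdig where it is installed.
if command -v sysdig >/dev/null 2>&1; then
    sysdig -cl | chisels_in_category Performance
    sysdig -i netlower    # long description and arguments of one chisel
fi
```

Underlines, blank lines, and indented continuation lines are skipped, so only the chisel names remain.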

The chisels are sorted into categories (Net, I/O, Application, Logs, and so on). For example, the Performance category has a chisel named netlower, which expects a threshold in milliseconds as its parameter. I decided to pass in a value of 10 milliseconds:

sysdig -c netlower 10

Sysdig now keeps listing processes whose network I/O takes longer than 10 milliseconds. On my home network, this means the SmokePing probes to the garden Raspberry Pis and some Munin connections.

You can output a list of the processes that read and write the most data on mass storage by typing:

sysdig -c topprocs_file

The following reveals the connections causing the most network traffic:

sysdig -c topconns

A replacement for top can be found in:

sysdig -c topprocs_cpu

The built-in automatic analysis of bottlenecks is particularly informative. Typing

sysdig -c bottlenecks

generates a list of processes whose syscalls take a suspiciously long time, which makes it a good starting point when you are hunting for performance problems.

Depth on the Interface

If you like a more interactive approach, try csysdig. The tool displays the information provided by Sysdig in a continuously updated ncurses interface. Called without parameters, the start screen resembles htop, but pressing F2 takes you to a list of Views that correspond to the categories to which Sysdig assigns its chisels, so you can switch between them quickly.

For example, if you choose the Spectrogram-File view, you are treated to a graphic like that shown in Figure 1: It shows the file access latency distribution, in which each line represents one second. When I took the screenshot, an apt dist-upgrade was running, hence the high read and write load highlighted in red.

Figure 1: Abstract art? Nope: Concrete latencies during file access as a spectrogram, applied over time.

The Views overview showcases one of the specialities of Sysdig and Csysdig: You can restrict analyses to applications that run in container environments such as Docker or Kubernetes. Thus, admins can quickly and easily identify any performance fluctuations in containerized software.
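On the command line, the same restriction can be expressed with a filter on the container.name field. This is a minimal sketch under assumptions: container_filter is a hypothetical helper, and web and db are placeholder container names.

```shell
#!/bin/sh
# container_filter (hypothetical helper): build a sysdig filter expression
# that matches events from any of the named containers.
container_filter() {
    filter=""
    for name in "$@"; do
        # Join the individual conditions with "or".
        filter="${filter:+$filter or }container.name=$name"
    done
    printf '%s\n' "$filter"
}

# Usage, as root ("web" and "db" are placeholder names):
#   sysdig -pc -c topprocs_cpu "$(container_filter web db)"
#   csysdig -vcontainers      # open csysdig directly in its Containers view
```

The -pc option switches Sysdig to a container-aware output format, so the container name appears next to each process.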

My conclusions: Used only as a replacement for top and netstat, Sysdig is like taking a sledgehammer to crack a nut, but the many easily parameterized analyses of file and network latencies are a real help. If I have to dig down into individual syscalls, I can save a trace file and filter it until I find what I want. This is where the signature of the Wireshark makers finally shows.
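The trace-file workflow might look like the following sketch. The -w and -r options write and read a capture, -n stops after a given number of events, and -p selects the output fields. The file name apt.scap, the process name apt, and the helper top_syscalls are illustrative assumptions, not taken from the column.

```shell
#!/bin/sh
# top_syscalls (hypothetical helper): read one syscall name per line on
# stdin and print the N most frequent ones as "name count" (default N=10).
top_syscalls() {
    sort | uniq -c | sort -rn | head -n "${1:-10}" | awk '{ print $2, $1 }'
}

# Usage, as root (file and process names are placeholders):
#   sysdig -w apt.scap -n 100000      # capture 100000 events, then stop
#   sysdig -r apt.scap -p '%evt.type' 'proc.name=apt and evt.dir=<' | top_syscalls 5
```

The evt.dir=< condition keeps only exit events, so each syscall is counted once instead of once on entry and once on exit.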


The Author

Charly Kühnast manages Unix systems in a data center in the Lower Rhine region of Germany. His responsibilities include ensuring the security and availability of firewalls and the DMZ.

