Performance analysis with iostat, sar, Ksar, collectd, and serverstats

Performance at a Glance

© Lead Image © Martin Blech, Fotolia.com

Article from Issue 180/2015
Author(s):

We describe five tools you can use to monitor and troubleshoot your system's performance.

Sooner or later, most server administrators realize they need to do some performance analysis. Whether you hope to detect bottlenecks or plan resources, reliable performance data is essential for any well-managed network. Also, historical records showing performance over an extended period make it easier to forecast and adapt to changes.

The most important part of measuring performance values is that you don't start measuring right away. To begin, you need to consider carefully what information you want to obtain. Here are five suggestions for what is important during a performance analysis:

  1. Document your observations and results. You must be able to tie every assumption and conclusion back to your data.
  2. Set up a baseline of performance for your systems. This baseline is a defined initial state that you can use to qualify measurements.
  3. Measure the state of the system during the analysis before making a change. All data you collect for a defined system will help.
  4. Locate the bottlenecks in your system. Which resources limit performance?
  5. Make realistic assessments regarding which measures improve performance, and restrict yourself to these parts when optimizing.

In this article, I describe some tools you can use to monitor and troubleshoot system performance.

Iostat

Data on input and output operations matters because the I/O system affects overall system performance when it becomes a bottleneck. The iowait parameter serves as an example here; it is the time the CPU waits for I/O requests to be processed. Large I/O latencies can cause high load on the system, but they are not always the cause.

Iostat is part of the sysstat package, which appears in the Ubuntu repositories as "System Performance Tools for Linux." Other programs included in this package are sar (described later), pidstat, mpstat, nfsiostat, and cifsiostat. The sysstat programs are consistent front ends to the /proc filesystem of the Linux kernel; they therefore cannot provide any more statistics than the kernel exposes through /proc.

Iostat also provides the option -d for querying device statistics. The tool delivers the CPU utilization report using the -c option. Keep in mind that the first output by default contains the values since the system start. All other reports refer to the interval between individual outputs. If desired, you can skip the first report using the option -y and receive only the current values.

Performance Indicators

The extended output from -x includes a parameter that many users take, at first glance, to be a measure of device utilization: %util (Listing 1).

Listing 1

iostat

$ iostat -y -x -d 3 5
Linux 3.13.0-37-generic (X220)     2014-10-17     _x86_64_     (4 CPU)
Device:  rrqm/s  wrqm/s  r/s   w/s       rkB/s  wkB/s     avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda      0.00    2.00    0.00  15520.67  0.00   62090.67  8.00      0.48      0.03   0.00     0.03     0.03   48.00

A look at the man page, however, corrects the common misconception that %util indicates actual device utilization: %util is the percentage of CPU time during which I/O requests were issued to the device. For a single traditional hard drive, a high %util value may indeed indicate that the device is heavily used. However, RAID devices and SSDs process several requests in parallel, so a high %util value does not by itself mean that the device is saturated.

Better indicators of I/O utilization are %iowait in the CPU report and avgqu-sz and await in the disk report. %iowait is the percentage of CPU idle time during which the system had outstanding I/O requests. However, another pitfall lurks here: %iowait is a special form of idle time and thus a subset of the CPU's idle time. If a task that drives up %iowait and a CPU-intensive task run on the same CPU core, %iowait drops automatically, because the second task consumes CPU time and no idle time remains.
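The relationship between %iowait and idle time can be illustrated with a minimal Python sketch that derives both percentages from two samples of a "cpu" line in /proc/stat. The sample values are invented for illustration, the function names are hypothetical, and the sketch assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); it is not how iostat itself is implemented.

```python
# Sketch: derive %iowait and %idle from two samples of a /proc/stat "cpu" line.
# Assumption: the first eight counters appear in the order listed in FIELDS.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Return the first eight counters of a /proc/stat cpu line as a dict."""
    parts = line.split()
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_percentages(before, after):
    """Share of the interval spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Invented samples, taken one interval apart on a core that mostly waits for I/O.
sample_1 = parse_cpu_line("cpu0 1000 0 500 8000 500 0 0 0 0 0")
sample_2 = parse_cpu_line("cpu0 1100 0 550 8050 800 0 0 0 0 0")
pct = cpu_percentages(sample_1, sample_2)
print("iowait %.1f%%, idle %.1f%%" % (pct["iowait"], pct["idle"]))
```

Because all shares are taken from the same total, any time a second task spends in user or system mode on that core can no longer be counted as idle or iowait.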

Listing 2 shows a simple experiment. The corresponding output from Top and iostat brings the idle times to light (Listing 3). If a computationally intensive task runs alongside an I/O task, %iowait is zero, but %util has risen (Listing 4). As previously mentioned, the CPU-related idle times, together with the device-level values avgqu-sz and await, are the meaningful figures.

Listing 2

Task Set Experiment

# taskset 1 fio --rw=randwrite --name=test --filename=test.fio --size=100M --direct=1 --bs=4k \
  --numjobs=1 --iodepth=64 --group_reporting --runtime=120 --time_based --refill_buffers
# taskset 1 sh -c "while true; do true; done"

Listing 3

Idle Processes

%Cpu0 :  77.7 us,  22.3 sy,  0.0 ni,   0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1 :   1.4 us,   0.7 sy,  0.0 ni,  98.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2 :   7.3 us,   1.6 sy,  0.0 ni,  89.9 id,  0.0 wa,  0.0 hi,  1.2 si,  0.0 st
%Cpu3 :   4.7 us,   1.0 sy,  0.0 ni,  94.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

Listing 4

CPU-Intensive Task and I/O Task

avg-cpu:    %user     %nice    %system    %iowait    %steal    %idle
            20.50     0.00     7.30       0.00       0.00      72.20
Device:  rrqm/s   wrqm/s  r/s    w/s       rkB/s  wkB/s       avgrq-sz  avgqu-sz  await  r_await   w_await   svctm   %util
sda      0.00     2.67    0.00   32549.67  0.00   130237.33   8.00      0.78      0.02   0.00      0.02      0.02    78.40

Iostat provides statistics at the block layer level. The term "request" comes up often in this context: requests are the structures the Linux I/O scheduler works with. The scheduler passes them on to the device driver's dispatch queue, which represents the last step in the Linux I/O chain.

The value avgqu-sz denotes the average length of the queue of requests issued to the device. It is calculated from time_in_queue and the iostat interval, where time_in_queue counts the time in which requests had to wait for the device. If several in_flight requests have to wait, time_in_queue increases by the product of the elapsed time and the number of in_flight requests [1]. Finally, avgqu-sz can be calculated from the equation [2]:

(delta[time_in_queue]/interval)/1000.0

A simple example using five in_flight requests that each wait 20ms (100ms of time_in_queue in total) with an iostat interval of five seconds would result in an avgqu-sz of 0.02. The following therefore applies for a device: The larger avgqu-sz is at a constant await time, the more requests the device was able to handle.
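The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the equation above; the function and variable names are hypothetical, and it assumes time_in_queue is measured in milliseconds and the interval in seconds.

```python
def avgqu_sz(delta_time_in_queue_ms, interval_s):
    """Average queue length: (delta[time_in_queue]/interval)/1000.0."""
    return (delta_time_in_queue_ms / float(interval_s)) / 1000.0


# Five in_flight requests that each wait 20 ms add 5 * 20 = 100 ms
# to time_in_queue (the product rule described above).
in_flight = 5
wait_ms = 20
delta_time_in_queue = in_flight * wait_ms

result = avgqu_sz(delta_time_in_queue, interval_s=5)
print(result)
```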

The await value is in turn the average time spent waiting for an I/O request:

delta(read_ticks+write_ticks)/delta(read_IOs+write_IOs)

The product rule for several in_flight requests discussed previously also applies to the ticks used here. The derivation of the two parameters shows that they are suitable indicators of device utilization. Large await times with a small avgqu-sz indicate that requests are waiting for the device to process them.
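As a rough illustration of the await formula, the following Python sketch computes it from two samples of a /proc/diskstats line. The sample lines are invented, the helper names are hypothetical, and the sketch assumes the classic field layout in which the counters after the device name start with reads completed, reads merged, sectors read, and milliseconds spent reading, followed by the same four values for writes.

```python
def parse_diskstats_line(line):
    """Pick the counters needed for await out of one /proc/diskstats line."""
    parts = line.split()
    counters = [int(v) for v in parts[3:]]
    return {
        "name": parts[2],
        "read_ios": counters[0],     # reads completed
        "read_ticks": counters[3],   # ms spent reading
        "write_ios": counters[4],    # writes completed
        "write_ticks": counters[7],  # ms spent writing
    }


def await_ms(before, after):
    """delta(read_ticks+write_ticks)/delta(read_IOs+write_IOs)"""
    ticks = (after["read_ticks"] - before["read_ticks"]
             + after["write_ticks"] - before["write_ticks"])
    ios = (after["read_ios"] - before["read_ios"]
           + after["write_ios"] - before["write_ios"])
    return float(ticks) / ios if ios else 0.0


# Invented samples: 10,000 writes complete and take 300 ms in total.
first = parse_diskstats_line(
    "   8       0 sda 1000 10 80000 400 2000 20 160000 900 0 1200 1300")
second = parse_diskstats_line(
    "   8       0 sda 1000 10 80000 400 12000 30 240000 1200 0 1500 1600")
print("%s await: %.2f ms" % (second["name"], await_ms(first, second)))
```

If no I/O completes between the two samples, the sketch returns 0.0 rather than dividing by zero.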

Using plotiostat

You can create graphs from the iostat output with a small detour. First, log the iostat output to a file over a longer period:

iostat -xym 1 > iostat.log

It is best to run this command on a remote server in a screen session or with nohup, so that the command does not abort even if the SSH connection ends. A simple Python script called plotiostat.py [3] then generates a graph from the logfile:

python plotiostat.py -f iostat.log -d sda -c rMB/s -c wMB/s
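To give an idea of what such a script has to do before it can plot anything, here is a minimal Python sketch that pulls one column for one device out of logged iostat -x output. It is not the plotiostat.py script itself; the function name and the sample log (built from the values in Listings 1 and 4) are assumptions for illustration, and the resulting list could be handed to any plotting library.

```python
def extract_column(log_text, device, column):
    """Collect the values of one column for one device from iostat -x output."""
    values = []
    header = None
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            header = fields  # remember the most recent column layout
        elif header and fields[0] == device and column in header:
            index = header.index(column)
            if index < len(fields):
                values.append(float(fields[index]))
    return values


# Hypothetical log excerpt with two reports for sda.
SAMPLE_LOG = """\
Device:  rrqm/s  wrqm/s  r/s   w/s       rkB/s  wkB/s      avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda      0.00    2.00    0.00  15520.67  0.00   62090.67   8.00      0.48      0.03   0.00     0.03     0.03   48.00

Device:  rrqm/s  wrqm/s  r/s   w/s       rkB/s  wkB/s      avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda      0.00    2.67    0.00  32549.67  0.00   130237.33  8.00      0.78      0.02   0.00     0.02     0.02   78.40
"""

write_rates = extract_column(SAMPLE_LOG, "sda", "wkB/s")
print(write_rates)
```

Looking up the column by name in each header line, rather than by fixed position, keeps the sketch working when the column set changes, for example between the -k and -m options.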

Finally, note that iostat provides figures at the block layer level. Performing an analysis at the application level is best done using tools such as iotop or atop, which display who is currently generating an I/O load at the process level in the style of the well-known program Top.
