High-resolution network monitoring with ping
Stopwatch Timing
The Pinger network monitoring tool uses ping to look for switches and estimate cable lengths.
The ping command is used to determine whether a particular host on the network is accessible and to reveal the packet turnaround time, usually known as the round trip time (RTT). The RTT of a ping request is longer when packets need to pass through network devices or long stretches of wire. In this article, I develop a utility that uses the ping RTT to track down switches and transparent bridges and determine cable lengths.
Known Pings
Common ping programs under Linux, such as the one from the iputils package [1], report RTT statistics that include a mean value. However, even the average of thousands of pings can vary so greatly that it is impossible to achieve a resolution in the range of a few microseconds down to nanoseconds.
These subtleties, however, are interesting when exploring the network and the equipment in it. Expert evaluation, that is, filtering out RTT outliers before computing an average, can return a resolution of less than one microsecond (1µs). Several available ping programs offer additional features that remain mostly unused, such as the ability to send a bit pattern in the ping packet to determine data rot (i.e., damage to data on the network).
In this article, I use the classic ICMP ping on IPv4, but most of the principles also apply to alternative ping tools, such as arping [2], httping [3], and ipmiping [4] – in fact, to anything that gives you an RTT (e.g., a wget download of a small file or reading the USB register of a USB adapter).
Troublemakers
On a gigabit LAN, ping mostly delivers RTT measurements that follow a Gaussian distribution centered at around 200µs; however, outliers with much higher values shift the mean. Listing 1 shows an example with only one outlier in 100 measured values. It results from pinging between two four-year-old PCs connected by a gigabit switch: one running Debian with kernel 3.1.0-1-amd64 and the other running Ubuntu with kernel version 3.5.0-18-generic.
Listing 1
Ping via Gigabit Switch
A single outlier here shifts the mean value (avg) by more than 1,000 percent, from approximately 0.2 to 5ms. As several measurements show, the mean still varies greatly even if you increase the number of pings (n) to several thousand. This happens because the outliers are not only relatively large, but also greatly scattered, causing the mean value to vary by approximately 50 percent, even for n = 86,400.
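The effect of one outlier on the mean can be reproduced with a few lines of Python. This is only a sketch with made-up numbers: 99 pings at 0.2ms plus a single assumed 480ms outlier stand in for the measured values, which are not reproduced here.

```python
from statistics import mean

# Hypothetical data: 99 pings at 0.2 ms plus one 480 ms outlier.
# These numbers are illustrative, not the values from Listing 1.
rtts_ms = [0.2] * 99 + [480.0]

clean_avg = mean(rtts_ms[:-1])   # mean without the outlier
skewed_avg = mean(rtts_ms)       # mean including the outlier
shift_percent = 100 * (skewed_avg - clean_avg) / clean_avg

print(f"mean without outlier: {clean_avg:.3f} ms")
print(f"mean with outlier:    {skewed_avg:.3f} ms")
print(f"shift of the mean:    {shift_percent:.0f} percent")
```

With these assumed values, one sample in a hundred is enough to move the mean from about 0.2ms to about 5ms.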
These extremely large outliers are like a pendulum that takes about one second per beat 99 percent of the time but occasionally slows down so drastically that a single beat takes a whole hour or longer.
Through the High Pass
The disproportionate effect of outlying values throws off any calculation that makes assumptions about the state of the network on the basis of ping response times. The first step is therefore to remove these extreme values with some sort of interference filter.
The usual approach, as many of you will recall from physics problems, is to assume a Gaussian distribution of the RTT measurements as a first approximation and discard those values that deviate by more than 3 sigma (i.e., three standard deviations) from the mean computed without outliers. Because no outliers fall in the downward direction, you only need to remove outliers in the upward direction, which calls for a one-sided filter that cuts off all values above a threshold.
The ping command from the iputils package returns sigma in the mdev field, so just a few measurements of a few dozen pings, each without outliers, are sufficient to determine sigma at the command line.
The sigma for the gigabit LAN with one switch connected to several computers turned out to be 45µs. Thus, filtering values above 200+3*45 (i.e., 335µs) is the way to go. In practice, you might want to round up to be safe; you will still filter out most of the outliers that typically lie in the range of 100ms to several seconds. In this example, 400µs would be a good choice – that is, twice the mean value without outliers.
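The cut-off arithmetic and the filtering step can be sketched as follows. The function names cutoff_us and drop_outliers and the sample RTT values are hypothetical, chosen only for illustration; the mean of 200µs, the sigma of 45µs, and the rounded cut-off of 400µs come from the text.

```python
def cutoff_us(mean_us: float, sigma_us: float, k: float = 3.0) -> float:
    """Upper cut-off: outlier-free mean plus k standard deviations."""
    return mean_us + k * sigma_us


def drop_outliers(rtts_us, limit_us):
    """Keep only RTT values at or below the cut-off."""
    return [r for r in rtts_us if r <= limit_us]


# Values from the text: mean 200 µs, sigma 45 µs -> 200 + 3*45 = 335 µs.
limit = cutoff_us(200.0, 45.0)

# Hypothetical RTT samples in µs, including one 120 ms outlier.
samples = [180.0, 210.0, 195.0, 250.0, 120_000.0, 205.0]

# Apply the rounded-up cut-off of 400 µs suggested in the text.
kept = drop_outliers(samples, 400.0)

print(f"3-sigma cut-off: {limit} µs")
print(f"kept {len(kept)} of {len(samples)} samples")
```

Only values above the limit are discarded, because the outliers all lie in the upward direction.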
You can also determine this cut-off value automatically, either by using outlier tests or by finding the RTT value with the highest density and multiplying it by 2. A program can do this for you if you allow a brief warm-up period before the actual measurement.
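One way to approximate the value with the highest RTT density is a simple histogram over the warm-up samples. The sketch below assumes a bin width of 10µs and uses invented warm-up data; the function name auto_cutoff_us and its parameters are hypothetical and not taken from any existing tool.

```python
from collections import Counter


def auto_cutoff_us(warmup_rtts_us, bin_width_us=10.0, factor=2.0):
    """Estimate a cut-off as `factor` times the center of the densest RTT bin."""
    if not warmup_rtts_us:
        raise ValueError("need at least one warm-up sample")
    # Count how many samples fall into each bin of width bin_width_us.
    bins = Counter(int(r // bin_width_us) for r in warmup_rtts_us)
    densest_bin, _count = bins.most_common(1)[0]
    # Use the center of the densest bin as the most likely RTT.
    mode_us = (densest_bin + 0.5) * bin_width_us
    return factor * mode_us


# Hypothetical warm-up RTTs in µs, including one 95 ms outlier.
warmup = [198.0, 203.0, 201.0, 207.0, 188.0,
          215.0, 204.0, 95_000.0, 199.0, 202.0]

cutoff = auto_cutoff_us(warmup)
print(f"automatic cut-off: {cutoff} µs")
```

The bin width is a trade-off: bins that are too narrow make the densest bin noisy, and bins that are too wide blur the peak. Unlike the mean, the densest bin is barely affected by a few large outliers in the warm-up data.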
After this noise filtering, you can see that the average error of the mean value is approximately sigma divided by the square root of n, as you would expect for a Gaussian distribution. This means that the mean error of the mean value decreases as n increases; for n = 100, by a factor of 10; for n = 10^6, by a factor of 1,000; and so on.
In terms of the RTT in the gigabit LAN under investigation, with a sigma of 45µs, this means the mean error of the mean value is 4.5µs for n = 100 and 45ns for n = 10^6. The measurement thus has sufficiently high resolution to detect an intermediate switch or a longer or shorter network cable.
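The relationship between sigma, n, and the error of the mean can be checked with a short calculation. The function name mean_error_us is hypothetical; the sigma of 45µs and the two values of n are those used in the text, and the formula assumes Gaussian-distributed RTTs after filtering.

```python
import math


def mean_error_us(sigma_us: float, n: int) -> float:
    """Mean error of the mean value: sigma divided by the square root of n."""
    return sigma_us / math.sqrt(n)


for n in (100, 10**6):
    err_us = mean_error_us(45.0, n)
    print(f"n = {n}: error of the mean = {err_us} µs ({err_us * 1000} ns)")
```

Each additional factor of 100 in the number of filtered pings improves the resolution by a factor of 10.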