Analyzing network flow records

Data Security

The method of detecting downloads directly from the metadata presented here obviously has a considerable effect on the general security of the devices on your network. Many updates are not intended for all versions of an operating system, so you can relatively quickly find out exactly what version is in use. Various manufacturers of security software offer countermeasures that prevent complete scanning of the Internet or your own subnet.

But because metadata is passive by nature, it cannot be filtered by these protection systems – and it is also impossible to get rid of metadata. This method is far less suitable for use on large networks because, on the one hand, the detection rate drops severely the farther away you are from the download server. On the other hand, the alternative approaches that many providers now use to deliver updates also takes its toll.

In particular, the peer-to-peer update function in Windows 10, as well as Content Delivery Networks (CDNs), which manage a globally distributed network with a correspondingly large number of different IP addresses, should be mentioned. These large providers in particular offer so many files of different types that identification based solely on the size of the file seems to be fairly meaningless. Without a knowledge of the DNS requests, it is impossible – especially in the case of CDNs – to identify the domain that was originally the target of the request.

That said, it is important to note that the general lack of attention paid to metadata can become problematic if worst comes to worst and an attacker is able to exploit an unpatched vulnerability on a system footprinted using this method.

Reference Downloads

To be able to assign flows to downloads, it is important to know what possible downloads exist. This task is impossible to handle manually because of the sheer volume of possibilities, so you need to think about an automation strategy. Two methods turn out to be very useful here.

The first method is based on grabbers, which work much like the classical search engine grabbers that index the Internet for fast searching. This involves searching the Internet or a suitable selection of websites for download links. Because the detection rate is very poor in the lower kilobyte range of numbers, administrators will only want to consider downloads whose size exceeds a certain threshold for indexing.

However, this method has the massive disadvantage that it prevents the detection of incremental updates because the download links typically offer the full download. Additionally, the collection can become unmanageably large even after a very short time.

The second method is based on honeypots equipped with a software configuration similar to those used on enterprise networks. By monitoring the network traffic to these honeypots, administrators can now directly observe update sequences. Additionally, it is possible to start downloads directly from the systems, making it easy to map the flows because the honeypot systems are not used for any other purpose.

The major advantage offered by this method is that the recorded packet sizes lead to good detection rates, especially if the honeypots are located on the same subnet as the systems you want to protect. Moreover, it is easier to emulate and analyze special update mechanisms. These benefits come at a price, in that you can only monitor known software versions and combinations and you are relying on honeypot systems that need to work with full, licensed versions of the software you deploy.


For IT staff who want to keep track of their own IT infrastructure and do not have, or are not allowed to have, access to all of the systems, the method introduced here is an additional option that supplements classical penetration tests to provide better asset protection. It also draws attention to the value of metadata. If you log flow records directly on the switches and backbone routers on your network, you will also ensure that the distance to the systems you are monitoring is not too large, which means that the variance in the monitored download sizes remains manageable.


  1. Shadow IT:
  2. "Managing port scan results with Dr. Portscan" by Wolfgang Hommel, Stefan Metzger, Michael Grabatin, and Felix von Eye, Linux Pro Magazine, issue 155, October 2013, pg. 20,
  3. Bernhard, Andreas, Netzbasierte Erkennung von Systemen und Diensten zur Verbesserung der IT-Sicherheit [Network-Based Detection of Systems and Services to Improve IT Security], Bachelor thesis, Ludwig-Maximilians-University, Munich, March 2014, [in German]
  4. Softflowd:
  5. Flow-tools:
  6. Data retention laws:
  7. Pandas:
  8. Sklearn:

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Packet Telemetry with Host-INT

    Inband Network Telemetry and Host-INT can provide valuable insights on network performance – including information on latency and packet drops.

  • OpenFlow

    The OpenFlow protocol and its surrounding technologies are bringing the promise of SDN to real networks – and it might not be long before you see them on your real network.

  • Skydive

    If you don't speak fluent Ethernet, it sometimes helps to get a graphical view of what your network is doing. Skydive offers visual insights that could reveal complex error patterns.

  • Argus

    Argus helps you monitor the flow of data on your network, detect trends, discover worms and viruses, and analyze bandwidth usage.

  • Web Videos

    We’ll show you how to convert your videos to FLV format and play them from your website with FlowPlayer.

comments powered by Disqus
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you've found an article to be beneficial.