Python network data visualization

Data Harvest

© Lead Image © Mark Bridger, 123RF.com

Author(s):

The Scapy packet manipulation program lets you analyze and manipulate packets to create incident response reports or examine network security.

Most folks have pulled up Wireshark a time or two to troubleshoot an application or system problem. During forensics, packet captures (PCAPs) are essential. Often you are looking at things like top talkers, ports, bytes, DNS lookups, and so on. Why not automate this process with Python?

Scapy [1] is a great tool suite for packet analysis and manipulation. It is most often talked about in the realm of packet manipulation, but its ability to analyze packets is also top-notch.

Make Ready

First, you need to make sure you have Python 3 installed along with the following packages:

sudo pip3 install scapy scapy_http plotly PrettyTable

To get started, you will want a PCAP to analyze. To capture 1,000 packets and save them to the file example.pcap, enter:

~$ sudo tcpdump -c 1000 -w example.pcap
tcpdump: listening on enp0s3, link-type EN10MB (Ethernet), capture size 262144 bytes
1000 packets captured
1010 packets received by filter
0 packets dropped by kernel
~$
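
The tcpdump banner reports the link type and the capture size (snap length), and both values are also stored in the 24-byte global header at the start of a classic pcap file. As a quick sanity check before handing a capture to Scapy, a minimal sketch like the following reads that header with the Python standard library alone. The function name read_pcap_header and the synthetic sample header are illustrative assumptions, not part of Scapy or tcpdump, and the sketch assumes the classic pcap format rather than pcapng.

```python
import struct

PCAP_MAGIC_USEC = 0xA1B2C3D4  # classic pcap, microsecond timestamps
PCAP_MAGIC_NSEC = 0xA1B23C4D  # classic pcap, nanosecond timestamps


def read_pcap_header(data):
    """Parse the 24-byte global header of a classic pcap file (hypothetical helper)."""
    if len(data) < 24:
        raise ValueError("too short to be a pcap file")
    # The magic number reveals the byte order, so try both.
    for endian in ("<", ">"):
        magic, major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
            endian + "IHHiIII", data[:24]
        )
        if magic in (PCAP_MAGIC_USEC, PCAP_MAGIC_NSEC):
            return {"version": (major, minor), "snaplen": snaplen, "linktype": linktype}
    raise ValueError("not a classic pcap file")


# Synthetic header matching the banner above: snap length 262144, link type 1 (Ethernet).
sample = struct.pack("<IHHiIII", PCAP_MAGIC_USEC, 2, 4, 0, 0, 262144, 1)
header = read_pcap_header(sample)
print(header)
```

To inspect the real capture, you could pass the first 24 bytes of the file instead, for example read_pcap_header(open("example.pcap", "rb").read(24)).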

Scapy can handle all parts of the OSI model except Layer 1 (Figure 1). Listing 1 shows the "Hello World!" of packet reading. To begin, you read the packets from the capture file (line 5), check whether each packet has the layer you want (line 9), and then act on it. If you try to print pkt[IP].src when no IP layer is present, Python raises an error, so the access is also wrapped in a try/except (lines 10-13).

Listing 1

Looking for Layers

01 #Step 1: Import scapy
02 from scapy.all import *
03
04 #Step 2: Read the PCAP using rdpcap
05 packets = rdpcap("example.pcap")
06
07 #Step 3: Loop and print an IP in a packet in Scapy by looking at Layer 3
08 for pkt in packets:
09   if IP in pkt:
10     try:
11       print(pkt[IP].src)  # Source IP
12     except:
13       pass
Figure 1: Check OSI Layer.

Sorting

If you ran the code in Listing 1 with your example.pcap file of 1,000 packets, your terminal printed roughly 1,000 lines, which is not very useful. To improve on this, you can collect all the source IPs in a list, run a counter over the list, and print the results with the PrettyTable module (Listing 2). As before, you import Scapy, but now you also import the collections module and PrettyTable (Step 1). Next, create an empty list and append each packet's source IP to it (Step 2). Now you can use the counter to loop through the list of IPs and create a count (Step 3); finally, using the PrettyTable module, you print out the results in a clean table (Step 4).

Listing 2

Adding a Counter

01 #Step 1: Imports
02 from scapy.all import *
03 from prettytable import PrettyTable
04 from collections import Counter
05 packets = rdpcap("example.pcap")  # read the capture, as in Listing 1
06 #Step 2: Read and Append
07 srcIP=[]
08 for pkt in packets:
09   if IP in pkt:
10     try:
11       srcIP.append(pkt[IP].src)
12     except:
13       pass
14
15 #Step 3: Count
16 cnt=Counter()
17 for ip in srcIP:
18   cnt[ip] += 1
19
20 #Step 4: Table and Print
21 table= PrettyTable(["IP", "Count"])
22 for ip, count in cnt.most_common():
23   table.add_row([ip, count])
24 print(table)
25
26 +-----------------+-------+
27 |        IP       | Count |
28 +-----------------+-------+
29 |    10.0.2.15    |  482  |
30 |   52.84.82.203  |   93  |
31 |     8.8.8.8     |   82  |
32 |   104.16.41.2   |   76  |
33 |  216.58.216.232 |   30  |
34 |  104.20.150.16  |   20  |
35 |  52.84.133.105  |   16  |
36 |  209.132.181.15 |   16  |
37 | 140.211.169.196 |   15  |
38 |   72.21.91.29   |   12  |
39 |  104.244.46.103 |   12  |
40 +-----------------+-------+
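
Step 3 of Listing 2 can be shortened, because Counter accepts any iterable directly. The following minimal sketch uses a small hypothetical list of addresses (sample_ips stands in for srcIP) and also shows most_common(n), which trims the table to the top talkers.

```python
from collections import Counter

# Hypothetical sample of source addresses; in Listing 2 this list is srcIP.
sample_ips = [
    "10.0.2.15", "8.8.8.8", "10.0.2.15",
    "52.84.82.203", "10.0.2.15", "8.8.8.8",
]

# Counter accepts any iterable, so the explicit counting loop is optional.
cnt = Counter(sample_ips)

# most_common(n) returns only the n largest counts, in descending order.
top_two = cnt.most_common(2)
for ip, count in top_two:
    print(f"{ip:<15} {count:>5}")
```

The explicit loop in Listing 2 gives the same counts; the one-line form is simply more compact when the list already exists.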

Visualize

Now that you know how to read packets and do some counting, you can build on the last example and use the Plotly package to make graphs (Listing 3). First, add the plotly import to Step 1 (line 1). Then, after going through Steps 2 and 3 as before, replace Step 4 of Listing 2 with new code that creates two lists to hold the x and y data (Listing 3, lines 4-5) and loops through the counted IPs again, appending each address and its count to those lists (lines 7-9).

Listing 3

Making Graphs

01 import plotly
02
03 #Step 4: Add Lists
04 xData=[]
05 yData=[]
06
07 for ip, count in cnt.most_common():
08   xData.append(ip)
09   yData.append(count)
10
11 #Step 5: Plot
12 plotly.offline.plot({
13   "data":[plotly.graph_objs.Bar(x=xData, y=yData)] })

By default, Plotly uses its web UI to create charts, but if, like me, you use this data in an incident response situation, you do not want to share that data with a cloud system. Therefore, I use the offline version to plot my data in a new Step 5. When run, it opens the chart in your default web browser (Figure 2).

Figure 2: Offline plot of IPs.

DNS Data

If you modify the previous code slightly, you can print DNS lookups. Instead of reading pkt[IP].src, you test for DNS with pkt.haslayer(DNS). Again, you create an empty list and append to it; Scapy checks for the DNS layer and confirms that the packet is a query (a 0 in the QR field) and not a response, which would have a 1 in the QR field (Listing 4). Again, count and plot the results (Figure 3).

Listing 4

DNS Lookups

01 from scapy.all import *
02 from collections import Counter
03 import plotly
04
05 packets = rdpcap("example.pcap")
06
07 lookups=[]
08 for pkt in packets:
09   if IP in pkt:
10     try:
11       if pkt.haslayer(DNS) and pkt.getlayer(DNS).qr == 0:
12         lookup=(pkt.getlayer(DNS).qd.qname).decode("utf-8")
13         lookups.append(lookup)
14     except:
15       pass
16
17 cnt=Counter()
18 for lookup in lookups:
19   cnt[lookup] += 1
20
21 xData=[]
22 yData=[]
23
24 for lookup, count in cnt.most_common():
25   xData.append(lookup)
26   yData.append(count)
27
28 plotly.offline.plot({
29   "data":[plotly.graph_objs.Bar(x=xData, y=yData)] })
Figure 3: Graph of DNS lookups.
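
The qr test in Listing 4 reads a single bit: In the DNS header, QR is the most significant bit of the 16-bit flags field that follows the message ID. The sketch below shows the same check on raw bytes with the standard library, which can help when you want to see what Scapy decodes for you. The helper name dns_is_query and the hand-built headers are illustrative assumptions, not Scapy APIs.

```python
import struct


def dns_is_query(dns_message):
    """Return True if the QR bit (top bit of the flags field) is 0."""
    if len(dns_message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    # Header starts with a 16-bit ID and 16-bit flags, in network byte order.
    _ident, flags = struct.unpack("!HH", dns_message[:4])
    return (flags >> 15) == 0


# Hand-built 12-byte headers with a made-up message ID.
query_header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)     # QR=0, RD=1
response_header = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)  # QR=1, RD=1, RA=1

print(dns_is_query(query_header), dns_is_query(response_header))
```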

Packets Through Time

At first glance, plotting packets over time looks like an easy problem to solve: Just grab each packet and use pkt[IP].len. However, if you have a reasonable data collection, you will almost always plot values of 1,500 bytes (the default MTU in most routers), which produces an uninteresting graph. With the pandas Python data analysis library, you can convert the packet timestamps, which are in epoch (Unix) time, to human-readable dates and then bin the data by date and time (Listing 5). First, you have to install pandas:

sudo pip3 install pandas

Listing 5

Using the pandas Library

01 from scapy.all import *
02 import plotly
03 from datetime import datetime
04 import pandas as pd
05
06 #Read the packets from file
07 packets = rdpcap('example.pcap')
08
09 #Lists to hold packet info
10 pktBytes=[]
11 pktTimes=[]
12
13 #Read each packet and append to the lists.
14 for pkt in packets:
15   if IP in pkt:
16     try:
17       pktBytes.append(pkt[IP].len)
18
19       #First we need to convert epoch time to a datetime
20       pktTime=datetime.fromtimestamp(float(pkt.time))
21       #Then convert to a format we like
22       pktTimes.append(pktTime.strftime("%Y-%m-%d %H:%M:%S.%f"))
23
24     except:
25       pass
26
27 #This converts list to series
28 bytes = pd.Series(pktBytes).astype(int)
29
30 #Convert the timestamp list to a pd date_time
31 times = pd.to_datetime(pd.Series(pktTimes).astype(str), errors='coerce')
32
33 #Create the dataframe
34 df = pd.DataFrame({"Bytes": bytes, "Times":times})
35
36 #Use the timestamps as the dataframe index
37 df = df.set_index('Times')
38
39 #Create a new dataframe of 2 second sums to pass to plotly
40 df2=df.resample('2S').sum()
41 print(df2)
42
43 #Create the graph
44 plotly.offline.plot({
45   "data":[plotly.graph_objs.Scatter(x=df2.index, y=df2['Bytes'])],
46   "layout":plotly.graph_objs.Layout(title="Bytes over Time",
47     xaxis=dict(title="Time"),
48     yaxis=dict(title="Bytes"))})

As before, you create lists to hold data (lines 10-11), this time storing the length of each packet in bytes and the packet's timestamp. Next, you get the length with pkt[IP].len and convert the time with datetime (lines 13-25). With the pandas library, you convert the byte list to a pandas series, convert the timestamps to pandas datetimes, create the dataframe, and sum the data into two-second bins (lines 27-41). Now you can use Plotly to draw the chart. Lines 46-48 add a title and axis labels with graph_objs.Layout. The time (x) axis was created during resampling, with the y axis data in bytes (Figure 4).

Figure 4: Flow of packets over time.
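
If the resampling step feels opaque, the idea behind it can be sketched without pandas: Each timestamp is rounded down to the start of its two-second window, and the byte counts that land in the same window are summed. The function bin_bytes and the sample data below are hypothetical and only approximate what resample does; in particular, this sketch does not emit empty windows.

```python
from collections import defaultdict


def bin_bytes(samples, width=2):
    """Sum byte counts into fixed-width bins keyed by bin start (epoch seconds)."""
    bins = defaultdict(int)
    for ts, nbytes in samples:
        # Round the timestamp down to the start of its window.
        start = int(ts // width) * width
        bins[start] += nbytes
    return dict(sorted(bins.items()))


# Hypothetical (epoch time, IP length) pairs standing in for pktTimes and pktBytes.
samples = [
    (1000.10, 60), (1000.90, 1500), (1001.50, 40),
    (1002.00, 1500), (1005.75, 52),
]
binned = bin_bytes(samples)
print(binned)
```

Summing per window is what turns a flat line of near-MTU packet sizes into a curve that shows bursts and quiet periods.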

Conclusion

You can do much more with Scapy, such as grabbing URLs or pulling files from PCAPs; by slightly modifying the examples in this article, you can add more features. The open source PacketExaminer project offers a pre-made harness for PCAP analysis [2], and all of the code in these examples can be found in the training folder of the repo. If you have any questions, just let me know at joe.mcmanus@canonical.com.

Infos

  1. Scapy: https://scapy.net
  2. PacketExaminer project on GitHub: https://github.com/joemcmanus/packetexaminer