Monday 14 March 2016

Visualizing port usage data

Visual perception is the ability to process the information we typically get from our eyes. Why "typically"? Because even with our eyes covered, we can still build a visual picture of objects by touching them, which shows how loosely this ability depends on our eyes as sensors. In fact, as I have said before, we see with our brain, not with our eyes, and since the process that visualizes our environment is trained continuously throughout our lives, it is one of our most powerful tools for understanding.

OK, to continue our previous talk, run the following "netstat" command. It should show something like this:

$ netstat -an | grep "ESTABLISHED" | awk '{print $5}'  | awk -F. '{print $NF}' |  sort -n | uniq -c | awk '{print $2 ":" $1}'
80:1
443:24
5223:2
5223:1
5228:2
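
To see what each stage of that pipeline contributes, here is a small sketch that runs the same stages on a few canned lines instead of live data. The sample lines imitate the macOS "netstat -an" layout and their addresses are made up; "port_counts" is only a name chosen for this illustration.

```shell
# Hypothetical sample in the macOS "netstat -an" layout (addresses are invented).
sample='tcp4       0      0  192.168.1.5.52100      93.184.216.34.443      ESTABLISHED
tcp4       0      0  192.168.1.5.52101      93.184.216.34.443      ESTABLISHED
tcp4       0      0  192.168.1.5.52102      93.184.216.34.80       ESTABLISHED
tcp4       0      0  *.22                   *.*                    LISTEN'

# Same stages as the one-liner above, one per line.
port_counts() {
  grep "ESTABLISHED" |          # keep established connections only
    awk '{print $5}' |          # 5th column: remote address and port
    awk -F. '{print $NF}' |     # last dot-separated field: the remote port
    sort -n |                   # numeric sort, so equal ports become adjacent
    uniq -c |                   # count each run of equal ports
    awk '{print $2 ":" $1}'     # turn "count port" into "port:count"
}

printf '%s\n' "$sample" | port_counts
```

The LISTEN line is dropped by the first stage, and the two connections to port 443 are merged into a single pair by "uniq -c".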


Run it again while another application is using the Internet, for example while downloading a file, and look at the result:

$ netstat -an | grep "ESTABLISHED" | awk '{print $5}'  | awk -F. '{print $NF}' |  sort -n | uniq -c | awk '{print $2 ":" $1}'
80:1
443:23
5223:2
43212:3
51443:12
52102:9
53287:4


On their own, these two sets of numbers do not tell us much, especially if we want to find a pattern in them. To put our visual perception to work, let us draw a bar chart with the port number on the horizontal axis and the count of established connections on the vertical axis. Note that the maximum value on the horizontal axis is 65535, which is the highest valid port number.

Sample #1, port usage spectrum

Sample #2, port usage spectrum

Now when you look at the spectra above, your visual perception helps you understand and compare the two datasets much more easily.
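
If you only have a terminal at hand, a rough text approximation of such a spectrum can be sketched with awk. This is not how the charts above were drawn; it is a minimal sketch in which "spectrum" is an illustrative name, each port gets one row instead of a position on a 0 to 65535 axis, and the input is assumed to be "port:count" pairs as printed by the netstat pipeline.

```shell
# Draw one row per port: the port number, then one '#' per established connection.
spectrum() {
  awk -F: '{
    n = $2 + 0                          # connection count as a number
    bar = ""
    for (i = 0; i < n; i++) bar = bar "#"
    printf "%5d | %s\n", $1, bar        # right-align the port in 5 columns
  }'
}

# A few pairs taken from the first netstat run above.
printf '%s\n' 80:1 443:24 5228:2 | spectrum
```

On a live machine the same function can be fed directly by appending "| spectrum" to the netstat pipeline.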

Note that you can use other ideas to draw some fancy graphs too. However, I think the spectrum gives the data a better engineering meaning. There is also another way to model this dataset, which treats each dataset as a vector, or point, in a 65536-dimensional space. In this space each dimension is a port number, and the coordinate along that dimension is the number of established connections on that port. This model works better when you focus on classifying or clustering the data. The problem is that drawing a graph for a dataset with more than 3 ports is meaningless.
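
Even though such a point cannot be drawn, it can still be compared with other points. As a minimal sketch, the function below computes the Euclidean distance between two samples, treating every port that is missing from a sample as a zero coordinate. Euclidean distance is my choice for the illustration, not something the tool uses, and "distance" and the temporary files are hypothetical names. It assumes both files are non-empty.

```shell
# Euclidean distance between two "port:count" samples, each treated as a
# sparse vector with one dimension per port (absent ports count as 0).
# $1 and $2 are files holding one "port:count" pair per line.
distance() {
  awk -F: '
    NR == FNR { a[$1] += $2; next }     # first file: vector a
              { b[$1] += $2 }           # second file: vector b
    END {
      for (p in a) { d = a[p] - b[p]; sum += d * d }   # ports present in a
      for (p in b) if (!(p in a)) sum += b[p] * b[p]   # ports only in b
      printf "%.2f\n", sqrt(sum)
    }' "$1" "$2"
}

# Compare a few pairs from the two netstat runs above.
s1=$(mktemp); s2=$(mktemp)
printf '%s\n' 80:1 443:24 5228:2   > "$s1"
printf '%s\n' 80:1 443:23 51443:12 > "$s2"
distance "$s1" "$s2"
rm -f "$s1" "$s2"
```

A small distance means the two samples use roughly the same ports with similar connection counts, which is the kind of measure a clustering step would build on.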

A tool to draw your port usage spectrum
To continue our discussion, we need a tool to visualize this kind of dataset. I have prepared one here, which shows the spectrum we talked about as it is at the moment you run a script. Go to the link below, download the script file, add execute permission to it, and run it:


What this shell script does is simply send the result of the previous "netstat" command to a d3.js graph. As we discussed on the tool page, the script works for IPv4 on Mac and Linux (on Linux you just have to change the "-F." to "-F:"), and you need to modify it if you want to run it on a Windows machine or a computer with IPv6. Here are some notes on using the script:
  1. It only sends your current remote ports and the number of established connections associated with each of them. You can run the "netstat" command from the script yourself to see exactly what information it sends. So it includes no personal information. I hate running applications that connect back to their host when you do not know exactly what they do.
  2. Once you have seen the spectrum, refreshing the browser will not update it, because the remote service cannot execute the script on your computer. So if you want to update the data and see the new state, you need to run the script again.
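
The "-F." versus "-F:" change mentioned above can also be made automatically. The following is only a sketch of that idea, not the contents of the actual script: "port_separator" and "remote_port_counts" are hypothetical helper names, and it assumes IPv4 addresses and that "uname -s" prints "Linux" on Linux.

```shell
# Pick the character that separates the port from the remote address:
# ":" for the Linux netstat layout, "." otherwise (for example on a Mac).
port_separator() {
  case "$1" in
    Linux) echo ":" ;;
    *)     echo "." ;;
  esac
}

# Read netstat-style lines on stdin and print "port:count" pairs,
# splitting the remote address on the separator given as $1.
remote_port_counts() {
  grep "ESTABLISHED" | awk '{print $5}' | awk -F"$1" '{print $NF}' |
    sort -n | uniq -c | awk '{print $2 ":" $1}'
}

# Usage on a live machine:
#   netstat -an | remote_port_counts "$(port_separator "$(uname -s)")"
```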
