Monday 21 March 2016

Visualizing the current pattern of your computer's port usage

The application
I set up a web application that gives you an idea of how the applications running on your computer use TCP ports. You can download the script from the following link.

Blog Tools: Port usage class visualizer

The script is based on what we have discussed in the last four posts; all it sends to the server is the remote port of each established connection and the corresponding count. Give the script execute permission (chmod +x train.sh), and it is ready to run on Mac OS X and Linux. If you use Windows (or IPv6), you need to modify the script. By the way, the resolution for portId is 1000 and for valueRange is 20. This means ports 52321 and 52890 are both shown as L52, and a connection count between 0 and 19 is shown as 1, between 20 and 39 as 2, and so on.
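To make the bucketing concrete, here is a minimal sketch of how established connections could be counted per remote port and then mapped to (portId, valueRange) pairs. This is not the code of train.sh: the function names are made up for illustration, the parsing assumes the usual IPv4 `netstat -an` column layout, and every port goes through the same bucketing because the post does not say how the script labels well-known ports such as 80 and 443.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not taken from train.sh.

PORT_RESOLUTION=1000   # ports 52000-52999 share the label L52
VALUE_RANGE=20         # counts 0-19 -> 1, 20-39 -> 2, ...

# Reads `netstat -an` text on stdin and prints "<remote port> <count>"
# for established IPv4 TCP connections (assumed column layout:
# proto, recv-q, send-q, local address, foreign address, state).
remote_port_counts() {
  awk '($1 == "tcp" || $1 == "tcp4") && $6 == "ESTABLISHED" {
         # Mac OS X prints addr.port, Linux prints addr:port
         n = split($5, parts, "[.:]")
         count[parts[n]]++
       }
       END { for (port in count) print port, count[port] }' | sort -n
}

# Turns "<port> <count>" lines into "L<portId>,<valueRange>" pairs,
# adding up the counts of all ports that fall into the same portId.
bucket_counts() {
  awk -v res="$PORT_RESOLUTION" -v range="$VALUE_RANGE" '
    { total[int($1 / res)] += $2 }
    END {
      for (id in total)
        printf "L%d,%d\n", id, int(total[id] / range) + 1
    }' | sort
}

# Possible usage on a live machine:
#   netstat -an | remote_port_counts | bucket_counts
```

With this sketch, a machine that holds 3 connections to remote port 52321 and 20 to remote port 52890 would report the single pair L52,2, since both ports share portId 52 and their combined count of 23 falls in the second value range.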

Open a terminal and execute the following command. It sends 20 training samples to the server, and then the script opens a web page to show the graph. This command trains the first class; the application only lets you have two classes.



$ ./train.sh 1 20


Executing it again trains the service with another 20 samples for the same class. If you want to clear the samples, run the following command, which clears all the trained data and trains the class again with 60 samples.

$ ./train.sh 1 60 0

The following command trains the second class:

$ ./train.sh 2 20


My experience: Idle state
I closed all my applications and sent 60 training samples; this was the result.

The idle pattern of my notebook

When my notebook is in this state, it only uses (L5,1), which means ports in the range 5000 to 5999 with fewer than 20 open connections. Unfortunately, it also shows that Mac OS X always has some connection up and running to somewhere!


My experience: Web usage
Again I ran the script, this time while doing normal web searches and page views, with no video or music: web sites like Google News, Yahoo, Wikipedia, ...


The web surfing pattern.

You can see the difference: both the HTTP and HTTPS ports show some usage, even in value ranges 2, 3, 4 and 5, while L52 is still in range 1. That means when I surf the web, my computer opens many connections to the Internet via ports 80 and 443.

If you train a system with these two classes of usage, the calculation we described can easily tell the two patterns apart. The idle pattern does not contain ports 80 and 443, so every 80 and 443 usage in a web surfing sample gets the minimum probability value under the idle class, and the idle pattern's probability ends up lower than the web surfing one.

Conversely, if you give an idle pattern as the test dataset, the system finds the first class more likely than the second one, because the denominators used to calculate the conditional probabilities for the second class are bigger than those for the first.
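The two arguments above can be checked with a small sketch. The exact calculation is described in the earlier posts and is not reproduced here, so this example makes several assumptions: each conditional probability is the relative frequency of a feature within a class, unseen features get a fixed floor (the hypothetical MIN_P), and a sample's score is the sum of the log probabilities. The file layout, the function name and the feature labels are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the file layout, the function name and MIN_P are assumptions.

# class_log_likelihood CLASS TRAIN_FILE TEST_FILE
#   TRAIN_FILE: one "<class> <feature>" pair per line, e.g. "web HTTPS,3"
#   TEST_FILE:  one "<feature>" per line
# Prints the sum of log P(feature | CLASS) over the test features.
class_log_likelihood() {
  awk -v cls="$1" -v min_p="${MIN_P:-0.001}" '
    NR == FNR {                        # first file: training data
      if ($1 == cls) { seen[$2]++; total++ }
      next
    }
    {                                  # second file: test features
      p = ($1 in seen) ? seen[$1] / total : min_p
      if (p < min_p) p = min_p         # unseen or very rare features get the floor
      score += log(p)
    }
    END { printf "%.4f\n", score }
  ' "$2" "$3"
}
```

Suppose the idle class was trained only with L5,1 entries, while the web class was trained with L5,1 plus several HTTP and HTTPS entries. A test sample that contains HTTP and HTTPS features scores far lower under idle, because each of those features falls back to the floor value. A test sample that contains only L5,1 scores higher under idle, because the web class divides its L5,1 count by a larger total.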


My experience: YouTube usage
Here is the port usage pattern for three concurrent YouTube videos. You can see that there are no HTTP connections here, just up to 60 HTTPS connections.

The YouTube watching pattern.

Again, if you think about the calculation we described, it can also tell this pattern apart among the three defined classes.


And the last experience: Torrent
Here is my last observation: the pattern of downloading a single torrent file. You can see a dramatic difference between this observation and the previous ones, which makes it easier for our service to catch the pattern. The torrent protocol uses many different connections (and ports).

Torrent file downloading port usage pattern


Important note
Training a learning machine is not simple; it is somewhat like teaching a human child. You cannot teach a child bad stuff and expect them to do good. So whenever you want to start a new set of training, close all the Internet-related applications, give the computer a minute or two to close its connections, and then start training the system with the new situation. Finally, calculating the probabilities of the test dataset given each of the already trained classes shows how closely the current state of your computer resembles each class.
