Thursday, 21 April 2016

Free Port Monitoring Service

A friend and I have built a service based on this blog's recent posts about visualizing a computer's port distribution and monitoring its usage. It is basically like the tools already posted here, but with a better UI, easier to use, and with some more information. I do not know why, but the service is named "Puffin".

Clear as crystal
One of the problems with installing a client that sends data to a server in the cloud is that you usually don't know what information it sends. The same issue arises even with software that is not supposed to use the Internet at all. In Puffin, we use simple shell scripts to send data to Puffin's back-end service, so you can open the script in any text editor and check exactly what information it sends to the server.

Dashboard page of the service

Who is this service for?
It is for everybody who is curious about what his/her computer does while connected to a network or the Internet: mostly computer students or geeks, network administrators, technical support staff, or those who do not trust their installed software and want to know what it is doing with the Internet or network connection. If you are a computer, software, network, etc. geek, you don't need to read the rest of the post; test it here: http://sleptons.tools

Wednesday, 13 April 2016

Bias and Variance in modeling

It is always important to remember why we do classification: we want to build a general model of our problem, not a model of the given training dataset only. Sometimes when you finish training and look at your model, you see that not all of your training data fits it; this does not necessarily mean that your model is wrong. You can also find many examples where the model fits the training data very well but not the test data.

Which one of these three models describes the pattern of the given training set better?

Besides, never forget that we model data because we do not know exactly what happens in the system; we cannot scientifically and mathematically write a formula that describes the system we are observing. So we should not expect our model to describe the system completely. Why? Because we have modeled the system using only a small fraction of the dataset space.
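To make the point about training fit versus test fit concrete, here is a minimal toy sketch (not the author's experiment). It assumes hypothetical data: the true pattern is y = 2x + 1, and each training point gets a deterministic noise of +1 or -1. A straight least-squares line is compared with a piecewise-linear interpolation that passes through every training point. The names `fit_line`, `interpolate` and `mse` are illustrative helpers.

```python
import bisect


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation through every training point (xs sorted)."""
    k = bisect.bisect_right(xs, x) - 1
    if k >= len(xs) - 1:  # at (or beyond) the last training point
        return ys[-1]
    t = (x - xs[k]) / (xs[k + 1] - xs[k])
    return ys[k] + t * (ys[k + 1] - ys[k])


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training set: y = 2x + 1 plus alternating +1/-1 noise
train_x = [float(i) for i in range(10)]
train_y = [2 * x + 1 + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]
# Unseen points between the training samples, labelled with the true pattern
test_x = [i + 0.3 for i in range(9)]
test_y = [2 * x + 1 for x in test_x]

a, b = fit_line(train_x, train_y)


def line(x):
    return a + b * x


def memorizer(x):
    return interpolate(train_x, train_y, x)


results = {
    "line": (mse(line, train_x, train_y), mse(line, test_x, test_y)),
    "interpolation": (mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
# line: train MSE=0.970, test MSE=0.025
# interpolation: train MSE=0.000, test MSE=0.160
```

The interpolation fits every training point exactly yet does worse on the unseen points, while the line misses the training points but captures the underlying pattern better.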

Friday, 8 April 2016

Entropy pattern of network traffic usage

I am working on a new project about recognizing usage patterns in network traffic using as little data as possible. Usually you need to do DPI (deep packet inspection) or look for signatures, and so on. These techniques work well, but they require access to low-level traffic (not everybody is willing to hand over low-level traffic for analysis) and they are very CPU intensive, especially when you are dealing with a large volume of traffic. I think using these techniques is like recognizing your friend by his fingerprint or DNA! Yet you, as a human being, can recognize your friend even after 10 or 20 years without any of these techniques!

Entropy distribution of Port and IP of the TCP connections

We never make precise and accurate measurements to recognize a friend, a pattern, or an object. We usually make rough calculations or comparisons, but over many features. DNA analysis to identify people can be considered a precise and accurate measurement, while looking at your friend's face, body, hair, and even his clothing style is just a set of simple comparisons whose results you summarize to get the information you need: is he your friend or not?
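The port and IP entropy pictured earlier in this post is the kind of rough, cheap feature described above. The post does not say exactly how it is computed, so the following is only a minimal sketch, assuming Shannon entropy in bits and a hypothetical record shape of `(remote_ip, remote_port)` tuples collected over one time window. `shannon_entropy` and `connection_entropy_profile` are illustrative names.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over distinct values of p * log2(1/p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def connection_entropy_profile(connections):
    """Summarize one time window of connections as two rough features.

    `connections` is a hypothetical list of (remote_ip, remote_port) tuples.
    """
    ips = [ip for ip, _ in connections]
    ports = [port for _, port in connections]
    return {"ip_entropy": shannon_entropy(ips), "port_entropy": shannon_entropy(ports)}


# Hypothetical window: four different servers, all reached on port 443
window = [("10.0.0.1", 443), ("10.0.0.2", 443), ("10.0.0.3", 443), ("10.0.0.4", 443)]
print(connection_entropy_profile(window))
# {'ip_entropy': 2.0, 'port_entropy': 0.0}
```

Only header-level fields are needed, so no packet payload is inspected; a window spread over many servers on a single port shows high IP entropy and zero port entropy.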