Thursday, 21 April 2016

Free Port Monitoring Service

One of my friends and I have built a service based on this blog's recent posts on visualizing a computer's port distribution and monitoring its usage. It is basically like the tools we already had here, but with a better UI, easier use and some more information. I do not know why, but the service is named "Puffin".

Clear as crystal
One of the problems with installing a client that sends data to a server in the cloud is that you usually don't know what information it sends up. The same issue arises even when the software is not supposed to do anything with the Internet. What we have done in Puffin is use simple shell scripts to send data to Puffin's back-end service, so you can inspect the scripts with any text editor and verify exactly what information they send to the server.

Dashboard page of the service

Who is this service for?
Everybody who is curious or wants to know what his or her computer does while connected to a network or the Internet: mostly computer students or geeks, network administrators, technical support staff, or those who do not trust installed software and want to know what it is doing with the Internet or network connection. If you are a computer, software or network geek, you don't need to read the rest of the post; test it here: http://sleptons.tools

Wednesday, 13 April 2016

Bias and Variance in modeling

It is always important to remember why we do classification. We do it because we want to build a general model that solves our problem, not one that merely fits the given training datasets. Sometimes when you finish training the system and look at your model, you see that not all of your training data fits the model; that does not necessarily mean your model is wrong. You can also find many examples and cases where the model fits the training data very well but not the test data.

Which one of these three models describes the pattern of
the given training set better? 

Besides, never forget that we model data because we do not know exactly what happens in the system. We do it because we cannot scientifically and mathematically write a formula that describes the system we are observing. So we should not expect our model to describe the system completely. Why? Because we have modeled the system from just a small fraction of the dataset space.

Friday, 8 April 2016

Entropy pattern of network traffic usage

I am working on a new project; it is about recognizing usage patterns in network traffic with as little data as possible. Usually, you need to do DPI, look for signatures, etc. These techniques work well, but they require access to low-level traffic (and not everybody likes to hand over low-level traffic for analysis) and are very CPU intensive, especially when you are dealing with large volumes of traffic. I think using these techniques is like recognizing your friend by his fingerprint or DNA! However, you, as a human being, can recognize your friend even after 10 or 20 years without any of that.

Entropy distribution of Port and IP of the TCP connections

We never make a precise and accurate measurement to recognize a friend, a pattern or an object. We usually make rough calculations or comparisons, but over many features. DNA analysis to identify people can be considered a precise and accurate measurement, while looking at your friend's face, body, hair and even clothing style is just a set of simple comparisons whose results are summarized to answer the question you care about: is he your friend or not?
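To make that "rough comparison over many features" concrete, here is how the entropy shown in the figure can be computed from connection counts. A minimal sketch; the port counts below are made-up, not captured from my machine:

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given raw counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Made-up port usage counts: traffic concentrated on one port.
port_counts = {443: 47, 5223: 2, 5228: 1}
h = entropy(port_counts.values())
```

A low value means the usage is concentrated on few ports; a value near log2(n) means the n ports are used almost uniformly.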

Sunday, 27 March 2016

Information Gain as a measure of change in port usage distribution

We saw we could use Bayes' Theorem and Machine Learning methods to catch changes in your computer's port usage. There is another, complementary way to find changes in port usage. I say complementary because you cannot strictly judge that something wrong is going on if you do not have enough evidence. So the idea is to gather enough information to give us a sense of the usage pattern.

Port usage distribution
Remember the post "Visualizing port usage data" and how we drew a graph of the computer's destination ports; here are three separate samples I captured from my computer with the tool provided in that post:

Three different port usage spectrum

The question is: how can we define a measure that shows the difference between the spectrum snapshots, or distributions, we capture at each step?
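Information gain (relative entropy, also known as KL divergence) is one way to define such a measure. A minimal sketch; the snapshot counts are made-up, and this is just one candidate measure, not necessarily the exact formula used later:

```python
import math

def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """Relative entropy D(P || Q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two made-up snapshots of port usage over the same three ports.
snapshot_a = normalize([47, 2, 1])
snapshot_b = normalize([30, 15, 5])
gain = kl_divergence(snapshot_a, snapshot_b)
```

A value of zero means the two snapshots have the same distribution; the larger the value, the more the port usage has shifted between them.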

Wednesday, 23 March 2016

A tool to monitor your computer's real-time port usage

Screen snapshot of the tool
In the five recent posts, we talked about how important data is in Machine Learning algorithms and introduced a source of data that every one of us who uses a computer and the Internet has access to: the port information of our computer's connections to the Internet or the attached network. That is a good source of data because:
  • The port information does not contain data about the targets or hosts you work with, so you do not reveal information about those hosts.
  • It is steady and always available.

We also introduced a simple script which gathers this information and sends it to the server. This script is the base of our data collection, and you can run it on Linux or Mac OS (on Linux you just have to change "-F." to "-F:"; it may need some changes to work on Windows too):

netstat -an |  grep "ESTABLISHED"  | awk '{print  $5}'  | awk -F. '{print $NF}' |  sort -n |  uniq -c | awk '{print $2 ":" $1}'


We also showed how you can visualize this information and use some classification methods to learn the way your computer uses ports (spectrum & pattern).

Monday, 21 March 2016

Visualizing the current pattern of your computer's port usage

The application
I set up a web application which gives you an idea of how the applications you are working with on your computer use TCP connection ports. You can go to the following link and download the script.

Blog Tools: Port usage class visualizer

Go to the above link and download the script. The script is based on what we have talked about in the last four posts; all it sends to the server is each established connection's remote port and the corresponding count. Give the script execution permission (chmod +x train.sh), and it is ready to run on Mac OS and Linux. If you have Windows (or IPv6) you need to modify the script. By the way, the resolution for portId is 1000 and for valueRange is 20; this means ports 52321 and 52890 are both shown as L52, and a current count between 0 and 19 is shown as 1, between 20 and 39 as 2, and so on.
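The portId and valueRange mapping just described can be sketched as follows. This is my own reconstruction of the rules stated above, not the script's actual code:

```python
def port_id(port, resolution=1000):
    """Map a port to its low-resolution label, e.g. 52321 -> 'L52'."""
    return "L%d" % (port // resolution)

def value_bucket(count, value_range=20):
    """Map a connection count to its bucket: 0-19 -> 1, 20-39 -> 2, and so on."""
    return count // value_range + 1
```

So both 52321 and 52890 land in L52, exactly as the post says.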

Open a terminal and execute the following command to send 20 training samples to the server; the script then calls a web page to show the graph. The command is like training the first class; the application only lets you have two classes.

Friday, 18 March 2016

Modeling the port usage data

In the previous posts, we saw that the raw data from your computer comes to our classification program as below:

443:47,5223:2,5228:1, ...


Then we apply the following function to get the portId:

function String getPortId(port) {
      if (knownPorts.contains(port)) {
            return "R" + String(port);
      } else {
            return "L" + String(Integer(port / 100));
      }
}

Wednesday, 16 March 2016

Mapping and changing the resolution of dataset

Human eyes can process 10-15 images per second;
they can handle higher rates too, but with lower precision.
Consider sitting in a car that is moving along a road while you look through the window. The best you can do is process 10-15 images per second, so you unquestionably miss much information and cannot take in many of the things happening out there.

However, your brain does not let you feel the information gap; it constructs the best reality it can from the given information and the things it has learned through your past life.

That is partly why reality differs from one observer's point of view to another's. Someone may say they saw a tree at the corner of the street, while another may claim to have seen a wooden street light.

If we believe that, through evolution, the human brain does the best it can, then we need to (or, at least, can) do the same in Machine Learning (ML) algorithms too, because the idea of ML itself comes from the way our brain works. When you see a bus on the road or a traffic sign, your brain does not process the given information as a raw bitmap; it works with edges, curves, colors, etc. It uses the features it thinks are best to understand reality better and faster.

Monday, 14 March 2016

Visualizing port usage data

Visual perception is the power of processing information we typically get from our eyes. Why typically? Because even if we cover our eyes, we can still build visual perception by touching objects, which shows the loose dependency between this power and our eyes as sensors. In fact, as I have said before, we see with our brain, not with our eyes, and since the process which visualizes our environment gets trained continuously throughout our life, it is one of our most powerful tools for understanding.

OK, to continue our previous talk, run the "netstat" command; it should show something like this:

$ netstat -an | grep "ESTABLISHED" | awk '{print $5}'  | awk -F. '{print $NF}' |  sort -n | uniq -c | awk '{print $2 ":" $1}'
80:1
443:24
5223:2
5224:1
5228:2

Saturday, 12 March 2016

The importance of data in Machine Learning


Data is to Machine Learning (ML) algorithms what fuel is to a car: no fuel, no movement. If you do not have access to enough data, it is better to think about using rule-based systems rather than ML systems. I've seen many ML projects fail, not because of the algorithm or the implementation, but mostly because of:

1- Not having enough data to train the system
2- Shallow understanding of the data

You can only expect an ML system to understand and respond properly based on the given data and algorithm, nothing more! I always wonder why people expect ML systems to work perfectly based on the tiny dataset they provide.

Monday, 7 March 2016

Python script to walk on Markov Chain

Suppose you have a simple system which has only four different states, and your observation shows the system changes its state like the following Markov Chain:

Sample Markov Chain to walk on

We want to write an application to simulate the system's state changes based on the above Markov Chain. Here you can also find an answer to a frequent question, "Why do many people use Python in Artificial Intelligence?": you are going to see how easily we can do this job with just basic knowledge of Python, while writing the same application in Java or C++ could be a chore.
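A minimal sketch of such a simulation; the transition probabilities below are placeholders, since the real ones are in the figure above:

```python
import random

# Hypothetical transition probabilities for the four states; the real
# numbers are in the post's figure, these are placeholders only.
chain = {
    "S1": [("S1", 0.2), ("S2", 0.5), ("S3", 0.3)],
    "S2": [("S2", 0.1), ("S3", 0.9)],
    "S3": [("S1", 0.4), ("S4", 0.6)],
    "S4": [("S4", 0.5), ("S1", 0.5)],
}

def walk(chain, start, steps, rng=random):
    """Simulate the system's state changes along the Markov Chain."""
    state, path = start, [start]
    for _ in range(steps):
        targets, probs = zip(*chain[state])
        state = rng.choices(targets, weights=probs)[0]
        path.append(state)
    return path
```

Calling walk(chain, "S1", 100) returns a plausible 100-step trajectory of the system.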

Saturday, 5 March 2016

People do sin, no matter who they are!

If you check the meaning of the word "sin" in a dictionary, it is something like "transgression of the law of God". I am going to rephrase it as "violation of society's accepted law" to talk about the nature of our decision making, or the way our brain works. It is obvious why I'm using the rephrased version, isn't it?

Scenario
Consider a small city in which the wealthiest family of the city invites all the people to a party in their house. People know they always keep a vast amount of money and jewellery in one of the open rooms of the house. There are rumours that they do not even have any idea of the quantity of money or jewellery in that room, so if you take something from that room, they are not going to find out. There is no CCTV or alarm system either. Mr X is our subject, who suddenly finds himself at the door of the mentioned room, and now we want to analyze his decision-making process.

Look at the following simple network, which shows the current state of Mr X and the options he can act on or think about at the door.

Data Network of Mr X's status and the actions he can choose from.

Wednesday, 2 March 2016

Random walk on graph and collapse of Bayesian Network

Random Walk
Consider the following Bayesian Network or Markov Chain, which shows your walking direction. Every time you want to take a step, the network says you have an equal chance of walking towards the north (n) or the south (s).

Two states Bayesian Network with equal probabilities for any
state change 

I've prepared a simple d3.js-based graph of 10,000 steps; you can see it by clicking on the following link. The horizontal axis can be considered time and the vertical one the displacement towards the north or the south. For the given link there are equal probabilities of stepping north or south. Try the redraw button to see what happens each time it redraws.

10,000 Random Walk / 50% up - 50% down, 100% forward ...


A sample result of the 50/50 random walk.
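The walk behind that graph can be sketched in a few lines of Python. This is my own reconstruction, not the d3.js code itself:

```python
import random

def random_walk(steps=10_000, p_north=0.5, rng=random):
    """Return cumulative positions: +1 for a step north, -1 for south."""
    pos, path = 0, [0]
    for _ in range(steps):
        pos += 1 if rng.random() < p_north else -1
        path.append(pos)
    return path
```

Each call gives a different trajectory, which is exactly what the redraw button demonstrates.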


Monday, 29 February 2016

Introduction to reducing uncertainty of outcomes and predictive modeling

There are several ways you can predict the future or the next state of a system. In most of them, you build a statistical model from the information you have and then try to predict the next outcome of the system. You can go further and make another prediction, but the further ahead you move, the more the uncertainty of the outcome increases. The simplest example is: what is the chance of getting tails when we toss a coin, considering the coin is fair? We all know the answer is 50%.

We assume the system whose next state we are trying to predict is not 100% uncertain. Look at the two series below. The first looks like the result of tossing a fair coin, while the second doesn't.

A,B,A,B,A,B,A,B,A,B,A,B,A,B,A,B,A,B,A,B,?      [1]

A,A,A,A,A,B,B,A,A,A,B,A,A,A,A,A,B,A,A,A,?     [2]



If someone asks you what the next character in series [1] will be, you easily reply A. Why? Because it shows that the numbers of As and Bs are equal, and after each A you have a B and after each B you have an A. In fact, in the given data of [1], we see no randomness or uncertainty in the sequence; it behaves like the following:

A model for series [1]
In any state you are, the next state will be the other one.

But what about series [2]? It is not that easy. You need to build a model to answer this question.
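One simple model is to count the transitions in series [2] and normalize them into probabilities; a sketch:

```python
from collections import Counter

def transition_model(sequence):
    """Count symbol-to-symbol transitions, then normalize into probabilities."""
    counts = Counter(zip(sequence, sequence[1:]))
    model = {}
    for (a, b), n in counts.items():
        model.setdefault(a, {})[b] = n
    for a, nexts in model.items():
        total = sum(nexts.values())
        model[a] = {b: n / total for b, n in nexts.items()}
    return model

series2 = "AAAAABBAAABAAAAABAAA"
model = transition_model(series2)
# The most probable symbol after the trailing 'A':
prediction = max(model["A"], key=model["A"].get)
```

For series [2] this gives P(A|A) = 12/15 = 0.8, so the model predicts A as the next character, but with real uncertainty this time.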

Tuesday, 23 February 2016

Dealing with extremely small probabilities in Bayesian Modeling

If you have ever tried to model a complex system with a Bayesian Network, or have attempted to calculate the joint probability of many outcomes, you know what I'm talking about: multiplying many small numbers eventually gives you nothing but zero! To get a better idea of the problem, let us consider a classic text classification problem.

The source code of this post is located here:
Download the document classification Java code, using Bayes Theorem

Document Classification
Suppose you have some already classified documents and want to build a classification program. Here is what your training dataset should look like:

(d11,c1), (d12,c1), (d12,c1), ...
(d21,c2), (d22,c2), (d22,c2), ...
.
.
.
(dn1,cn), (dn2,cn), (dn3,cn), ...

We give the labeled documents as training datasets to our program; it learns from them and should then determine the class of any given document.
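Before looking at the Java code, here is the core trick in a Python sketch: compare sums of log probabilities instead of products of probabilities. The word probabilities and classes below are made up for illustration:

```python
import math

# Made-up per-class word probabilities (as if estimated from training counts).
word_prob = {
    "c1": {"money": 0.01, "sport": 0.001, "game": 0.002},
    "c2": {"money": 0.001, "sport": 0.02, "game": 0.01},
}
prior = {"c1": 0.5, "c2": 0.5}

def classify(words):
    """Pick the class with the highest log joint probability.

    Multiplying hundreds of tiny probabilities underflows to 0.0;
    summing their logarithms keeps the comparison well-defined."""
    scores = {}
    for c in prior:
        scores[c] = math.log(prior[c]) + sum(math.log(word_prob[c][w]) for w in words)
    return max(scores, key=scores.get)
```

Note that 0.001 raised to the power 300 is already 0.0 in double precision, while the equivalent sum of 300 log terms is a perfectly ordinary number.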

Monday, 22 February 2016

The simple Mathematics behind object catching

I always ask myself: do we do some complex Mathematics in our brain when we want to recognize an object or make a decision? Do we solve tens of time-dependent multidimensional partial differential equations to catch a thrown object?

My answer: we don't. Our brain is a network of hundreds of billions of neurons; how could this system work with brain-made models and abstractions like Mathematics, whose formulas we, their creators, still have trouble solving? Have you ever tried to model "object catching" completely?

What I'm attempting to say is that our brain uses very simple mathematics to do things like recognizing objects, making a decision or catching a thrown object. I don't say these are simple tasks our brain does; I just say the Mathematics and models behind all these tasks are simple. In fact, what makes our brain do these incredible things is the simplicity of the way it works, not its complexity.

A sophisticated tool or solution works well for particular situations or problems, while a simple one can work for many kinds of problems or situations. (Think about a knife; you can do many things with a knife, while with something like a Phillips or Torx screwdriver you can't. You can even turn a Phillips or Torx screw with a knife.)

All we do to catch a thrown object is repeatedly doing some linear position approximation
and moving or adjusting the body or the hands based on already learned patterns.

Consider the "object catching" process, one of my favorite processes to think about; whenever I see a dog catch a thrown object, I give one more vote to the idea that there must be simple Mathematics behind this process. Even a plant does something similar: it grows toward the light!

Thursday, 18 February 2016

Canadian dollar, 3 months forecast

I used ten years of daily values of the following series from the Bank of Canada and the Federal Reserve Bank of St. Louis to build a model predicting the behavior of the Canadian dollar over the next three months.

- US dollar rate
- Euro rate
- Yuan rate
- Oil price

Although there are reasons I chose them, in this kind of economic analysis, especially with Machine Learning methods, the more data you have, the better your prediction can be, and the above series were the only easily digestible data I found. Before anything else, let me say that:
The calculation shows it is more likely to get better than worse.


The CAD rate prediction for next 100 days

OK, I built a full-mesh Bayesian Network over all dimensions. The model's time resolution is one month, and for the rate changes I used only three values: "up", "down" and "no change".
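The discretization into "up", "down" and "no change" can be sketched like this. The tolerance parameter and the sample rates are my own additions, not taken from the post:

```python
def rate_change(previous, current, tolerance=0.0):
    """Discretize a monthly rate move into 'up', 'down' or 'no change'."""
    delta = current - previous
    if delta > tolerance:
        return "up"
    if delta < -tolerance:
        return "down"
    return "no change"

# Made-up monthly closing rates:
rates = [1.32, 1.35, 1.35, 1.31]
moves = [rate_change(a, b) for a, b in zip(rates, rates[1:])]
```

A non-zero tolerance would let tiny fluctuations count as "no change" instead of noise-driven ups and downs.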

Wednesday, 17 February 2016

Two-Dimensional Bayesian Network Classifier

I posted a simple time series pattern recognition tool ten days ago, and I didn't expect readers to come, download and test it. I'm happy it happened, and since some of them asked for something to show the result, here I put up a server and client application that lets you send your training and test data to the server and displays the result.

The server side's name is BNTSC1, which stands for "Bayesian Network Time Series Classifier", but since it can be used for any two-dimensional data, I preferred to use the more general title for this post. You can download the server and client side from the following link:

Download server and client applications from here ...

And the online version of the tool is here:

Online Bayesian Network Time Series Classifier

Server side
The server side is a simple Java servlet application containing two individual servlets. The "Home" servlet reads the sent data and displays it on the screen, and through the "Data" servlet you can configure the classifier and send training or test data to it.

Monday, 15 February 2016

The role of patterns in human habit

1- Introduction
Once in a while, we decide to pick up a new habit or drop an old one: doing morning exercise, not drinking sugary beverages, studying every night, learning to play or sing a piece of music, etc. You can look at these habits as patterns, or sequences of specific tasks. Even after you get used to a habit, whenever you repeat it you may change some part of it or apply some improvement, and after a while, at some point, you no longer need to think consciously about what you are doing.

Patterns also exist all around us in the physical world. Try to imagine the world at a small atomic scale; at this magnitude all you have is a collection of billions of billions of billions ... of electrons, neutrons and protons.

Friday, 12 February 2016

A tool for time series pattern recognition

I modified the previously provided tools to build a new one which takes both training and test datasets of a time series and shows how well the given test dataset satisfies the learnt pattern (or simply validates the given test set). If you need more information on Bayesian Networks, the following link shows almost everything I've written on this topic.

Bayesian Network posts

The tool is at the following address in the blog tools section:

Simple time series pattern recognition

You may like to test it with the provided default data first, but here we are going to describe how it works and how you can use it. So forget about the default data in the text areas; copy and paste the following data into the first (training) text area and leave the second one empty:

Training dataset 1
100,109,117,126,134,141,148,154,159,163,167,169,170,170,169,167,163,159,154,148,141,134,126,118,109,100,91,83,74,66,59,52,46,41,37,33,31,30,30,31,33,37,41,46,52,59,66,74,82,

The training data is a single period of a sinusoidal waveform. If you draw it with the tool, you'll see the following graph.

Single period of the training data

Since real time series are usually not identical across their several instances or periods, add the following two training datasets to the training text area and draw the graph. Put each training period of data on a new line.

Training dataset 2
83,91,100,109,117,126,134,141,148,154,159,163,167,169,170,170,169,167,163,159,154,148,141,134,126,118,109,100,91,83,74,66,59,52,46,41,37,33,31,30,30,31,33,37,41,46,52,59,66,

Training dataset 3
66,74,83,91,100,109,117,126,134,141,148,154,159,163,167,169,170,170,169,167,163,159,154,148,141,134,126,118,109,100,91,83,74,66,59,52,46,41,37,33,31,30,30,31,33,37,41,46,52,

The result should look like the following; here the training datasets differ only by a phase shift, but you can use any real training set you have. You can also see the live result by clicking here.

Three periods of training data

Now it is time to ask our tool how valid a test dataset is according to the trained datasets, so copy and paste the following test data into the second text area and draw the result.

Test dataset 1
74,83,91,160,109,140,126,134,141,148,154,159,163,167,169,170,170,169,167,163,159,100,148,141,134,126,100,109,100,91,83,74,66,59,52,46,41,37,33,31,30,30,31,33,37,41,46,52,59,

As you see, the tool draws the test data in gray; if a given point satisfies the trained pattern, the tool validates it and shows it in green, and if not, in red. You can also put your mouse over the test data points and see each point's validation probability with respect to the trained pattern. Click here or on the image to see a live version if you didn't copy and paste the data.

Three training dataset and one test dataset

How does it work?
We have talked about it before; the short story is that the tool builds a Bayesian Network from the given training datasets. To convert datasets to nodes and edges in the network, it lowers the resolution of both the time and the value of the series, connects each time node to its corresponding value node (incrementing the edge weight if the connection already exists), and then calculates the probabilities. Here is the network the tool generates for the above test.


Bayesian Network for the above example
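That network-building step can be sketched as follows; the resolution parameters are assumptions, and the tool's actual resolutions may differ:

```python
from collections import defaultdict

def build_network(series, time_step=5, value_step=20):
    """Lower the resolution of both axes, then count time-node -> value-node
    edges and normalize each time node's outgoing weights into probabilities."""
    edges = defaultdict(int)
    for t, v in enumerate(series):
        edges[(t // time_step, v // value_step)] += 1
    totals = defaultdict(int)
    for (tn, vn), n in edges.items():
        totals[tn] += n
    return {(tn, vn): n / totals[tn] for (tn, vn), n in edges.items()}
```

Validating a test point is then just looking up the probability of its (time node, value node) pair, with a missing pair meaning probability zero.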

How can we say the test dataset matches the trained network?
It is all up to you. The tool gives you the probabilities; you can decide that a single zero probability means the test data doesn't match the trained pattern, or you can have some tolerance and say that 90% non-zero probabilities is OK. You may even say you only accept points with a probability higher than 0.25, so a single point below 0.25 makes you reject or invalidate the test data.

Tuesday, 9 February 2016

Expected Value and Learning Strategy

Suppose you need to train a machine with (or learn from) an online stream of data, and the given data is not tagged, so you have no idea whether the recent data complies with your classification or not. To talk about this problem, let's consider a simple scenario in which we monitor only one variable of a system: for example, the temperature of a room, the traffic volume of a network, a stock market return rate, etc.

To build a simple model for our problem, we start at time zero with the value V(0); the next sample, at time one, will be V(1), and so on, like the following. Don't forget the data is not labeled, so we only have the current value:

V(0), V(1), V(2), ... V(n) 

Studying and learning a phenomenon is about gathering information and building a model to have a better expectation of its future. Even when you keep a hard-copy history of something, you just want to be able to study it whenever you like, to improve your model and the accuracy of your expectations. In our simple single-variable example, it is about finding a value which represents the most expected value of the variable, with one condition: you are only allowed to keep or store the expected value, nothing else.
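One common way to honor that one-value constraint is an exponential moving average, where each new sample nudges the stored expectation. The alpha below is an assumption; the post does not fix a formula:

```python
def update_expectation(expected, new_value, alpha=0.1):
    """Blend the stored expectation with the newest sample.

    alpha controls how fast old history fades; only `expected` is kept
    between samples, as the one-value constraint requires."""
    return (1 - alpha) * expected + alpha * new_value

expected = 20.0          # V(0): say, the first room-temperature sample
for v in [21.0, 22.0, 19.5, 20.5]:
    expected = update_expectation(expected, v)
```

A small alpha gives a stable, slowly adapting expectation; a large alpha makes the expectation chase the latest samples.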

Sunday, 7 February 2016

Simple Time Series Pattern Recognition Source Code

If you've read the posts about Bayesian Networks, you are now ready to write your first pattern recognition system based on one. Since I'm working with Java these days, I prepared a simple Java version; let's see how it works. You can find the source code on the following page:

Simple Time Series Pattern Recognition Source Code

Terminology and setters
The main class which does the job is "SimpleTimeSeries". The terminology is that we have raw time and raw value, which are what you need the system to learn the pattern of. We also have model time and model value; these are the parameters our Bayesian Network uses as its nodes.

There are also some setter methods which set the crucial parameters of the system, like the maximum raw time (maxRawTime), the largest possible time value. For example, if a single period of your data has 720 samples, you need to set it to 719, because we assume raw time for a single period starts at 0 and ends at 719. There is another setter for the maximum raw value of the series (maxRawValue); if your series value goes up to 5000, set it to 5000.

Thursday, 4 February 2016

Canadian dollar exchange rate analysis (2007-2015)

This post is about how you can take a scientific approach to thinking about or analyzing different phenomena using a Bayesian Network. The only tools I used to prepare the information were OpenOffice's spreadsheet and the tools I have provided for this blog. I was looking for a data source on the web and finally found the Canadian to US dollar exchange rate for the last nine years on the Bank of Canada website. If you just draw the rate from 2007 to 2015, you'll get something like the following graph.

Canadian to US dollar exchange rate (data source: Bank of Canada)

Did you note the rates between September and December of 2007?!

Monday, 1 February 2016

A tool to build Bayesian Network from a time series

In the last post (How to convert "Time Series" to "Bayesian Network") I showed how we can convert a sample time series to a Bayesian Network. As you may have seen, it is easy, but there are some things we have to talk about a bit more.

1- You do not need to worry about the maximum value of the Y axis. Whether it is 10, 1000, 100K, 10G, ... your processing resolution depends on the overall form of the series you are looking at or processing, not on these measures or units. What I mean is that a series showing network traffic with a maximum of 10G in one day can be treated just like a series showing the body temperature of a human! We don't care what it is about or what units its Y or X (time) axes are in.

2- Like the Y axis, the length of the X axis, or the resolution of the time period, doesn't matter either: an hour, a day, a month, whatever it is, we only deal with some low-resolution measurement of this axis. For example, for our daily network traffic we may choose a resolution of one hour to build our time (X) nodes, so we only have 24 nodes of type time, each representing a specific hour of the day.

Wednesday, 27 January 2016

How to convert "Time Series" to "Bayesian Network"

I already talked about time series in "Time series as a point in space" post and also discussed a way we can find its anomalies in "A simple way of calculating anomaly" post. But since I've been talking in recent posts about building Bayesian Network to learn existing patterns in information, here I want to show you how easily we can model a time series data with a Bayesian Network.

The process of converting time series data to a network is nothing other than the way we extract information from a series when we look at it and build its pattern in our mind.

Look at the following time series. When you look at it, you don't need to know what the series represents, or what the units of the Y or T axes are. All that gets your attention is the ups and downs of the series through time.

Sample time series

So what you basically do in your mind is the following:

1- Build imaginary Y and T axes with large units, like those you see in the picture.
2- Build a network in your mind which shows, at your imaginary times, the possible features of the series.

Sunday, 24 January 2016

Using Bayesian Network to learn word sequence patterns

We want to see how the stuff we talked about before (in the following posts) can help us learn patterns. The example is about learning the way we use words in simple sentences: the sentences we usually use in IM or chat, or the way babies start to build sentences. Here are the posts you may need to read first if you are not familiar with Bayesian Networks:

Using probability to predict state of a system
Introduction to Transition matrix and Superposition of states
More on Transition matrix and Superposition of states

How your cell phone learns your style of writing
You get a new cell phone and start using its messaging system. You enter sentences when you text your friends, and eventually the cell phone starts suggesting words or guessing what you are going to type. The more you use the phone, the better its guesses become.
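A toy version of that learning loop, using simple word-pair counts; real phone keyboards are far more sophisticated, this only illustrates the idea:

```python
from collections import defaultdict

class WordSuggester:
    """Learn word-to-next-word counts from typed sentences, then suggest."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def learn(self, sentence):
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            self.counts[a][b] += 1

    def suggest(self, word):
        """Return the word most often seen after `word`, or None."""
        nexts = self.counts.get(word.lower())
        if not nexts:
            return None
        return max(nexts, key=nexts.get)
```

The more sentences you feed to learn(), the more the suggestions reflect your personal style of writing, which is exactly the effect described above.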

Wednesday, 20 January 2016

More on Transition matrix and Superposition of states

The example in the previous post was about a very simple 2-state system; now let us consider a slightly more complicated system which has 3 states. The things we are going to discuss and the calculations we are going to do for this 3-state system can easily be extended to a system with more than 3 states. Now look at the following network, which can represent any possible graph of state transitions for a system with 3 states.


General state transition network for a 3-state system.

Here you see we have the option of going from any state to any state with some probability. To avoid confusion we denote the probabilities by 'a' instead of 'p'; for any node of the graph, the sum of the probabilities on the outgoing arrows should be 1, like the following:

Sunday, 17 January 2016

Introduction to Transition matrix and Superposition of states

Consider the following simple state transition network; it is just a simpler version of what we had in the previous post. In each state, we have only two options: staying in the state or moving to the other state. So it is obvious that if we denote the probability of going from S1 to S2 by a, the probability of staying in S1 is 1-a. The same holds when you are in S2: you either go to S1 with probability b or stay with probability 1-b.


Simple 2 states transition network

At any given time t, you are either in S1 or S2, so we consider S(t), the current state, as the vector of the probabilities of being in S1 or S2 at time t, as below.

Thursday, 14 January 2016

Using probability to predict state of a system

Although we could discuss this topic from a mathematical point of view, specifically the Markov process model, it is not that difficult to talk about it in plain English either. What I want to show is that we can use what we've talked about over the last 2 or 3 posts to build a network of information and find the most probable next state of a system or process.

Suppose you have a system which at any given time could be in one of the following elements of the state set:

S = { S1, S2, S3, ... , Sn }

We start monitoring the system and record its state changes at every monitoring interval; after a while we will have something like the following:

Sample state transition network with transition score

Here it shows that we have seen the transition from S1 to S1 10 times, the transition from S1 to S2 13 times, ... and the transition from S4 to S5 25 times. We can easily convert these transition counts, or scores, to probability values. For example, when your current state is S1 you have only 3 options for the next state: staying in S1, going to S2, or going to S3. Each of these transitions has a probability which can be calculated as below:
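The conversion is just a row normalization. Here is a small sketch for the S1 row; the counts for S1 to S1 and S1 to S2 follow the example above, while the S1 to S3 count is made up since the text does not give it:

```python
# Observed transition counts leaving S1 (the S3 count is hypothetical).
counts = {"S1": {"S1": 10, "S2": 13, "S3": 7}}

# P(Si -> Sj) = count(Si -> Sj) / total transitions leaving Si
probs = {
    src: {dst: n / sum(row.values()) for dst, n in row.items()}
    for src, row in counts.items()
}

# Each row of probabilities now sums to 1, as a transition matrix requires.
assert abs(sum(probs["S1"].values()) - 1.0) < 1e-9
```

Doing the same for every state turns the whole score network into a proper transition matrix.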

Monday, 11 January 2016

The effect of repetition in learning

Our habits are strange: the more we act on them, the harder they are to break. In a sense, it seems some simple probability rules control our habits; the more you act on one, the greater the chance of doing it again, and to get rid of it you have to either stop repeating it or do something else more often.

Let's get back to our previous example, "route to work". As the picture shows, our subject usually takes route A or route B to the office. Every time he takes one of these routes we increment a counter, NA or NB, which at any moment shows the number of times the subject has taken that route.

Learning model when you have 2 options to choose from

So the probability of taking either route at any time is the following:
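The counter model above can be sketched in a few lines; the sample sequence of trips is made up for illustration:

```python
# Counter-based habit model: every trip increments that route's counter,
# and P(route) = counter / total trips so far.
counts = {"A": 0, "B": 0}

def take(route):
    """Record one trip; repetition raises that route's probability."""
    counts[route] += 1

def p(route):
    return counts[route] / sum(counts.values())

for r in ["A", "A", "A", "B"]:  # hypothetical observed trips
    take(r)
print(p("A"))  # 0.75
```

You can see the reinforcement effect directly: every extra A-trip pushes p("A") up, and the only way to pull it back down is to take route B more often, exactly as described above.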

Sunday, 10 January 2016

Visualizing relations and probabilities

It would not be wrong to say that human vision is one of the most advanced pattern recognition systems in existence. Our vision system starts working from the moment we are born, even before, and keeps collecting and processing data. Suppose your eyes take 10 samples per second from the environment while you are awake, say 16 hours a day; here is how many images a 30-year-old human being has processed:

10 (image/sec) * 3600 (sec/hour) * 16 (hour/day) * 365 (day/year) * 30 (year) = 6,307,200,000

Now consider how many shapes, objects, and colors there are in each of these captured images, and how much we learn from them every second of our lives! Now you can understand why it is so easy for a human being to recognize people and things, while it is still very hard work for a machine.

Friday, 8 January 2016

All is about probability

Have you ever thought about how a goalkeeper dives for the ball, or how a dog runs to catch something? Mathematical modeling and solving complex time-related differential equations? No way! It does not seem to be about the mathematics, physics, or dynamics we humans know, because most animals do many complex things in their daily lives exactly as we do: hunting, taking shortcuts while moving, and so on.

A goalkeeper doesn't model the ball's movement
with complicated mathematical formulas; he just uses
his already-practiced patterns.

If you ask me, I'll say it is all about probability. Let's consider a simple example. We are going to model the route you take to work every morning. We monitor you and see that over a month with 24 working days you have taken route A for 20 days and route B for only 4 days.

It is a simple pattern, and someone who knows it can easily figure out the best route on which to find you. This is in fact exactly what a goalkeeper does, or even a dog running for a ball or a frisbee; even the most basic daily patterns we or animals use are more complicated than our simple route selection. For now, we just know that with a chance of 20/24 you'll choose route A and with 4/24 route B. Now suppose that while monitoring you during those days we also collect another piece of data: whether or not it is raining. The collected data will then give us better prediction information.
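To see why the extra weather data helps, here is a sketch of the conditional route probabilities. The 20/4 split between routes matches the example above, but the way those days split by weather is entirely made up:

```python
# Hypothetical joint counts over the 24 monitored working days:
# 20 days on route A and 4 on route B, split by (made-up) weather.
days = {("A", "rain"): 2, ("A", "dry"): 18,
        ("B", "rain"): 3, ("B", "dry"): 1}

def p_route_given_weather(route, weather):
    """P(route | weather) = count(route, weather) / count(weather)."""
    total = sum(n for (r, w), n in days.items() if w == weather)
    return days[(route, weather)] / total

print(p_route_given_weather("A", "dry"))   # 18/19: A is a near-certainty when dry
print(p_route_given_weather("B", "rain"))  # 3/5: rain makes B far more likely than 4/24
```

With these (invented) numbers, knowing it is raining moves the prediction for route B from 4/24 to 3/5, which is exactly the kind of improvement the extra data buys us.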

Restarting

I haven't written even a word in months. You start writing about things you love with excitement, knowing you have to sacrifice something for it; at the least, you have to take time from something else and give it to this. But after a while you find out it is time to take time from the writing you started with love and give it to something else, to be able to survive in life ... this is how things work in this world. Anyway, I'm going to write about human intelligence and the way humans think: pattern recognition, anomaly detection, and perhaps some theory, I mean Markov and/or Bayesian models ...