## Monday, 29 February 2016

### Introduction to reducing uncertainty of outcomes and predictive modeling

There are some ways you can predict the future or next state of a system. In most of them, you build a statistical model from the information you have and the try to predict the next outcome of the system, although you can go forward and make another prediction; but as you move the uncertainty of the outcome increases. The simplest we can say is that what will be the chance of getting tail when we toss a coin, considering the coin is fair? We all know the answer is 50%.

We assume the system we are trying to predict its next state is not 100% uncertain. Look the bellow series. The first sounds like tossing a fair coin while the second doesn't.

A,B,A,B,A,B,A,B,A,B,A,B,A,B,A,B,A,B,A,B,?      

A,A,A,A,A,B,B,A,A,A,B,A,A,A,A,A,B,A,A,A,?     

If someone asks you what will be the next character in series ? You easily reply A. Why? Because it shows that the number of As and Bs are equal and after each A you have a B and after each B you have an A. In fact, from the given data in , we see no randomness or uncertainty in the sequence, like the following: A model for series  In any state you are, the next state will be the other one.

But what about the series ? It is not that much easy. You need to build a model to answer this question.

Model #1
The easiest way to model series  is counting the number of As and Bs and calculate the probabilities their outcomes as they are in a bag without considering the existing order, so:

Count(A) = 16, Count(B) = 4
P(A) = 16/20 = 0.8
P(B) = 4/20 = 0.2

So it is 80% likely to have A as the outcome and 20% letter B, you can continue using these probabilities and predict the result. It is exactly like a back and forth random walk with chances of 0.8 to 0.2 or 4 to 1. A model for series  In any state you are, with chance of 0.8 or 0.2 you will be in a or b.

Model #2
Considering the model #1, we want to use the current letter to improve our model and reduce the uncertainty of the next outcome. Form the series  we have the following:

P(A,A) = 12/19 = 0.63
P(A,B) = 3/19 = 0.16
P(B,A) = 3/19 = 0.16
P(B,B) = 1/19 = 0.05

Since we have two experiments on the given data, we can use Bayes Theorem to get a better prediction, but before doing any further calculation let me remind the Bayes Theorem:

P(H|E) = [ P(H) . P(E|H) ] / [ P(H) . P(E|H) + P(H') . P(E|H') ]     

I usually prefer to use H and E to write the formula, it gives more sense, I mean H for hypothesis and E for Evidence. Understanding the Bayes Theorem is critical and a bit difficult, especially when you want to model the problem with the formula, so try to understand the followings which are description of the terms in .
• H is the initial hypothesis of having A as our next outcome so P(H) = 0.8
• E  is the new evidence we have that if the previous letter is A then the next letter is A with chance of P(E|H) = 0.63
• P(H')  is the probability of having no A so it is 1-P(H) or 0.2
• P(E|H') is the chance of not having A while the previous letter is A, which equals to 0.16

P(H|E) = [ 0.8 x 0.63 ] / [ 0.8 x 0.63 + 0.2 x 0.16 ] = 0.94       

So you see now we have a better prediction of the new state. Why? Because we have involved a new evidence, the current letter of the series.

Model #3
You can use the same idea in model #2 and reduce the uncertainty by looking at the letter before the current letter or use the two recent letter, so we have:

P(A,X,A) = 8/16 = 0.5
P(A,X,B) = 4/16 = 0.25
P(B,X,A) = 4/16 = 0.25
P(B,X,B) = 0/16 = 0.0

In the above probabilities X means don't car, so P(A,X,A) means have sequence of A,A,A or A,B,A and etc. Now since the last two letters are A, using the  we have:

P(H|E) = [ 0.94 x 0.5 ] / [ 0.94 x 0.5 + 0.06 x 0.25 ] = 0.97       

So we can say that:

The more evidence of the current and the past, the better prediction of the future.

Does this have any relation to Markov Chain?
This modeling is more based on Bayesian Theorem than Markov Chain. However, if you name the current letter as L0, and the letter before it L-1, then the next letter to predict, can be shown as L1. Now since each of these Ls can be A or B then we can build a 6x6 transition matrix with proper probabilities representing a Markov Chain. Note that by definition Markov Chain is memoryless and just looks at the current state, but we always can assume memories from the past as some current state parameters. You might be interested to read about transition matrix in one of the old posts.