sleptons: February 2016

Monday, 29 February 2016

Introduction to reducing uncertainty of outcomes and predictive modeling

There are several ways to predict the future, or next state, of a system. In most of them, you build a statistical model from the information you have and then try to predict the next outcome of the system. You can keep going and predict further ahead, but the further you go, the more the uncertainty of the outcome increases. The simplest example is the question: what is the chance of getting tails when we toss a fair coin? We all know the answer is 50%.

We assume that the system whose next state we are trying to predict is not 100% uncertain. Look at the series below. The first looks like tossing a fair coin, while the second doesn't.

A,B,A,B,A,B,A,B,A,B,A,B,A,B,A,B,A,B,A,B,? [1]

A,A,A,A,A,B,B,A,A,A,B,A,A,A,A,A,B,A,A,A,? [2]

If someone asks you what the next character in series [1] will be, you easily reply A. Why? Because the numbers of As and Bs are equal, every A is followed by a B, and every B is followed by an A. In fact, the data given in [1] shows no randomness or uncertainty in the sequence, so we can describe it with a model like the following:

A model for series [1]
Whatever state you are in, the next state will be the other one.

But what about series [2]? That one is not so easy. You need to build a model to answer this question.
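One simple way to start such a model is to count how often each character follows each other character and turn the counts into probabilities. The sketch below is only an illustration under that assumption (a first-order transition model); the class name TransitionModel and its methods are hypothetical and do not come from the post.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the author's code: a first-order transition model.
class TransitionModel {
    // counts.get(from).get(to) = how often symbol `to` directly followed `from`
    private final Map<Character, Map<Character, Integer>> counts = new HashMap<>();

    TransitionModel(String series) {
        for (int i = 0; i + 1 < series.length(); i++) {
            counts.computeIfAbsent(series.charAt(i), k -> new HashMap<>())
                  .merge(series.charAt(i + 1), 1, Integer::sum);
        }
    }

    // Estimated probability that `to` follows `from`; 0.0 if `from` was never seen.
    double probability(char from, char to) {
        Map<Character, Integer> row = counts.get(from);
        if (row == null) return 0.0;
        int total = 0;
        for (int c : row.values()) total += c;
        return row.getOrDefault(to, 0) / (double) total;
    }

    public static void main(String[] args) {
        TransitionModel one = new TransitionModel("ABABABABABABABABABAB"); // series [1]
        TransitionModel two = new TransitionModel("AAAAABBAAABAAAAABAAA"); // series [2]
        System.out.println("[1] P(A after B) = " + one.probability('B', 'A'));
        System.out.println("[2] P(A after A) = " + two.probability('A', 'A'));
        System.out.println("[2] P(B after A) = " + two.probability('A', 'B'));
    }
}
```

For series [1] every B is followed by an A, so the estimate is 1. In series [2], an A is followed by another A in 12 of the 15 cases where something follows an A, so after the final A this simple model would guess A with a probability of about 0.8.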

Dealing with extremely small probabilities in Bayesian Modeling

If you have ever tried to model a complex system with a Bayesian Network, or attempted to calculate the joint probability of many outcomes, you know what I'm talking about: multiplying many small numbers eventually gives you nothing but zero! To get a better idea of the problem, let us consider a classic text classification problem.
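The effect is easy to reproduce. The sketch below multiplies 400 probabilities of 0.00001 as Java doubles and compares the result with the sum of their logarithms, which is a common workaround. It is only an assumption that the post ends up using logarithms; the class name SmallProbabilities and the numbers are made up for the illustration.

```java
import java.util.Arrays;

// Hypothetical demo, not taken from the post's download.
class SmallProbabilities {
    // Naive joint probability: the product of all factors.
    static double product(double[] probs) {
        double p = 1.0;
        for (double x : probs) p *= x;
        return p;
    }

    // The same quantity in log space: the sum of the logarithms.
    static double logSum(double[] probs) {
        double s = 0.0;
        for (double x : probs) s += Math.log(x);
        return s;
    }

    public static void main(String[] args) {
        double[] probs = new double[400];
        Arrays.fill(probs, 1e-5);               // made-up values for illustration
        System.out.println(product(probs));     // underflows to 0.0
        System.out.println(logSum(probs));      // stays finite and comparable
    }
}
```

Because the logarithm is monotonic, comparing log sums ranks the candidates in the same order as comparing the products would, without ever reaching zero.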

The source code of this post is located here:
Download the document classification Java code, using Bayes Theorem

Document Classification

Suppose you have some already classified documents and want to build a classification program. Here is what your training dataset should look like:

(d11,c1), (d12,c1), (d13,c1), ...
(d21,c2), (d22,c2), (d23,c2), ...
.
.
.
(dn1,cn), (dn2,cn), (dn3,cn), ...

We give the labeled documents to our program as training datasets; it learns from them, and then it should determine the class of any given document.
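As a rough illustration of this train-then-classify flow, here is a minimal naive Bayes sketch that works on word counts and scores classes in log space. It is not the code from the download link; the class TinyNaiveBayes, the add-one smoothing and the toy documents are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names are not from the post's download.
class TinyNaiveBayes {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Learn from one labeled pair (document, class).
    void train(String document, String label) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : document.toLowerCase().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
            wordsPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    // Return the class with the highest log score (log prior + log likelihoods).
    String classify(String document) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double score = Math.log(docsPerClass.get(label) / (double) totalDocs);
            for (String word : document.toLowerCase().split("\\s+")) {
                int count = wordCounts.get(label).getOrDefault(word, 0);
                // Add-one smoothing keeps unseen words from zeroing the score.
                score += Math.log((count + 1.0) / (wordsPerClass.get(label) + vocabulary.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        // Made-up toy documents standing in for (d11,c1), (d12,c1), (d21,c2), ...
        nb.train("goal match team score", "sport");
        nb.train("team wins the match", "sport");
        nb.train("election vote parliament", "politics");
        nb.train("parliament passes the vote", "politics");
        System.out.println(nb.classify("the team score"));
        System.out.println(nb.classify("vote in parliament"));
    }
}
```

With these four toy documents the two queries come out as "sport" and "politics" respectively; a real training set would of course need far more documents per class.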

The simple Mathematics behind object catching

I always ask myself: do we do complex Mathematics in our brain when we want to recognize an object or make a decision? Do we solve tens of time-dependent, multidimensional partial differential equations to catch a thrown object?

My answer is that we don't. Our brain is a network of tens of billions of neurons; how could this system work based on brain-made models and abstractions like Mathematics, whose formulas we, their creators, still have problems solving? Have you ever tried to model "Object Catching" completely?

What I'm trying to say is that our brain uses very simple mathematics to do things like recognizing objects, making a decision or catching a thrown object. I'm not saying these are simple tasks for our brain to do; I'm just saying that the Mathematics and the model behind all these tasks are simple.

In fact, what lets our brain do this incredible work is the simplicity of the way it works, not its complexity.

A sophisticated tool or solution works well for particular situations or problems, while a simple one can work for many different kinds of problems or situations. (Think about a knife: you can do many things with a knife that you can't do with something like a Phillips or Torx screwdriver. You can even turn a Phillips or Torx screw with a knife.)

All we do to catch a thrown object is repeatedly make a linear approximation of its position
and move or adjust the body or the hands based on already learned patterns.

Consider the "Object Catching" process, which is one of my favorite processes to think about. Whenever I see a dog catch a thrown object, I give one more vote to the idea that there must be simple Mathematics behind this process. Even a plant does something like it: it grows toward the light!

Canadian dollar, 3 months forecast

I used ten years of daily values of the following information from the Bank of Canada and the Federal Reserve Bank of St. Louis to build a model that predicts the behavior of the Canadian dollar over the next three months.

- US dollar rate
- Euro rate
- Yuan rate
- Oil price

Although I had some reasons for choosing them, in this kind of economic analysis, especially with Machine Learning methods, the more data you have, the better your prediction can be, and the data above were the only easily digestible ones I found. Before anything else, let me say that:

The calculation shows it is more likely to get better than worse.

The CAD rate prediction for next 100 days

OK, I built a full-mesh Bayesian Network across all dimensions. The model's time resolution is one month, and for the rate changes I used only three choices: "up", "down" and "no change".
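Reducing a rate series to these three choices could look like the sketch below. The post does not say how large a move must be before it counts as "up" or "down", so the threshold parameter, the class name RateChange and the sample values are all assumptions.

```java
// Hypothetical helper; the threshold is an assumption, not a value from the post.
class RateChange {
    // Label the relative change from `previous` to `current`.
    static String label(double previous, double current, double threshold) {
        double relative = (current - previous) / previous;
        if (relative > threshold) return "up";
        if (relative < -threshold) return "down";
        return "no change";
    }

    public static void main(String[] args) {
        // Made-up monthly values, only to exercise the three labels.
        double[] monthly = {1.30, 1.35, 1.35, 1.28};
        for (int i = 1; i < monthly.length; i++) {
            System.out.println(label(monthly[i - 1], monthly[i], 0.005));
        }
    }
}
```

The same labeling would be applied to each input dimension (US dollar, Euro, Yuan, oil) before the labels become node states in the network.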

Two-Dimensional Bayesian Network Classifier

I posted a simple time series pattern recognition tool ten days ago, and I didn't expect that readers would come, download and test it. I'm happy it happened, and since some of them have asked for something that shows the result, I have put together a server and client application that lets you send your training and test data to the server, which then displays the result.

The server side is named BNTSC1, which stands for "Bayesian Network Time Series Classifier", but since it can be used for any two-dimensional data, I preferred to use that as the title of the post. You can download the server and client sides from the following link:

Download server and client applications from here ...

And the online version of the tool is here:

Online Bayesian Network Time Series Classifier

Server side
The server side is a simple Java servlet application containing two separate servlets. The "Home" servlet reads the data that was sent and displays it on the screen, and the "Data" servlet lets you configure the classifier and send training or test data to it.

The role of patterns in human habit

1- Introduction

Once in a while, we decide to pick up a new habit or drop an old one, like doing morning exercise, not drinking sugary beverages, studying every night, learning to play a piece of music or sing a song, etc. You can look at these habits as patterns or sequences of specific tasks. Even once you are used to a habit, whenever you repeat it you may change some parts of it or improve it, and after a while, at some point, you no longer need to think consciously about what you are doing.

Patterns also exist all around us in the physical world. Try to imagine the world at a small, atomic scale: at this magnitude, all you have is a collection of billions of billions of billions ... of electrons, neutrons and protons.

A tool for time series pattern recognition

I modified the previously provided tools to build a new one that takes both the training and test datasets of a time series and shows how well the given test dataset satisfies the learnt pattern (or, simply, validates the given test set). If you need more information on Bayesian Networks, the following link shows almost all I've written on this topic.

Bayesian Network posts

The tool is at the following address in the blog tools section:

Simple time series pattern recognition

You may like to test it with the default data first, but we are going to describe how it works and how you can use it. So forget about the default data in the text areas; copy and paste the following data into the first (training) text area and leave the second one empty:

Training dataset 1
100,109,117,126,134,141,148,154,159,163,167,169,170,170,169,167,163,159,154,148,141,134,126,118,109,100,91,83,74,66,59,52,46,41,37,33,31,30,30,31,33,37,41,46,52,59,66,74,82,

The training data is a single-period sinusoidal waveform. If you draw it with the tool, you'll see the following graph.

Single period of the training data

Since a real time series usually does not look exactly the same in each of its instances or periods, add the following two training datasets to the training text area and draw the graph. You need to put each training period of data on a new line.

Training dataset 2

83,91,100,109,117,126,134,141,148,154,159,163,167,169,170,170,169,167,163,159,154,148,141,134,126,118,109,100,91,83,74,66,59,52,46,41,37,33,31,30,30,31,33,37,41,46,52,59,66,

Training dataset 3

66,74,83,91,100,109,117,126,134,141,148,154,159,163,167,169,170,170,169,167,163,159,154,148,141,134,126,118,109,100,91,83,74,66,59,52,46,41,37,33,31,30,30,31,33,37,41,46,52,

The result should look like the following. Here the training datasets differ only by a phase shift, but you can use any real training set you have. You can also see the live result by clicking here.

Three periods of training data

Now it is time to ask our tool how valid a test dataset is according to the trained datasets, so copy and paste the following test data into the second text area and draw the result.

Test dataset 1

74,83,91,160,109,140,126,134,141,148,154,159,163,167,169,170,170,169,167,163,159,100,148,141,134,126,100,109,100,91,83,74,66,59,52,46,41,37,33,31,30,30,31,33,37,41,46,52,59,

As you can see, the tool draws the test data in gray. If a given point satisfies the trained pattern, the tool validates it and shows it in green; if not, the tool shows it in red. You can also put your mouse over the test data points and see the validation probability with respect to the trained pattern at any point. Click here or on the image to see a live version if you didn't copy and paste the data.

Three training datasets and one test dataset

How does it work?
We have talked about it before; the short story is that the tool builds a Bayesian Network from the given training datasets. To convert the datasets to nodes and edges in the network, it lowers the resolution of the data for both the time and the value of the series and connects each time node to the corresponding value node, incrementing the connection's count if it already exists. Then it calculates the probabilities. Here is the network the tool generates for the above test.

Bayesian Network for the above example
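The steps described above (lower the resolution, connect time nodes to value nodes, count repeated connections, then compute probabilities) can be sketched roughly as follows. This is not the tool's source code: the class TimeValueNetwork, the way raw values are bucketed and the toy data are assumptions chosen to keep the example small.

```java
// Hypothetical sketch of the idea; names and bucket sizes are assumptions.
class TimeValueNetwork {
    private final int[][] counts; // counts[timeNode][valueNode] = connection weight
    private final int maxRawTime;
    private final int maxRawValue;

    TimeValueNetwork(int timeNodes, int valueNodes, int maxRawTime, int maxRawValue) {
        this.counts = new int[timeNodes][valueNodes];
        this.maxRawTime = maxRawTime;
        this.maxRawValue = maxRawValue;
    }

    // Lower the resolution: map raw in [0, maxRaw] to a node index in [0, nodes - 1].
    private static int node(int raw, int maxRaw, int nodes) {
        int clamped = Math.max(0, Math.min(raw, maxRaw));
        return (int) ((long) clamped * nodes / (maxRaw + 1));
    }

    // One training period: the array index is the raw time, the element the raw value.
    void train(int[] period) {
        for (int t = 0; t < period.length; t++) {
            int timeNode = node(t, maxRawTime, counts.length);
            int valueNode = node(period[t], maxRawValue, counts[0].length);
            counts[timeNode][valueNode]++; // strengthen an existing connection or create it
        }
    }

    // P(value node | time node), estimated from the connection counts.
    double probability(int rawTime, int rawValue) {
        int[] row = counts[node(rawTime, maxRawTime, counts.length)];
        int total = 0;
        for (int c : row) total += c;
        if (total == 0) return 0.0;
        return row[node(rawValue, maxRawValue, row.length)] / (double) total;
    }

    public static void main(String[] args) {
        // Made-up toy data: three periods of four samples, values in 0..99.
        TimeValueNetwork net = new TimeValueNetwork(4, 5, 3, 99);
        net.train(new int[] {10, 50, 90, 50});
        net.train(new int[] {12, 55, 88, 45});
        net.train(new int[] {30, 52, 95, 48});
        System.out.println(net.probability(0, 11)); // two of three periods agree here
        System.out.println(net.probability(2, 10)); // this value was never seen at time 2
    }
}
```

A test point then gets the probability of its value node given its time node, which is the number the tool shows when you hover over a point.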

How can we say the test dataset matches the trained network?
It is all up to you. The tool gives you the probabilities; you can decide that a single zero probability means the test data doesn't match the trained pattern. You can allow some tolerance and say that having 90% non-zero probabilities is OK. Or you may say that you only accept points with a probability higher than 0.25, so if any point is lower than 0.25 you reject or invalidate the test data.
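The three acceptance rules mentioned above are easy to write down. The sketch below assumes the per-point probabilities are already available as an array; the class MatchRules, its method names and the sample probabilities are hypothetical.

```java
// Hypothetical decision rules over the per-point probabilities.
class MatchRules {
    // Strict: a single zero probability rejects the test data.
    static boolean noZeros(double[] p) {
        for (double x : p) if (x == 0.0) return false;
        return true;
    }

    // Tolerant: accept when at least `fraction` of the points are non-zero.
    static boolean mostlyNonZero(double[] p, double fraction) {
        int nonZero = 0;
        for (double x : p) if (x > 0.0) nonZero++;
        return nonZero / (double) p.length >= fraction;
    }

    // Threshold: reject as soon as one point falls below `minimum`.
    static boolean allAtLeast(double[] p, double minimum) {
        for (double x : p) if (x < minimum) return false;
        return true;
    }

    public static void main(String[] args) {
        // Made-up probabilities for ten test points.
        double[] p = {0.6, 0.3, 0.0, 1.0, 0.5, 0.9, 0.7, 0.4, 0.8, 0.2};
        System.out.println(noZeros(p));
        System.out.println(mostlyNonZero(p, 0.9));
        System.out.println(allAtLeast(p, 0.25));
    }
}
```

For the sample array, only the 90% tolerance rule accepts the data: one point is zero, and two points fall below 0.25.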

Tuesday, 9 February 2016

Expected Value and Learning Strategy

Suppose you need to train a machine with (or learn from) an online stream of data, and the data is not tagged, so you have no idea whether the most recent data complies with your classification or not. To continue talking about this problem, let's consider a simple scenario in which we monitor only one variable of a system: for example, the temperature of a room, the traffic volume of a network, the return rate of a stock market, etc.

To build a simple model for our problem, we start at time zero with the value V(0), the next sample at time one will be V(1), and so on, as shown below. Don't forget that the data is not labeled, so we only have the current value:

V(0), V(1), V(2), ... V(n)

Studying and learning a phenomenon is about gathering information and building a model to have a better expectation of its future. Even when you keep a hard-copy history of something, you just want to be able to study it whenever you like, to improve your model and the accuracy of your expectation. In our simple single-variable example, it is about finding the most expected value of the variable, with one condition: you are only allowed to keep or store the expected value, nothing else.
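One update rule that satisfies the "store only the expected value" condition is an exponential moving average: each new sample pulls the stored value a fixed fraction of the way toward itself. Whether the post goes on to use this rule is not visible in this excerpt, so treat the sketch below, including the class ExpectedValue and the learning rate alpha, as an assumption.

```java
// Hypothetical sketch: the only stored state is the current expected value.
class ExpectedValue {
    private double expected;    // the single number we are allowed to keep
    private final double alpha; // fixed learning rate in (0, 1], an assumption

    ExpectedValue(double initial, double alpha) {
        this.expected = initial;
        this.alpha = alpha;
    }

    // Move the expectation a fraction `alpha` of the way toward the new sample.
    double update(double v) {
        expected += alpha * (v - expected);
        return expected;
    }

    public static void main(String[] args) {
        // Made-up room temperatures V(0), V(1), V(2), V(3).
        ExpectedValue e = new ExpectedValue(20.0, 0.5);
        for (double v : new double[] {22.0, 21.0, 25.0}) {
            System.out.println(e.update(v));
        }
    }
}
```

A small alpha makes the expectation change slowly and resist outliers; a large alpha makes it follow recent samples closely.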

Simple Time Series Pattern Recognition Source Code

If you've read the posts about Bayesian Networks, you are now ready to write your first pattern recognition system based on a Bayesian Network. Since I'm working with Java these days, I prepared a simple Java version; let's see how it works. You can find the source code on the following page:

Simple Time Series Pattern Recognition Source Code

Terminology and setters
The main class that does the job is "SimpleTimeSeries". In terms of terminology, we have raw time and raw value, which are the data whose pattern you want the system to learn. We also have model time and model value; these are the parameters our Bayesian Network uses as nodes.

There are also some setter methods that set the crucial parameters of the system, like the maximum raw time (maxRawTime), which is the largest possible time value. For example, if a single period of your data has 720 samples, you need to set it to 719, because we assume the raw time for a single period starts at 0 and ends at 719. There is another setter for the maximum raw value of the time series (maxRawValue); if your series value goes up to 5000, set it to 5000.

Canadian dollar exchange rate analysis (2007-2015)

This post is about how you can take a scientific approach to thinking about or analyzing different phenomena using a Bayesian Network. The only tools I used to prepare the information were OpenOffice's Spreadsheet and the tools I have provided for this blog. I was looking for a data source on the web and finally found the Canadian to US dollar exchange rate for the last nine years on the Bank of Canada website. If you just draw the rate for 2007 to 2015, you'll get something like the following graph.

Canadian to US dollar exchange rate (data source: Bank of Canada)

Did you notice the rates between September and December of 2007?!

A tool to build Bayesian Network from a time series

I showed in the last post (How to convert "Time Series" to "Bayesian Network") how we can convert a sample time series to a Bayesian Network. As you may have seen, it is easy, but there are some things we have to talk about a bit more.

1- You do not need to worry about the maximum value of the Y axis. Whether it is 10, 1000, 100K, 10G, ..., the resolution of your processing depends on the overall shape of the series you are looking at or processing, not on these measures or units. What I mean is that a series showing network traffic with a maximum of 10G in one day can be treated like a series showing the body temperature of a human! We don't care what it is about or what the units of its Y or X (time) axes are.

2- As with the Y axis, the length of the X axis or the resolution of the time period doesn't matter either. An hour, a day, a month, whatever it is, we only deal with a low-resolution measurement for this axis. For example, for our daily network traffic, we may choose a resolution of one hour to build our time or X nodes. So we only have 24 nodes of type time, each representing a specific hour of the day.
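Both points come down to rescaling each axis to a small number of nodes before building the network. The sketch below shows one way to do that; the class Resolution, the explicit min parameter and the sample ranges are assumptions made for the illustration, not part of the tool.

```java
// Hypothetical helper showing that units drop out once an axis is rescaled.
class Resolution {
    // Map a raw value in [min, max] to a node index in [0, nodes - 1].
    static int node(double raw, double min, double max, int nodes) {
        int index = (int) ((raw - min) / (max - min) * nodes);
        return Math.max(0, Math.min(index, nodes - 1)); // clamp values at or beyond the range
    }

    // One time node per hour of the day, giving 24 time nodes in total.
    static int hourNode(int secondOfDay) {
        return secondOfDay / 3600;
    }

    public static void main(String[] args) {
        // Network traffic up to 10G and a body temperature between 36 and 41
        // (made-up ranges) land on the same kind of node indices.
        System.out.println(node(5e9, 0, 10e9, 10));
        System.out.println(node(38.5, 36.0, 41.0, 10));
        System.out.println(hourNode(13 * 3600 + 59));
    }
}
```

In this example both mid-range values map to node 5 out of 10, even though one is measured in bits and the other in degrees, and the timestamp 13:00:59 falls into time node 13.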

sleptons

Monday, 29 February 2016

Introduction to reducing uncertainty of outcomes and predictive modeling

Tuesday, 23 February 2016

Dealing with extremely small probabilities in Bayesian Modeling

Monday, 22 February 2016

The simple Mathematics behind object catching

Thursday, 18 February 2016

Canadian dollar, 3 months forecast

Wednesday, 17 February 2016

Two-Dimensional Bayesian Network Classifier

Monday, 15 February 2016

The role of patterns in human habit

Friday, 12 February 2016

A tool for time series pattern recognition

Tuesday, 9 February 2016

Expected Value and Learning Strategy

Sunday, 7 February 2016

Simple Time Series Pattern Recognition Source Code

Thursday, 4 February 2016

Canadian dollar exchange rate analysis (2007-2015)

Monday, 1 February 2016

A tool to build Bayesian Network from a time series