Friday 12 February 2016

A tool for time series pattern recognition

I modified the tools from earlier posts to build a new one. It takes both a training and a test dataset of a time series and shows how well the test dataset satisfies the learnt pattern (in other words, it validates the test set). If you need more background on Bayesian networks, the following link covers almost everything I've written on the topic.

Bayesian Network posts

The tool is at the following address in the blog tools section:

Simple time series pattern recognition

You may like to try it with the default data first, but here we describe how it works and how you can use it. So ignore the default data in the text areas: copy and paste the following data into the first text area (training) and leave the second one empty.

Training dataset 1
100,109,117,126,134,141,148,154,159,163,167,169,170,170,169,167,163,159,154,148,141,134,126,118,109,100,91,83,74,66,59,52,46,41,37,33,31,30,30,31,33,37,41,46,52,59,66,74,82,

The training data is a single-period sinusoidal waveform. If you draw it with the tool, you'll see the following graph.

 Single period of the training data

Since real time series rarely repeat exactly from one instance or period to the next, add the following two training datasets to the training text area and draw the graph. Put each training period on its own line.

Training dataset 2
83,91,100,109,117,126,134,141,148,154,159,163,167,169,170,170,169,167,163,159,154,148,141,134,126,118,109,100,91,83,74,66,59,52,46,41,37,33,31,30,30,31,33,37,41,46,52,59,66,

Training dataset 3
66,74,83,91,100,109,117,126,134,141,148,154,159,163,167,169,170,170,169,167,163,159,154,148,141,134,126,118,109,100,91,83,74,66,59,52,46,41,37,33,31,30,30,31,33,37,41,46,52,

The result should look like the following. Here the training datasets differ only by a phase shift, but you can use any real training set you have. You can also see the live result by clicking here.

 Three periods of training data
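If you want to produce similar phase-shifted training periods yourself, here is a minimal Python sketch. The function name `generate_period` and the constants (offset 100, amplitude 70, 50 samples per cycle, 49 points per line) are my own assumptions inferred from the numbers above, not something the tool exposes, and the rounded values may differ by one unit from the published data at a few points.

```python
import math

def generate_period(phase_steps=0, n=49, period=50, offset=100, amplitude=70):
    """Return n rounded samples of a sine wave, shifted by phase_steps samples.

    All defaults are assumptions inferred from the example data in this post.
    """
    return [
        round(offset + amplitude * math.sin(2 * math.pi * (i + phase_steps) / period))
        for i in range(n)
    ]

# Three training periods that differ only by a phase shift,
# printed one per line in the comma-separated form the text area expects.
for shift in (0, -2, -4):
    print(",".join(str(v) for v in generate_period(shift)))
```

Each printed line can be pasted as one training period.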

Now it is time to ask the tool how valid a test dataset is according to the training datasets. Copy and paste the following test data into the second text area and draw the result.

Test dataset 1
74,83,91,160,109,140,126,134,141,148,154,159,163,167,169,170,170,169,167,163,159,100,148,141,134,126,100,109,100,91,83,74,66,59,52,46,41,37,33,31,30,30,31,33,37,41,46,52,59,

As you can see, the tool draws the test data in gray. If a given point satisfies the trained pattern, the tool validates it and shows it in green; if not, it shows it in red. You can also hover the mouse over the test data points to see the validation probability with respect to the trained pattern at each point. Click here or on the image to see a live version if you didn't copy and paste the data.

 Three training datasets and one test dataset

How does it work?
We have talked about it before; the short story is that the tool builds a Bayesian network from the given training datasets. To convert the datasets into nodes and edges, it lowers the resolution of the data in both time and value and connects each time node to the corresponding value node. If the connection already exists, it increments that connection's count instead. Finally, it calculates the probabilities from those counts. Here is the network the tool generates for the above test.

 Bayesian Network for the above example
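The steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual code: the names `train` and `score` and the bucket sizes `time_step` and `value_step` are hypothetical, and the tool may quantize the data differently.

```python
from collections import Counter, defaultdict

def train(periods, time_step=5, value_step=20):
    """Build P(value bucket | time bucket) from several training periods.

    time_step and value_step are assumed resolutions, not the tool's settings.
    """
    counts = defaultdict(Counter)
    for series in periods:
        for t, v in enumerate(series):
            # Lower the resolution: map time and value to coarse buckets (nodes),
            # then add or increment the edge between the two nodes.
            counts[t // time_step][v // value_step] += 1
    # Turn the edge counts into conditional probabilities per time node.
    return {
        tb: {vb: n / sum(c.values()) for vb, n in c.items()}
        for tb, c in counts.items()
    }

def score(model, series, time_step=5, value_step=20):
    """Return the learnt probability of each test point (0.0 if never seen)."""
    return [
        model.get(t // time_step, {}).get(v // value_step, 0.0)
        for t, v in enumerate(series)
    ]

# Tiny usage example with made-up numbers.
model = train([[10, 12, 30, 31], [11, 29, 33, 35]], time_step=2, value_step=10)
print(score(model, [10, 25, 50, 30], time_step=2, value_step=10))
```

A test point gets probability zero when its value bucket never appeared at that time bucket in any training period, which presumably corresponds to the points the tool draws in red.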

How can we say the test dataset matches the trained network?
It is all up to you. The tool gives you the probabilities, and you choose the rule. You can decide that a single zero probability means the test data doesn't match the trained pattern. You can allow some tolerance and say that having 90% nonzero probabilities is OK. Or you may accept only points with a probability higher than 0.25, so that a single point below 0.25 rejects (invalidates) the test data.
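The three rules above could be written as follows. This is a sketch under my own naming; the functions are hypothetical helpers that take the list of per-point probabilities you read from the tool, and they are not part of the tool itself.

```python
def accept_strict(probs):
    """Reject as soon as any point has zero probability."""
    return all(p > 0 for p in probs)

def accept_tolerant(probs, min_fraction=0.9):
    """Accept if at least min_fraction of the points have nonzero probability."""
    nonzero = sum(1 for p in probs if p > 0)
    return nonzero / len(probs) >= min_fraction

def accept_threshold(probs, threshold=0.25):
    """Reject if any point falls below the threshold.

    Treating a point exactly at the threshold as acceptable is a choice
    made here; the text leaves that boundary case open.
    """
    return all(p >= threshold for p in probs)

# Made-up probabilities: ten points, one of which was never seen in training.
probs = [0.5, 0.3, 1.0, 0.0, 0.4, 0.6, 0.9, 0.3, 0.7, 1.0]
print(accept_strict(probs), accept_tolerant(probs), accept_threshold(probs))
```

With these made-up values the strict and threshold rules reject the test data, while the 90% tolerance rule accepts it, which shows how much the verdict depends on the rule you pick.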