Thursday 20 November 2014

What is an anomaly?

Merriam-Webster's dictionary says that an anomaly is something unusual or unexpected. The word "anomaly" itself comes, via Latin, from the Greek word "anomalia", meaning "uneven" or "irregular". So what makes something unusual or unexpected? It is something whose behavior and/or physical characteristics we have learned over time, and which we now see (or see one instance of) behaving or looking different.

You usually go for a coffee around noon on working days, and the barista has gotten used to it. One day you come at a different time and ask for the exact coffee you usually order, and he is surprised. He is also surprised if you come on a working day at the usual time but order something other than your usual. So it seems an anomaly only exists if we either study a phenomenon over a period of time, or study a phenomenon or instances of it in many places.



In fact, "time" and "place" are the same thing here if we consider time as a physical dimension like the x-y-z axes. What I mean is that if you see 502 puppies, 500 of them white and two black, you immediately think these two have some issue and both are anomalies. You have learned and gotten used to seeing white puppies by looking at all of them. It is almost like being shown all these puppies one at a time.

Even a scatter plot of a parameter, for example in 2 dimensions, where most of the dots fall in a specific area but a few do not (which we can consider anomalies), is nothing more than what we have talked about.

Anomaly vs Noise
When you are gathering or collecting data, noise is something unwanted. If you are gathering puppies to study and suddenly find a cat among them, it is not an anomaly; it is data-gathering noise. But if all of the puppies are white except one, then that one is an anomaly.

There is also a term in data mining, "outlier", which more or less means anomaly. Outlier is the statistics term for samples that are distant from the others. You can always define a reference point and a margin, then calculate the distance of each observation from that point and say whether it is in range or not. Those that are not are outliers, and you can call them anomalies.
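To make the "point and margin" idea concrete, here is a minimal sketch in Python. The function name `find_outliers`, the coffee-time data, and the margin of 15 minutes are all made up for illustration; choosing a sensible reference and margin is up to you.

```python
def find_outliers(values, reference, margin):
    """Return the values whose distance from `reference` exceeds `margin`."""
    return [v for v in values if abs(v - reference) > margin]


# Hypothetical data: minutes before (-) or after (+) noon at which
# you showed up for coffee on seven working days.
arrival_minutes = [2, 5, -3, 0, 4, 95, 1]

# Assumed reference: noon (0). Assumed margin: 15 minutes either way.
outliers = find_outliers(arrival_minutes, reference=0, margin=15)
print(outliers)  # [95]
```

The day you came in at 1:35 pm is the only one outside the margin, so it is the one the barista would find surprising.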

Most of the time calculating the distance is not difficult at all; the difficulty is finding a reference point to calculate the distance from. Note that the reference can be anything: a waveform, the physical shape of a puppy, or ... In any of these cases we can interpret the reference as a single point in a multidimensional space and measure the distance from it.
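As a rough sketch of the multidimensional case, the snippet below describes each puppy by two numbers and picks the component-wise median as the reference point. Both the features (weight and coat darkness) and the choice of median are my assumptions for illustration, not the only way to do it. Mixing units in a plain Euclidean distance is also a simplification; in practice you would probably scale the features first.

```python
import math
from statistics import median


def reference_point(samples):
    """Component-wise median of the samples, used here as the reference."""
    return tuple(median(axis) for axis in zip(*samples))


def distances_from(samples, reference):
    """Euclidean distance of every sample from the reference point."""
    return [math.dist(sample, reference) for sample in samples]


# Hypothetical puppies described as (weight in kg, coat darkness from 0 to 1).
puppies = [(4.0, 0.1), (4.2, 0.1), (3.9, 0.0), (4.1, 0.1), (4.0, 0.9)]

ref = reference_point(puppies)
margin = 0.5  # assumed threshold, chosen by eye for this toy data

anomalies = [
    puppy
    for puppy, dist in zip(puppies, distances_from(puppies, ref))
    if dist > margin
]
print(ref)        # (4.0, 0.1)
print(anomalies)  # [(4.0, 0.9)]
```

The dark-coated puppy is the only one farther than the margin from the reference, which matches the intuition from the white and black puppies above.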
