Suppose we have a verified sample set S, defined as:

S = { X | X is a vector (X1, X2, X3, ..., Xn), X is not an anomalous sample }

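Since all of these samples are valid, we can define the reference point R as the center (the component-wise mean) of all X in S:

R = (R1, R2, R3, ..., Rn), where Ri = (∑j Xij) / |S|, Xij is the i-th component of the j-th sample, and j runs over all samples in S (so the sum is divided by the number of samples |S|, not by the vector length n)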
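As a minimal sketch of this step, the reference point can be computed as the component-wise mean of the samples. The function name `centroid` and the toy sample values are illustrative assumptions, not part of the original description.

```python
def centroid(samples):
    """Return the component-wise mean of a list of equal-length vectors."""
    if not samples:
        raise ValueError("need at least one sample")
    count = len(samples)      # |S|, the number of samples
    dim = len(samples[0])     # n, the length of each vector
    return [sum(x[i] for x in samples) / count for i in range(dim)]


# Toy data (hypothetical): three valid 2-dimensional samples.
S = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
R = centroid(S)
print(R)  # [3.0, 4.0]
```

Each component of R is divided by the number of samples, so the result does not depend on how many dimensions the vectors have.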

We now have R, but we still need the acceptable distance we talked about. If we calculate the distance of each sample (X ∈ S) from R, we get a set of numbers showing how far each vector lies from the reference. So, given a function *DF* that calculates the distance between two points, we have:

D = { d | d = *DF*(X, R), X ∈ S }
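The text does not say which distance function *DF* is. The sketch below assumes Euclidean distance purely for illustration; any other metric could be substituted. The sample values and the reference point are hypothetical.

```python
import math


def DF(a, b):
    """Distance between two points; Euclidean distance is an assumption here."""
    return math.dist(a, b)  # available in Python 3.8+


# Toy data (hypothetical): two samples and their reference point.
S = [[0.0, 0.0], [6.0, 8.0]]
R = [3.0, 4.0]

# D holds the distance of every sample from the reference point.
D = [DF(x, R) for x in S]
print(D)  # [5.0, 5.0]
```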

Here D is the set of calculated distances. We can take the standard deviation of this set; if we call it *sd*, the standard deviation of the distances of each X from the reference point, then we can easily find the margin we are looking for.

What should the margin be? It can be one *sd* or two *sd*, depending on how strict we want to be. I think two *sd* is a good choice: if the distances are roughly normally distributed, two *sd* around the mean distance covers almost 95% of the samples.

Now if we call the average distance over D *μ*, any given sample whose distance from R lies between *μ - sd* and *μ + sd* is acceptable, and distances outside this range are anomalies. That is:

*μ - sd* ≤ *DF*(X, R) ≤ *μ + sd*

or, with the wider margin:

*μ - 2·sd* ≤ *DF*(X, R) ≤ *μ + 2·sd*
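Putting the pieces together, the whole check might look like the sketch below. It assumes Euclidean distance for *DF* and the population standard deviation (`statistics.pstdev`) for *sd*; the text specifies neither. The names `fit` and `is_anomaly`, the multiplier `k`, and the sample values are all hypothetical.

```python
import math
import statistics


def fit(samples):
    """Return (R, mu, sd) for a list of valid, equal-length sample vectors."""
    count = len(samples)
    dim = len(samples[0])
    R = [sum(x[i] for x in samples) / count for i in range(dim)]
    D = [math.dist(x, R) for x in samples]   # Euclidean DF (assumption)
    mu = statistics.mean(D)                  # average distance
    sd = statistics.pstdev(D)                # population sd (assumption)
    return R, mu, sd


def is_anomaly(x, R, mu, sd, k=2.0):
    """True if DF(x, R) falls outside [mu - k*sd, mu + k*sd]."""
    d = math.dist(x, R)
    return not (mu - k * sd <= d <= mu + k * sd)


# Toy data (hypothetical): six valid samples scattered around the origin.
S = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0], [2.0, 0.0], [-2.0, 0.0]]
R, mu, sd = fit(S)

print(is_anomaly([10.0, 10.0], R, mu, sd))  # True: far outside the band
print(is_anomaly([1.5, 0.5], R, mu, sd))    # False: typical distance
```

Because the rule is two-sided, a point that sits much closer to R than the valid samples do can also be flagged: with this toy data, `is_anomaly([0.0, 0.0], R, mu, sd)` returns `True`, since a distance of 0 is below *μ - 2·sd*. If only far-away points should count as anomalies, the lower bound could be dropped.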
