Anomaly Detection Algorithms
Most of the anomaly detection algorithms below work on a distance/similarity matrix `D` or a kernel/dissimilarity matrix `K`. These can be computed using the functions provided here.
Currently supported algorithms include:
- Recurrences (REC)
- Kernel Density Estimation (KDE)
- Hotelling's $T^2$ (Mahalanobis distance) (T2)
- two k-Nearest Neighbor approaches (KNN-Gamma, KNN-Delta)
- Univariate Approach (UNIV)
- Support Vector Data Description (SVDD)
- Kernel Null Foley–Sammon Transform (KNFST)
Functions
Recurrences
MultivariateAnomalies.REC — Function
REC(D::AbstractArray, rec_threshold::Float64, temp_excl::Int = 0)
Count the number of observations (recurrences) that fall within a radius `rec_threshold` in the distance matrix `D`. Observations closer than `temp_excl` time steps are excluded from being counted as recurrences (default: `temp_excl = 0`).
Marwan, N., Carmen Romano, M., Thiel, M., & Kurths, J. (2007). Recurrence plots for the analysis of complex systems. Physics Reports, 438(5-6), 237–329. http://doi.org/10.1016/j.physrep.2006.11.001
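The recurrence count can be sketched in a few lines of plain Julia. `rec_sketch` below is a hypothetical helper for illustration only, not the package's `REC()`, which may differ in normalisation and sign of the resulting anomaly score:

```julia
# Illustrative recurrence count: for each observation i, count how many other
# observations lie within radius `rec_threshold`, skipping neighbours that are
# closer than `temp_excl` time steps (here 0, i.e. only the point itself is skipped).
function rec_sketch(D::AbstractMatrix, rec_threshold::Float64, temp_excl::Int = 0)
    T = size(D, 1)
    counts = zeros(Int, T)
    for i in 1:T, j in 1:T
        if abs(i - j) > temp_excl && D[i, j] <= rec_threshold
            counts[i] += 1
        end
    end
    return counts
end

# Toy pairwise-distance matrix for the 1-D observations [0.0, 0.1, 5.0]
x = [0.0, 0.1, 5.0]
D = [abs(a - b) for a in x, b in x]
rec_sketch(D, 0.5)   # the isolated point 5.0 has zero recurrences
```

Few recurrences mean an observation sits in a sparsely populated region, which is the sense in which low counts flag anomalies.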
MultivariateAnomalies.REC! — Function
REC!(rec_out::AbstractArray, D::AbstractArray, rec_threshold::Float64, temp_excl::Int = 0)
Memory-efficient version of `REC()` for use within a loop. `rec_out` is the preallocated output; initialise it with `init_REC()`.
MultivariateAnomalies.init_REC — Function
init_REC(D::Array{Float64, 2})
init_REC(T::Int)
Get an object for the memory-efficient `REC!()` version. Input is either a distance matrix `D` or the number of time steps (observations) `T`.
Kernel Density Estimation
MultivariateAnomalies.KDE — Function
KDE(K)
Compute a Kernel Density Estimation (the Parzen sum), given a kernel matrix `K`.
Parzen, E. (1962). On Estimation of a Probability Density Function and Mode. The Annals of Mathematical Statistics, 33, 1065–1076.
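The Parzen sum can be sketched directly from a kernel matrix. `kde_sketch` is a hypothetical helper for illustration, not the package's `KDE()`, which may normalise differently:

```julia
# Illustrative Parzen sum: given a kernel matrix K of pairwise similarities,
# the density estimate at observation i is the mean kernel value between i
# and all observations. Low densities flag anomalies.
kde_sketch(K::AbstractMatrix) = vec(sum(K, dims = 2)) ./ size(K, 2)

# Toy Gaussian kernel matrix for the 1-D observations [0.0, 0.1, 5.0]
x = [0.0, 0.1, 5.0]
K = [exp(-(a - b)^2 / 2) for a in x, b in x]
kde_sketch(K)   # the isolated point 5.0 gets the lowest density
```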
MultivariateAnomalies.KDE! — Function
KDE!(KDE_out, K)
Memory-efficient version of `KDE()`. Writes its results into the preallocated `KDE_out` object; initialise `KDE_out` with `init_KDE()`.
MultivariateAnomalies.init_KDE — Function
init_KDE(K::Array{Float64, 2})
init_KDE(T::Int)
Returns a `KDE_out` object for use in `KDE!()`. Use either a kernel matrix `K` or the number of time steps/observations `T` as argument.
Hotelling's $T^2$
MultivariateAnomalies.T2 — Function
T2{tp}(data::AbstractArray{tp,2}, Q::AbstractArray[, mv])
Compute Hotelling's $T^2$ control chart (the squared Mahalanobis distance to the data's mean vector `mv`, given the covariance matrix `Q`). Input `data` is a two-dimensional data matrix (observations * variables).
Lowry, C. A., & Woodall, W. H. (1992). A Multivariate Exponentially Weighted Moving Average Control Chart. Technometrics, 34, 46–53.
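The quantity behind $T^2$ can be sketched with the `Statistics` and `LinearAlgebra` standard libraries. `t2_sketch` is a hypothetical helper that estimates `mv` and `Q` itself; the package's `T2()` takes them as arguments:

```julia
using Statistics, LinearAlgebra

# Illustrative Hotelling's T²: the squared Mahalanobis distance of each
# observation to the data mean, using the inverse sample covariance matrix.
function t2_sketch(data::AbstractMatrix)
    mv = vec(mean(data, dims = 1))   # mean vector over observations
    Qinv = inv(cov(data))            # inverse covariance matrix
    return [dot(data[i, :] .- mv, Qinv * (data[i, :] .- mv)) for i in 1:size(data, 1)]
end

data = [1.0 2.0; 1.2 1.9; 0.8 2.1; 1.1 2.2; 0.9 1.8; 4.0 6.0]  # last row deviates most
t2 = t2_sketch(data)   # largest squared Mahalanobis distance at the last row
```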
MultivariateAnomalies.T2! — Function
T2!(t2_out, data, Q[, mv])
Memory-efficient version of `T2()`, for use within a loop etc. Initialise the `t2_out` object with `init_T2()`. After computation, `t2_out[1]` contains the squared Mahalanobis distance.
MultivariateAnomalies.init_T2 — Function
init_T2(VAR::Int, T::Int)
init_T2{tp}(data::AbstractArray{tp,2})
Initialise a `t2_out` object for `T2!`, either with the number of variables `VAR` and observations/time steps `T`, or with a two-dimensional `data` matrix (time * variables).
k-Nearest Neighbors
MultivariateAnomalies.KNN_Gamma — Function
KNN_Gamma(knn_dists_out)
Compute the mean distance of the k nearest neighbours, given a `knn_dists_out` object from `knn_dists()` as input argument.
Harmeling, S., Dornhege, G., Tax, D., Meinecke, F., & Müller, K.-R. (2006). From outliers to prototypes: Ordering data. Neurocomputing, 69(13-15), 1608–1618. http://doi.org/10.1016/j.neucom.2005.05.015
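The score can be sketched directly from a distance matrix. `knn_gamma_sketch` is a hypothetical helper; the package computes the same quantity from a precomputed `knn_dists()` object instead:

```julia
# Illustrative KNN-Gamma: the anomaly score of each observation is the mean
# distance to its k nearest neighbours. Isolated points get large scores.
function knn_gamma_sketch(D::AbstractMatrix, k::Int)
    T = size(D, 1)
    scores = zeros(T)
    for i in 1:T
        dists = sort([D[i, j] for j in 1:T if j != i])  # distances to all other points
        scores[i] = sum(dists[1:k]) / k                 # mean of the k smallest
    end
    return scores
end

x = [0.0, 0.1, 0.2, 5.0]
D = [abs(a - b) for a in x, b in x]
knn_gamma_sketch(D, 2)   # the isolated point 5.0 gets the largest score
```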
MultivariateAnomalies.KNN_Gamma! — Function
KNN_Gamma!(KNN_Gamma_out, knn_dists_out)
Memory-efficient version of `KNN_Gamma`, to be used in a loop. Initialise `KNN_Gamma_out` with `init_KNN_Gamma()`.
MultivariateAnomalies.init_KNN_Gamma — Function
init_KNN_Gamma(T::Int)
init_KNN_Gamma(knn_dists_out)
Initialise a `KNN_Gamma_out` object for `KNN_Gamma!`, either with `T`, the number of observations/time steps, or with a `knn_dists_out` object.
MultivariateAnomalies.KNN_Delta — Function
KNN_Delta(knn_dists_out, data)
Compute Delta as the vector difference of the k nearest neighbours. Arguments are a `knn_dists()` object (`knn_dists_out`) and a `data` matrix (observations * variables).
Harmeling, S., Dornhege, G., Tax, D., Meinecke, F., & Müller, K.-R. (2006). From outliers to prototypes: Ordering data. Neurocomputing, 69(13-15), 1608–1618. http://doi.org/10.1016/j.neucom.2005.05.015
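Following Harmeling et al. (2006), the delta score is the length of the mean difference vector between a point and its k nearest neighbours; points on the boundary of the data cloud have one-sided neighbourhoods and thus large deltas. `knn_delta_sketch` is a hypothetical, self-contained helper, not the package's `KNN_Delta()`:

```julia
using LinearAlgebra, Statistics

# Illustrative KNN-Delta: score each observation by the norm of the difference
# between itself and the mean of its k nearest neighbours.
function knn_delta_sketch(data::AbstractMatrix, k::Int)
    T = size(data, 1)
    D = [norm(data[i, :] .- data[j, :]) for i in 1:T, j in 1:T]
    scores = zeros(T)
    for i in 1:T
        nn = sortperm(D[i, :])[2:k+1]   # k nearest neighbours, skipping self
        delta = vec(mean(data[nn, :], dims = 1)) .- data[i, :]
        scores[i] = norm(delta)
    end
    return scores
end

data = [0.0 0.0; 0.1 0.0; 0.0 0.1; 5.0 5.0]  # last row sits far from the cluster
knn_delta_sketch(data, 2)
```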
MultivariateAnomalies.KNN_Delta! — Function
KNN_Delta!(KNN_Delta_out, knn_dists_out, data)
Memory-efficient version of `KNN_Delta()`. `KNN_Delta_out[1]` is the vector difference of the k nearest neighbours.
MultivariateAnomalies.init_KNN_Delta — Function
init_KNN_Delta(T, VAR, k)
Return a `KNN_Delta_out` object to be used for `KNN_Delta!`. Input: time steps/observations `T`, variables `VAR`, number of nearest neighbours `k`.
Univariate Approach
MultivariateAnomalies.UNIV — Function
UNIV(data)
Order the values in each variable and return their maximum, i.e. if any of the variables in `data` (observations * variables) is above a given quantile, the highest quantile will be returned.
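The idea can be sketched with empirical quantile ranks. `univ_sketch` is a hypothetical helper; the package's `UNIV()` may differ in its exact ranking and tie-handling:

```julia
# Illustrative univariate approach: convert each variable to empirical quantiles
# (ranks scaled to [0, 1]) and score each observation by the largest quantile it
# reaches in any single variable.
function univ_sketch(data::AbstractMatrix)
    T, VAR = size(data)
    q = zeros(T, VAR)
    for v in 1:VAR
        order = sortperm(data[:, v])
        q[order, v] = (1:T) ./ T        # quantile rank of each observation in variable v
    end
    return vec(maximum(q, dims = 2))    # highest quantile over all variables
end

data = [1.0 10.0; 2.0 20.0; 3.0 30.0; 100.0 25.0]
univ_sketch(data)   # the last row is extreme in variable 1 and scores 1.0
```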
MultivariateAnomalies.UNIV! — Function
UNIV!(univ_out, data)
Memory-efficient version of `UNIV()`. Input a `univ_out` object from `init_UNIV()` and a `data` matrix (observations * variables).
MultivariateAnomalies.init_UNIV — Function
init_UNIV(T::Int, VAR::Int)
init_UNIV{tp}(data::AbstractArray{tp, 2})
Initialise a `univ_out` object to be used in `UNIV!()`, either with the number of time steps/observations `T` and variables `VAR`, or with a `data` matrix (observations * variables).
Support Vector Data Description
MultivariateAnomalies.SVDD_train — Function
SVDD_train(K, nu)
Train a one-class support vector machine model (i.e. support vector data description), given a kernel matrix `K` and the highest possible percentage of outliers `nu`. Returns the model object (`svdd_model`). Requires LIBSVM.
Tax, D. M. J., & Duin, R. P. W. (1999). Support vector domain description. Pattern Recognition Letters, 20, 1191–1199.
Schölkopf, B., Williamson, R. C., & Bartlett, P. L. (2000). New Support Vector Algorithms. Neural Computation, 12, 1207–1245.
MultivariateAnomalies.SVDD_predict — Function
SVDD_predict(svdd_model, K)
Predict the outlierness of an object given the testing kernel matrix `K` and the `svdd_model` from `SVDD_train()`. Requires LIBSVM.
Tax, D. M. J., & Duin, R. P. W. (1999). Support vector domain description. Pattern Recognition Letters, 20, 1191–1199.
Schölkopf, B., Williamson, R. C., & Bartlett, P. L. (2000). New Support Vector Algorithms. Neural Computation, 12, 1207–1245.
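Both SVDD and KNFST operate on a kernel matrix rather than raw data. A common choice is the Gaussian (RBF) kernel, sketched here in plain Julia; `rbf_kernel` is a hypothetical helper, shown only to make the required input concrete (the package provides `kernel_matrix()` for this purpose):

```julia
using LinearAlgebra

# Illustrative Gaussian (RBF) kernel matrix from a data matrix
# (observations * variables): K[i, j] = exp(-||x_i - x_j||² / (2σ²)).
function rbf_kernel(data::AbstractMatrix, sigma::Float64)
    T = size(data, 1)
    return [exp(-norm(data[i, :] .- data[j, :])^2 / (2 * sigma^2)) for i in 1:T, j in 1:T]
end

data = [0.0 0.0; 1.0 0.0; 0.0 1.0]
K = rbf_kernel(data, 1.0)   # 3 × 3, symmetric, with ones on the diagonal
```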
Kernel Null Foley–Sammon Transform
MultivariateAnomalies.KNFST_train — Function
KNFST_train(K)
Train a one-class novelty KNFST model on a kernel matrix `K`, according to Paul Bodesheim, Alexander Freytag, Erik Rodner, Michael Kemmler and Joachim Denzler: "Kernel Null Space Methods for Novelty Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
Output
`(proj, targetValue)`
`proj` – projection vector for data points (project x via kx * proj, where kx is a row vector containing the kernel values of x and the training data)
`targetValue` – value of all training samples in the null space
MultivariateAnomalies.KNFST_predict — Function
KNFST_predict(model, K)
Predict the outlierness of some data (represented by the kernel matrix `K`), given some KNFST `model` from `KNFST_train(K)`. Compute `K` with `kernel_matrix()`.
Paul Bodesheim and Alexander Freytag and Erik Rodner and Michael Kemmler and Joachim Denzler: "Kernel Null Space Methods for Novelty Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
MultivariateAnomalies.KNFST_predict! — Function
KNFST_predict!(KNFST_out, KNFST_mod, K)
Predict the outlierness of some data (represented by the testing kernel matrix `K`), given a `KNFST_out` object (from `init_KNFST()`) and some KNFST model (`KNFST_mod = KNFST_train(K)`).
Paul Bodesheim and Alexander Freytag and Erik Rodner and Michael Kemmler and Joachim Denzler: "Kernel Null Space Methods for Novelty Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
MultivariateAnomalies.init_KNFST — Function
init_KNFST(T, KNFST_mod)
Initialise a `KNFST_out` object for use with `KNFST_predict!`, given `T`, the number of observations, and the model output of `KNFST_train(K)`.
Distance to some Centers
MultivariateAnomalies.Dist2Centers — Function
Dist2Centers(centers::AbstractArray{tp, 2}) where {tp}
Compute the distance to the nearest center, e.g. of a K-means clustering output. Large distances to the nearest center indicate anomalies. `data`: observations * variables.
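The scoring rule can be sketched in plain Julia. `dist2centers_sketch` is a hypothetical helper, not the package's `Dist2Centers()`; it assumes `centers` are given, e.g. from a K-means run:

```julia
using LinearAlgebra

# Illustrative distance-to-nearest-center scoring: each observation is scored
# by its Euclidean distance to the closest of the given cluster centers.
function dist2centers_sketch(data::AbstractMatrix, centers::AbstractMatrix)
    return [minimum(norm(data[i, :] .- centers[c, :]) for c in 1:size(centers, 1))
            for i in 1:size(data, 1)]
end

data    = [0.0 0.0; 0.1 0.0; 10.0 10.0; 5.0 5.0]
centers = [0.0 0.0; 10.0 10.0]
dist2centers_sketch(data, centers)   # the point (5, 5) lies far from both centers
```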