Anomaly Detection Algorithms
Most of the anomaly detection algorithms below work on a distance/similarity matrix D or a kernel/dissimilarity matrix K, which can be computed using the functions provided here.
Currently supported algorithms include
- Recurrences (REC)
- Kernel Density Estimation (KDE)
- Hotelling's $T^2$ (Mahalanobis distance) (T2)
- two k-Nearest Neighbor approaches (KNN-Gamma, KNN-Delta)
- Univariate Approach (UNIV)
- Support Vector Data Description (SVDD)
- Kernel Null Foley–Sammon Transform (KNFST)
Functions
Recurrences
MultivariateAnomalies.REC — Function

`REC(D::AbstractArray, rec_threshold::Float64, temp_excl::Int = 0)`

Count the number of observations (recurrences) that fall within a radius rec_threshold of a distance matrix D. Steps closer to each other than temp_excl are not counted as recurrences (default: temp_excl = 0).
Marwan, N., Carmen Romano, M., Thiel, M., & Kurths, J. (2007). Recurrence plots for the analysis of complex systems. Physics Reports, 438(5-6), 237–329. http://doi.org/10.1016/j.physrep.2006.11.001
MultivariateAnomalies.REC! — Function

`REC!(rec_out::AbstractArray, D::AbstractArray, rec_threshold::Float64, temp_excl::Int = 0)`

Memory-efficient version of REC() for use within a loop. rec_out is the preallocated output; initialise it with init_REC().
MultivariateAnomalies.init_REC — Function

`init_REC(D::Array{Float64, 2})`
`init_REC(T::Int)`

Get an object for the memory-efficient REC!() version. Input can be a distance matrix D or the number of time steps (observations) T.
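To illustrate what the recurrence score measures, here is a minimal standalone sketch, assuming a precomputed pairwise distance matrix D. It is not the package implementation, and `rec_sketch` is a made-up name:

```julia
# Count, per observation, how many other observations lie within
# rec_threshold; observations closer than temp_excl steps in time
# are excluded (temp_excl = 0 only excludes the self-match).
function rec_sketch(D::AbstractMatrix, rec_threshold::Real, temp_excl::Int = 0)
    T = size(D, 1)
    counts = zeros(Int, T)
    for i in 1:T, j in 1:T
        if abs(i - j) > temp_excl && D[i, j] <= rec_threshold
            counts[i] += 1
        end
    end
    return counts
end

# Toy 1-D series with an outlier at position 3:
x = [0.0, 0.1, 5.0, 0.2, 0.05]
D = [abs(a - b) for a in x, b in x]
scores = rec_sketch(D, 0.5)   # the outlier has no recurrences
```

Low recurrence counts flag anomalies: the outlier at position 3 has no neighbours within the radius.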
Kernel Density Estimation
MultivariateAnomalies.KDE — Function

`KDE(K)`

Compute a Kernel Density Estimation (the Parzen sum), given a kernel matrix K.
Parzen, E. (1962). On Estimation of a Probability Density Function and Mode. The Annals of Mathematical Statistics, 33, 1065–1076.
MultivariateAnomalies.KDE! — Function

`KDE!(KDE_out, K)`

Memory-efficient version of KDE(). Writes the results into the preallocated KDE_out object. Initialize KDE_out with init_KDE().
MultivariateAnomalies.init_KDE — Function

`init_KDE(K::Array{Float64, 2})`
`init_KDE(T::Int)`

Returns a KDE_out object for use in KDE!(). Use either a kernel matrix K or the number of time steps/observations T as argument.
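The Parzen sum can be sketched as follows, assuming a Gaussian kernel with a hand-picked bandwidth σ (the package builds K with its own kernel functions; all names here are illustrative):

```julia
# Gaussian kernel of a scalar distance d with bandwidth σ (assumed form)
gauss_kernel(d, σ) = exp(-d^2 / (2σ^2))

x = [0.0, 0.1, 5.0, 0.2, 0.05]
σ = 1.0
K = [gauss_kernel(a - b, σ) for a in x, b in x]

# Parzen-style density score: mean kernel similarity to all
# observations; a low density flags the anomaly.
kde = vec(sum(K, dims = 2)) ./ length(x)
```

The isolated value at position 3 receives the lowest density score.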
Hotelling's $T^2$
MultivariateAnomalies.T2 — Function

`T2{tp}(data::AbstractArray{tp,2}, Q::AbstractArray[, mv])`

Compute Hotelling's $T^2$ control chart (the squared Mahalanobis distance to the data's mean vector mv, given the covariance matrix Q). Input data is a two-dimensional data matrix (observations * variables).
Lowry, C. A., & Woodall, W. H. (1992). A Multivariate Exponentially Weighted Moving Average Control Chart. Technometrics, 34, 46–53.
MultivariateAnomalies.T2! — Function

`T2!(t2_out, data, Q[, mv])`

Memory-efficient version of T2(), for usage within a loop etc. Initialize the t2_out object with init_T2(). t2_out[1] contains the squared Mahalanobis distance after computation.
MultivariateAnomalies.init_T2 — Function

`init_T2(VAR::Int, T::Int)`
`init_T2{tp}(data::AbstractArray{tp,2})`

Initialize a t2_out object for T2!, either with the number of variables VAR and observations/time steps T, or with a two-dimensional data matrix (time * variables).
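The squared Mahalanobis distance behind $T^2$ can be sketched as below, assuming data as an observations * variables matrix; `t2_sketch` is an illustrative name, not the package API:

```julia
using Statistics, LinearAlgebra

# One score per observation: (x - mv)' Q^{-1} (x - mv), with the
# mean vector mv and covariance matrix Q estimated from the data.
function t2_sketch(data::AbstractMatrix)
    mv = mean(data, dims = 1)       # mean vector (1 × variables)
    Q = cov(data)                   # covariance matrix (variables × variables)
    centered = data .- mv
    return [dot(centered[i, :], Q \ centered[i, :]) for i in 1:size(data, 1)]
end

data = [0.0 0.0; 0.1 0.2; -0.1 -0.1; 0.2 0.1; 4.0 4.0]
t2 = t2_sketch(data)   # the last observation is far from the mean
```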
k-Nearest Neighbors
MultivariateAnomalies.KNN_Gamma — Function

`KNN_Gamma(knn_dists_out)`

Compute the mean distance of the k nearest neighbors, given a knn_dists_out object from knn_dists() as input argument.
Harmeling, S., Dornhege, G., Tax, D., Meinecke, F., & Müller, K.-R. (2006). From outliers to prototypes: Ordering data. Neurocomputing, 69(13-15), 1608–1618. http://doi.org/10.1016/j.neucom.2005.05.015
MultivariateAnomalies.KNN_Gamma! — Function

`KNN_Gamma!(KNN_Gamma_out, knn_dists_out)`

Memory-efficient version of KNN_Gamma(), to be used in a loop. Initialize KNN_Gamma_out with init_KNN_Gamma().
MultivariateAnomalies.init_KNN_Gamma — Function

`init_KNN_Gamma(T::Int)`
`init_KNN_Gamma(knn_dists_out)`

Initialize a KNN_Gamma_out object for KNN_Gamma!, either with T, the number of observations/time steps, or with a knn_dists_out object.
MultivariateAnomalies.KNN_Delta — Function

`KNN_Delta(knn_dists_out, data)`

Compute Delta as the vector difference of the k nearest neighbors. Arguments are a knn_dists() object (knn_dists_out) and a data matrix (observations * variables).
Harmeling, S., Dornhege, G., Tax, D., Meinecke, F., & Müller, K.-R. (2006). From outliers to prototypes: Ordering data. Neurocomputing, 69(13-15), 1608–1618. http://doi.org/10.1016/j.neucom.2005.05.015
MultivariateAnomalies.KNN_Delta! — Function

`KNN_Delta!(KNN_Delta_out, knn_dists_out, data)`

Memory-efficient version of KNN_Delta(). KNN_Delta_out[1] is the vector difference of the k nearest neighbors.
MultivariateAnomalies.init_KNN_Delta — Function

`init_KNN_Delta(T, VAR, k)`

Return a KNN_Delta_out object to be used for KNN_Delta!. Input: time steps/observations T, variables VAR, number of nearest neighbors k.
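The KNN-Gamma score (mean distance to the k nearest neighbours) can be sketched directly from a pairwise distance matrix. The package bundles these distances in a knn_dists_out object; this standalone version, with the made-up name `knn_gamma_sketch`, is for illustration only:

```julia
# Mean distance to the k nearest neighbours per observation.
function knn_gamma_sketch(D::AbstractMatrix, k::Int)
    T = size(D, 1)
    scores = zeros(T)
    for i in 1:T
        ds = sort(D[i, :])               # ds[1] == 0 is the self-distance
        scores[i] = sum(ds[2:(k + 1)]) / k
    end
    return scores
end

x = [0.0, 0.1, 5.0, 0.2, 0.05]
D = [abs(a - b) for a in x, b in x]
gamma = knn_gamma_sketch(D, 2)   # the outlier has distant neighbours
```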
Univariate Approach
MultivariateAnomalies.UNIV — Function

`UNIV(data)`

Order the values in each variable and return their maximum, i.e. if any variable in data (observations * variables) exceeds a given quantile, the highest quantile is returned.
MultivariateAnomalies.UNIV! — Function

`UNIV!(univ_out, data)`

Memory-efficient version of UNIV(). Input a univ_out object from init_UNIV() and a data matrix (observations * variables).
MultivariateAnomalies.init_UNIV — Function

`init_UNIV(T::Int, VAR::Int)`
`init_UNIV{tp}(data::AbstractArray{tp, 2})`

Initialize a univ_out object to be used in UNIV!(), either with the number of time steps/observations T and variables VAR, or with a data matrix (observations * variables).
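The univariate idea can be sketched as ranking each variable separately to an empirical quantile and taking the per-observation maximum across variables. `univ_sketch` is an illustrative name, not the package implementation:

```julia
# For each variable, assign every observation its empirical quantile
# (rank / T); the score is the maximum quantile across variables.
function univ_sketch(data::AbstractMatrix)
    T, V = size(data)
    ranks = similar(data, Float64)
    for v in 1:V
        ord = sortperm(data[:, v])
        for (r, i) in enumerate(ord)
            ranks[i, v] = r / T
        end
    end
    return vec(maximum(ranks, dims = 2))
end

data = [0.0 0.1; 0.2 0.0; 9.0 0.3; 0.1 0.2]
u = univ_sketch(data)   # row 3 is extreme in both variables
```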
Support Vector Data Description
MultivariateAnomalies.SVDD_train — Function

`SVDD_train(K, nu)`

Train a one-class support vector machine model (i.e. support vector data description), given a kernel matrix K and the highest possible percentage of outliers nu. Returns the model object (svdd_model). Requires LIBSVM.
Tax, D. M. J., & Duin, R. P. W. (1999). Support vector domain description. Pattern Recognition Letters, 20, 1191–1199. Schölkopf, B., Williamson, R. C., & Bartlett, P. L. (2000). New Support Vector Algorithms. Neural Computation, 12, 1207–1245.
MultivariateAnomalies.SVDD_predict — Function

`SVDD_predict(svdd_model, K)`

Predict the outlierness of an object given the testing kernel matrix K and the svdd_model from SVDD_train(). Requires LIBSVM.
Tax, D. M. J., & Duin, R. P. W. (1999). Support vector domain description. Pattern Recognition Letters, 20, 1191–1199. Schölkopf, B., Williamson, R. C., & Bartlett, P. L. (2000). New Support Vector Algorithms. Neural Computation, 12, 1207–1245.
Kernel Null Foley–Sammon Transform
MultivariateAnomalies.KNFST_train — Function

`KNFST_train(K)`

Train a one-class novelty KNFST model on a kernel matrix K according to Paul Bodesheim, Alexander Freytag, Erik Rodner, Michael Kemmler and Joachim Denzler: "Kernel Null Space Methods for Novelty Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
Output

`(proj, targetValue)`

proj – projection vector for data points (project x via kx*proj, where kx is a row vector containing the kernel values of x and the training data)
targetValue – value of all training samples in the null space
MultivariateAnomalies.KNFST_predict — Function

`KNFST_predict(model, K)`

Predict the outlierness of some data (represented by the kernel matrix K), given some KNFST model from KNFST_train(K). Compute K with kernel_matrix().
Paul Bodesheim and Alexander Freytag and Erik Rodner and Michael Kemmler and Joachim Denzler: "Kernel Null Space Methods for Novelty Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
MultivariateAnomalies.KNFST_predict! — Function

`KNFST_predict!(KNFST_out, KNFST_mod, K)`

Predict the outlierness of some data (represented by the kernel matrix K), given a KNFST_out object (init_KNFST()), some KNFST model (KNFST_mod = KNFST_train(K)) and the testing kernel matrix K.
Paul Bodesheim and Alexander Freytag and Erik Rodner and Michael Kemmler and Joachim Denzler: "Kernel Null Space Methods for Novelty Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
MultivariateAnomalies.init_KNFST — Function

`init_KNFST(T, KNFST_mod)`

Initialize a KNFST_out object for use with KNFST_predict!, given T, the number of observations, and the model output of KNFST_train(K).
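The prediction step of KNFST can be sketched as a projection plus a distance: test points are mapped into the null space via their kernel values against the training data, and the novelty score is the distance to the common target value. The `proj` and `targetValue` arrays below are made-up stand-ins for the output of KNFST_train, chosen only to make the sketch runnable:

```julia
using LinearAlgebra

proj = [0.5 0.0; 0.0 0.5; 0.5 0.5]   # 3 training points, 2 null-space dims (assumed)
targetValue = [0.5 0.25]             # common target of the training samples (assumed)

# Kernel values of 2 test points against the 3 training points
K_test = [1.0 0.0 0.5;
          0.0 0.0 0.0]

projected = K_test * proj            # project test points into the null space
scores = [norm(projected[i, :] .- vec(targetValue)) for i in 1:size(K_test, 1)]
```

The second test point, with no kernel similarity to the training data, lands far from the target value and scores as more novel.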
Distance to some Centers
MultivariateAnomalies.Dist2Centers — Function

`Dist2Centers(centers::AbstractArray{tp, 2}) where {tp}`

Compute the distance to the nearest of the given centers, e.g. from a k-means clustering output. Large distances to the nearest center indicate anomalies. data: observations * variables.
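The idea can be sketched as follows, assuming centers from a previous clustering run; `dist2centers_sketch` is an illustrative name, not the package API:

```julia
# Euclidean distance of each observation to its nearest center.
function dist2centers_sketch(data::AbstractMatrix, centers::AbstractMatrix)
    T = size(data, 1)
    scores = zeros(T)
    for i in 1:T
        scores[i] = minimum(sqrt(sum((data[i, :] .- centers[c, :]).^2))
                            for c in 1:size(centers, 1))
    end
    return scores
end

data = [0.0 0.0; 1.0 1.0; 5.0 5.0]
centers = [0.0 0.0; 1.0 1.0]   # e.g. k-means centers (assumed here)
d = dist2centers_sketch(data, centers)
```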