Anomaly Detection Algorithms
Most of the anomaly detection algorithms below work on a distance/similarity matrix `D` or a kernel/dissimilarity matrix `K`. These can be computed using the functions provided here.
Currently supported algorithms include:
- Recurrences (REC)
- Kernel Density Estimation (KDE)
- Hotelling's $T^2$ (Mahalanobis distance) (T2)
- two k-Nearest Neighbor approaches (KNN-Gamma, KNN-Delta)
- Univariate Approach (UNIV)
- Support Vector Data Description (SVDD)
- Kernel Null Foley–Sammon Transform (KNFST)
Functions
Recurrences
MultivariateAnomalies.REC — Function
REC(D::AbstractArray, rec_threshold::Float64, temp_excl::Int = 0)
Count the number of observations (recurrences) that fall within a radius `rec_threshold` in the distance matrix `D`. Observations closer than `temp_excl` time steps are excluded from being counted as recurrences (default: `temp_excl = 0`).
Marwan, N., Carmen Romano, M., Thiel, M., & Kurths, J. (2007). Recurrence plots for the analysis of complex systems. Physics Reports, 438(5-6), 237–329. http://doi.org/10.1016/j.physrep.2006.11.001
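The recurrence count can be sketched in a few lines of plain Julia. `rec_sketch` below is a hypothetical helper for illustration only, not the package's `REC()`, which may differ in normalisation and sign of the resulting anomaly score:

```julia
# Illustrative recurrence count: for each observation i, count how many other
# observations lie within radius `rec_threshold`, skipping neighbours that are
# closer than `temp_excl` time steps (here 0, i.e. only the point itself is skipped).
function rec_sketch(D::AbstractMatrix, rec_threshold::Float64, temp_excl::Int = 0)
    T = size(D, 1)
    counts = zeros(Int, T)
    for i in 1:T, j in 1:T
        if abs(i - j) > temp_excl && D[i, j] <= rec_threshold
            counts[i] += 1
        end
    end
    return counts
end

# Toy pairwise-distance matrix for the 1-D observations [0.0, 0.1, 5.0]
x = [0.0, 0.1, 5.0]
D = [abs(a - b) for a in x, b in x]
rec_sketch(D, 0.5)   # the isolated point 5.0 has zero recurrences
```

Few recurrences mean an observation sits in a sparsely populated region, which is the sense in which low counts flag anomalies.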
MultivariateAnomalies.REC! — Function
REC!(rec_out::AbstractArray, D::AbstractArray, rec_threshold::Float64, temp_excl::Int = 0)
Memory-efficient version of `REC()` for use within a loop. `rec_out` is the preallocated output; initialise it with `init_REC()`.
MultivariateAnomalies.init_REC — Function
init_REC(D::Array{Float64, 2})
init_REC(T::Int)
Get an object for the memory-efficient `REC!()` version. Input is either a distance matrix `D` or the number of time steps (observations) `T`.
Kernel Density Estimation
MultivariateAnomalies.KDE — Function
KDE(K)
Compute a Kernel Density Estimation (the Parzen sum), given a kernel matrix `K`.
Parzen, E. (1962). On Estimation of a Probability Density Function and Mode. The Annals of Mathematical Statistics, 33, 1065–1076.
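The Parzen sum can be sketched directly from a kernel matrix. `kde_sketch` is a hypothetical helper for illustration, not the package's `KDE()`, which may normalise differently:

```julia
# Illustrative Parzen sum: given a kernel matrix K of pairwise similarities,
# the density estimate at observation i is the mean kernel value between i
# and all observations. Low densities flag anomalies.
kde_sketch(K::AbstractMatrix) = vec(sum(K, dims = 2)) ./ size(K, 2)

# Toy Gaussian kernel matrix for the 1-D observations [0.0, 0.1, 5.0]
x = [0.0, 0.1, 5.0]
K = [exp(-(a - b)^2 / 2) for a in x, b in x]
kde_sketch(K)   # the isolated point 5.0 gets the lowest density
```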
MultivariateAnomalies.KDE! — Function
KDE!(KDE_out, K)
Memory-efficient version of `KDE()`. Writes its results into the preallocated `KDE_out` object; initialise `KDE_out` with `init_KDE()`.
MultivariateAnomalies.init_KDE — Function
init_KDE(K::Array{Float64, 2})
init_KDE(T::Int)
Returns a `KDE_out` object for use in `KDE!()`. Use either a kernel matrix `K` or the number of time steps/observations `T` as argument.
Hotelling's $T^2$
MultivariateAnomalies.T2 — Function
T2{tp}(data::AbstractArray{tp,2}, Q::AbstractArray[, mv])
Compute Hotelling's $T^2$ control chart (the squared Mahalanobis distance to the data's mean vector `mv`, given the covariance matrix `Q`). Input `data` is a two-dimensional data matrix (observations * variables).
Lowry, C. A., & Woodall, W. H. (1992). A Multivariate Exponentially Weighted Moving Average Control Chart. Technometrics, 34, 46–53.
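The quantity behind $T^2$ can be sketched with the `Statistics` and `LinearAlgebra` standard libraries. `t2_sketch` is a hypothetical helper that estimates `mv` and `Q` itself; the package's `T2()` takes them as arguments:

```julia
using Statistics, LinearAlgebra

# Illustrative Hotelling's T²: the squared Mahalanobis distance of each
# observation to the data mean, using the inverse sample covariance matrix.
function t2_sketch(data::AbstractMatrix)
    mv = vec(mean(data, dims = 1))   # mean vector over observations
    Qinv = inv(cov(data))            # inverse covariance matrix
    return [dot(data[i, :] .- mv, Qinv * (data[i, :] .- mv)) for i in 1:size(data, 1)]
end

data = [1.0 2.0; 1.2 1.9; 0.8 2.1; 1.1 2.2; 0.9 1.8; 4.0 6.0]  # last row deviates most
t2 = t2_sketch(data)   # largest squared Mahalanobis distance at the last row
```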
MultivariateAnomalies.T2! — Function
T2!(t2_out, data, Q[, mv])
Memory-efficient version of `T2()`, for use within a loop etc. Initialise the `t2_out` object with `init_T2()`. After computation, `t2_out[1]` contains the squared Mahalanobis distance.
MultivariateAnomalies.init_T2 — Function
init_T2(VAR::Int, T::Int)
init_T2{tp}(data::AbstractArray{tp,2})
Initialise a `t2_out` object for `T2!`, either with the number of variables `VAR` and observations/time steps `T`, or with a two-dimensional `data` matrix (time * variables).
k-Nearest Neighbors
MultivariateAnomalies.KNN_Gamma — Function
KNN_Gamma(knn_dists_out)
Compute the mean distance of the k nearest neighbours, given a `knn_dists_out` object from `knn_dists()` as input argument.
Harmeling, S., Dornhege, G., Tax, D., Meinecke, F., & Müller, K.-R. (2006). From outliers to prototypes: Ordering data. Neurocomputing, 69(13-15), 1608–1618. http://doi.org/10.1016/j.neucom.2005.05.015
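The score can be sketched directly from a distance matrix. `knn_gamma_sketch` is a hypothetical helper; the package computes the same quantity from a precomputed `knn_dists()` object instead:

```julia
# Illustrative KNN-Gamma: the anomaly score of each observation is the mean
# distance to its k nearest neighbours. Isolated points get large scores.
function knn_gamma_sketch(D::AbstractMatrix, k::Int)
    T = size(D, 1)
    scores = zeros(T)
    for i in 1:T
        dists = sort([D[i, j] for j in 1:T if j != i])  # distances to all other points
        scores[i] = sum(dists[1:k]) / k                 # mean of the k smallest
    end
    return scores
end

x = [0.0, 0.1, 0.2, 5.0]
D = [abs(a - b) for a in x, b in x]
knn_gamma_sketch(D, 2)   # the isolated point 5.0 gets the largest score
```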
MultivariateAnomalies.KNN_Gamma! — Function
KNN_Gamma!(KNN_Gamma_out, knn_dists_out)
Memory-efficient version of `KNN_Gamma`, to be used in a loop. Initialise `KNN_Gamma_out` with `init_KNN_Gamma()`.
MultivariateAnomalies.init_KNN_Gamma — Function
init_KNN_Gamma(T::Int)
init_KNN_Gamma(knn_dists_out)
Initialise a `KNN_Gamma_out` object for `KNN_Gamma!`, either with `T`, the number of observations/time steps, or with a `knn_dists_out` object.
MultivariateAnomalies.KNN_Delta — Function
KNN_Delta(knn_dists_out, data)
Compute Delta as the vector difference of the k nearest neighbours. Arguments are a `knn_dists()` object (`knn_dists_out`) and a `data` matrix (observations * variables).
Harmeling, S., Dornhege, G., Tax, D., Meinecke, F., & Müller, K.-R. (2006). From outliers to prototypes: Ordering data. Neurocomputing, 69(13-15), 1608–1618. http://doi.org/10.1016/j.neucom.2005.05.015
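Following Harmeling et al. (2006), the delta score is the length of the mean difference vector between a point and its k nearest neighbours; points on the boundary of the data cloud have one-sided neighbourhoods and thus large deltas. `knn_delta_sketch` is a hypothetical, self-contained helper, not the package's `KNN_Delta()`:

```julia
using LinearAlgebra, Statistics

# Illustrative KNN-Delta: score each observation by the norm of the difference
# between itself and the mean of its k nearest neighbours.
function knn_delta_sketch(data::AbstractMatrix, k::Int)
    T = size(data, 1)
    D = [norm(data[i, :] .- data[j, :]) for i in 1:T, j in 1:T]
    scores = zeros(T)
    for i in 1:T
        nn = sortperm(D[i, :])[2:k+1]   # k nearest neighbours, skipping self
        delta = vec(mean(data[nn, :], dims = 1)) .- data[i, :]
        scores[i] = norm(delta)
    end
    return scores
end

data = [0.0 0.0; 0.1 0.0; 0.0 0.1; 5.0 5.0]  # last row sits far from the cluster
knn_delta_sketch(data, 2)
```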
MultivariateAnomalies.KNN_Delta! — Function
KNN_Delta!(KNN_Delta_out, knn_dists_out, data)
Memory-efficient version of `KNN_Delta()`. `KNN_Delta_out[1]` is the vector difference of the k nearest neighbours.
MultivariateAnomalies.init_KNN_Delta — Function
init_KNN_Delta(T, VAR, k)
Return a `KNN_Delta_out` object to be used for `KNN_Delta!`. Input: time steps/observations `T`, variables `VAR`, number of nearest neighbours `k`.
Univariate Approach
MultivariateAnomalies.UNIV — Function
UNIV(data)
Order the values in each variable and return their maximum, i.e. if any of the variables in `data` (observations * variables) is above a given quantile, the highest quantile will be returned.
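The idea can be sketched with empirical quantile ranks. `univ_sketch` is a hypothetical helper; the package's `UNIV()` may differ in its exact ranking and tie-handling:

```julia
# Illustrative univariate approach: convert each variable to empirical quantiles
# (ranks scaled to [0, 1]) and score each observation by the largest quantile it
# reaches in any single variable.
function univ_sketch(data::AbstractMatrix)
    T, VAR = size(data)
    q = zeros(T, VAR)
    for v in 1:VAR
        order = sortperm(data[:, v])
        q[order, v] = (1:T) ./ T        # quantile rank of each observation in variable v
    end
    return vec(maximum(q, dims = 2))    # highest quantile over all variables
end

data = [1.0 10.0; 2.0 20.0; 3.0 30.0; 100.0 25.0]
univ_sketch(data)   # the last row is extreme in variable 1 and scores 1.0
```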
MultivariateAnomalies.UNIV! — Function
UNIV!(univ_out, data)
Memory-efficient version of `UNIV()`. Input a `univ_out` object from `init_UNIV()` and a `data` matrix (observations * variables).
MultivariateAnomalies.init_UNIV — Function
init_UNIV(T::Int, VAR::Int)
init_UNIV{tp}(data::AbstractArray{tp, 2})
Initialise a `univ_out` object to be used in `UNIV!()`, either with the number of time steps/observations `T` and variables `VAR`, or with a `data` matrix (observations * variables).
Support Vector Data Description
MultivariateAnomalies.SVDD_train — Function
SVDD_train(K, nu)
Train a one-class support vector machine model (i.e. support vector data description), given a kernel matrix `K` and the highest possible percentage of outliers `nu`. Returns the model object (`svdd_model`). Requires LIBSVM.
Tax, D. M. J., & Duin, R. P. W. (1999). Support vector domain description. Pattern Recognition Letters, 20, 1191–1199.
Schölkopf, B., Williamson, R. C., & Bartlett, P. L. (2000). New Support Vector Algorithms. Neural Computation, 12, 1207–1245.
MultivariateAnomalies.SVDD_predict — Function
SVDD_predict(svdd_model, K)
Predict the outlierness of an object given the testing kernel matrix `K` and the `svdd_model` from `SVDD_train()`. Requires LIBSVM.
Tax, D. M. J., & Duin, R. P. W. (1999). Support vector domain description. Pattern Recognition Letters, 20, 1191–1199.
Schölkopf, B., Williamson, R. C., & Bartlett, P. L. (2000). New Support Vector Algorithms. Neural Computation, 12, 1207–1245.
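Both SVDD and KNFST operate on a kernel matrix rather than raw data. A common choice is the Gaussian (RBF) kernel, sketched here in plain Julia; `rbf_kernel` is a hypothetical helper, shown only to make the required input concrete (the package provides `kernel_matrix()` for this purpose):

```julia
using LinearAlgebra

# Illustrative Gaussian (RBF) kernel matrix from a data matrix
# (observations * variables): K[i, j] = exp(-||x_i - x_j||² / (2σ²)).
function rbf_kernel(data::AbstractMatrix, sigma::Float64)
    T = size(data, 1)
    return [exp(-norm(data[i, :] .- data[j, :])^2 / (2 * sigma^2)) for i in 1:T, j in 1:T]
end

data = [0.0 0.0; 1.0 0.0; 0.0 1.0]
K = rbf_kernel(data, 1.0)   # 3 × 3, symmetric, with ones on the diagonal
```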
Kernel Null Foley–Sammon Transform
MultivariateAnomalies.KNFST_train — Function
KNFST_train(K)
Train a one-class novelty KNFST model on a kernel matrix `K`, according to Paul Bodesheim, Alexander Freytag, Erik Rodner, Michael Kemmler and Joachim Denzler: "Kernel Null Space Methods for Novelty Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
Output
`(proj, targetValue)`
`proj` – projection vector for data points (project x via kx * proj, where kx is a row vector containing the kernel values of x and the training data)
`targetValue` – value of all training samples in the null space
MultivariateAnomalies.KNFST_predict — Function
KNFST_predict(model, K)
Predict the outlierness of some data (represented by the kernel matrix `K`), given some KNFST `model` from `KNFST_train(K)`. Compute `K` with `kernel_matrix()`.
Paul Bodesheim and Alexander Freytag and Erik Rodner and Michael Kemmler and Joachim Denzler: "Kernel Null Space Methods for Novelty Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
MultivariateAnomalies.KNFST_predict! — Function
KNFST_predict!(KNFST_out, KNFST_mod, K)
Predict the outlierness of some data (represented by the testing kernel matrix `K`), given a `KNFST_out` object (from `init_KNFST()`) and some KNFST model (`KNFST_mod = KNFST_train(K)`).
Paul Bodesheim and Alexander Freytag and Erik Rodner and Michael Kemmler and Joachim Denzler: "Kernel Null Space Methods for Novelty Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
MultivariateAnomalies.init_KNFST — Function
init_KNFST(T, KNFST_mod)
Initialise a `KNFST_out` object for use with `KNFST_predict!`, given `T`, the number of observations, and the model output of `KNFST_train(K)`.
Distance to some Centers
MultivariateAnomalies.Dist2Centers — Function
Dist2Centers(centers::AbstractArray{tp, 2}) where {tp}
Compute the distance to the nearest center, e.g. of a K-means clustering output. Large distances to the nearest center indicate anomalies. `data`: observations * variables.
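The scoring rule can be sketched in plain Julia. `dist2centers_sketch` is a hypothetical helper, not the package's `Dist2Centers()`; it assumes `centers` are given, e.g. from a K-means run:

```julia
using LinearAlgebra

# Illustrative distance-to-nearest-center scoring: each observation is scored
# by its Euclidean distance to the closest of the given cluster centers.
function dist2centers_sketch(data::AbstractMatrix, centers::AbstractMatrix)
    return [minimum(norm(data[i, :] .- centers[c, :]) for c in 1:size(centers, 1))
            for i in 1:size(data, 1)]
end

data    = [0.0 0.0; 0.1 0.0; 10.0 10.0; 5.0 5.0]
centers = [0.0 0.0; 10.0 10.0]
dist2centers_sketch(data, centers)   # the point (5, 5) lies far from both centers
```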