High Level Anomaly Detection Algorithms

We provide high-level convenience functions for detecting anomalies, namely the pair

P = getParameters(algorithms, training_data) and detectAnomalies(testing_data, P)

getParameters() sets standard choices for the parameters P. detectAnomalies() then takes these parameters, together with the choice of algorithms, and detects the anomalies.

Currently supported algorithms include Kernel Density Estimation (algorithms = ["KDE"]), Recurrences ("REC"), k-Nearest Neighbors algorithms ("KNN_Gamma", "KNN_Delta"), Hotelling's $T^2$ ("T2"), Support Vector Data Description ("SVDD") and Kernel Null Foley-Sammon Transform ("KNFST"). With getParameters() it is also possible to compute output scores of multiple algorithms at once (algorithms = ["KDE", "T2"]), quantiles of the output anomaly scores (quantiles = true) and ensembles of the selected algorithms (e.g. ensemble_method = "mean").

Functions

MultivariateAnomalies.getParameters (Function)
getParameters(algorithms::Array{String,1} = ["REC", "KDE"], training_data::AbstractArray{tp, 2} = [NaN NaN])

Returns an object of type PARAMS, given the algorithms and some training_data as a matrix.

Arguments

  • algorithms: Subset of ["REC", "KDE", "KNN_Gamma", "KNN_Delta", "SVDD", "KNFST", "T2"]
  • training_data: data for training the algorithms / for getting the Parameters.
  • dist::String = "Euclidean"
  • sigma_quantile::Float64 = 0.5 (median): quantile of the distance matrix, used to compute the weighting parameter for the kernel matrix (algorithms = ["SVDD", "KNFST", "KDE"])
  • varepsilon_quantile = sigma_quantile by default: quantile of the distance matrix used to compute the radius of the hyperball in which the number of recurrences is counted (algorithms = ["REC"])
  • k_perc::Float64 = 0.05: fraction of the first dimension of training_data used to estimate the number of nearest neighbors (algorithms = ["KNN_Gamma", "KNN_Delta"])
  • nu::Float64 = 0.2: maximal fraction of outliers assumed for algorithms = ["SVDD"]
  • temp_excl::Int64 = 0: exclude temporally adjacent points from being counted as recurrences or as k-nearest neighbors (algorithms = ["REC", "KNN_Gamma", "KNN_Delta"])
  • ensemble_method = "None": compute an ensemble of the used algorithms. Possible choices (given in compute_ensemble()) are "mean", "median", "max" and "min".
  • quantiles = false: convert the output scores of the algorithms into quantiles.
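The quantile-based arguments above can be illustrated with a minimal sketch that uses only the Julia standard library. It is not the package's implementation: the helper pairwise_euclidean is hypothetical, and both the choice of using only off-diagonal distances and the rounding rule for k are assumptions made for this illustration.

```julia
using Statistics      # quantile
using LinearAlgebra   # norm

# Hypothetical helper (not part of MultivariateAnomalies): pairwise Euclidean
# distances between the rows (observations) of X.
function pairwise_euclidean(X::AbstractMatrix{<:Real})
    n = size(X, 1)
    D = zeros(Float64, n, n)
    for i in 1:n, j in (i + 1):n
        D[i, j] = D[j, i] = norm(view(X, i, :) .- view(X, j, :))
    end
    return D
end

# Four observations (rows) with two variables (columns).
training_data = [0.0 0.0; 3.0 4.0; 6.0 8.0; 0.0 1.0]

D = pairwise_euclidean(training_data)

# Assumption of this sketch: use each pair once and skip the zero
# self-distances on the diagonal.
offdiag = [D[i, j] for i in 1:size(D, 1) for j in (i + 1):size(D, 2)]

# sigma_quantile = 0.5: the median pairwise distance sets the kernel width.
sigma = quantile(offdiag, 0.5)

# varepsilon_quantile defaults to sigma_quantile: radius of the hyperball.
varepsilon = quantile(offdiag, 0.5)

# k_perc = 0.05: number of neighbors as a share of the number of observations,
# at least one (the rounding rule is an assumption).
k = max(1, ceil(Int, 0.05 * size(training_data, 1)))
```

With the default sigma_quantile = 0.5, half of all pairwise distances fall below the resulting value, so the parameter adapts to the scale of the training data instead of being fixed in absolute units.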

Examples

julia> using MultivariateAnomalies
julia> training_data = randn(100, 2); testing_data = randn(100, 2);
julia> P = getParameters(["REC", "KDE", "SVDD"], training_data, quantiles = false);
julia> detectAnomalies(testing_data, P)
MultivariateAnomalies.detectAnomalies (Function)
detectAnomalies(data::AbstractArray{tp, N}, P::PARAMS) where {tp, N}
detectAnomalies(data::AbstractArray{tp, N}, algorithms::Array{String,1} = ["REC", "KDE"]; mean = 0) where {tp, N}

Detects anomalies, given a parameter object P of type PARAMS. Train P beforehand with getParameters() on some training data; see getParameters(). Alternatively, detectAnomalies(data, algorithms) can be called without training P beforehand for all algorithms except "SVDD" and "KNFST". In this case P is initialized internally with default parameters.

Examples

julia> training_data = randn(100, 2); testing_data = randn(100, 2);
julia> # compute the anomaly scores of the algorithms "REC", "KDE", "T2" and "KNN_Gamma", their quantiles and return their ensemble scores
julia> P = getParameters(["REC", "KDE", "T2", "KNN_Gamma"], training_data, quantiles = true, ensemble_method = "mean");
julia> detectAnomalies(testing_data, P)
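What quantiles = true combined with ensemble_method = "mean" amounts to can be sketched in plain Julia. This is an illustration under stated assumptions, not the package's code: the helper to_quantiles is hypothetical, the scores are made up, and the empirical-quantile definition used here (share of scores less than or equal to a value) is only one possible choice.

```julia
using Statistics  # mean

# Hypothetical helper: map each score to its empirical quantile in (0, 1],
# i.e. the share of scores that are less than or equal to it.
to_quantiles(scores::AbstractVector{<:Real}) =
    [count(<=(s), scores) / length(scores) for s in scores]

# Made-up raw scores of two detectors on four observations. The scales differ,
# so the raw scores are not directly comparable.
scores_a = [0.1, 0.9, 0.4, 0.2]
scores_b = [10.0, 50.0, 20.0, 40.0]

# After the conversion, both detectors report on the same scale.
q_a = to_quantiles(scores_a)
q_b = to_quantiles(scores_b)

# Ensemble with the "mean" rule: average the quantile scores per observation.
ensemble = vec(mean(hcat(q_a, q_b); dims = 2))
```

Converting to quantiles first keeps a detector with a large raw score range from dominating the mean. The other ensemble rules named above ("median", "max", "min") would replace mean in the last step.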

Index