High Level Anomaly Detection Algorithms
We provide high-level convenience functions for detecting anomalies. The pair
P = getParameters(algorithms, training_data) and detectAnomalies(testing_data, P)
sets standard choices for the parameters P and passes the parameters, together with the choice of algorithms, on to the anomaly detection.
Currently supported algorithms include Kernel Density Estimation (algorithms = ["KDE"]), Recurrences ("REC"), k-Nearest Neighbors algorithms ("KNN_Gamma", "KNN_Delta"), Hotelling's $T^2$ ("T2"), Support Vector Data Description ("SVDD") and Kernel Null Foley-Sammon Transform ("KNFST"). With getParameters() it is also possible to compute the output scores of multiple algorithms at once (algorithms = ["KDE", "T2"]), quantiles of the output anomaly scores (quantiles = true) and ensembles of the selected algorithms (e.g. ensemble_method = "mean").
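To illustrate what quantiles = true and ensemble_method = "mean" amount to conceptually, the following minimal sketch converts two made-up score vectors into empirical quantiles and averages them. It uses only Julia's standard library. The helpers to_quantiles and mean_ensemble are hypothetical names introduced for this illustration; they are not part of MultivariateAnomalies, and the package's internal implementation may differ.

```julia
using Statistics

# Hypothetical helper (not part of MultivariateAnomalies):
# map each score to its empirical quantile, i.e. the fraction of scores <= it.
to_quantiles(scores::AbstractVector{<:Real}) =
    [count(<=(s), scores) / length(scores) for s in scores]

# Hypothetical helper: element-wise mean across several score vectors.
mean_ensemble(score_sets::AbstractVector{<:Real}...) =
    vec(mean(hcat(score_sets...), dims = 2))

# Made-up scores of two detectors on five observations (different scales).
scores_a = [0.1, 0.4, 0.2, 0.9, 0.3]
scores_b = [10.0, 12.0, 11.0, 30.0, 50.0]

q_a = to_quantiles(scores_a)   # [0.2, 0.8, 0.4, 1.0, 0.6]
q_b = to_quantiles(scores_b)   # [0.2, 0.6, 0.4, 0.8, 1.0]

# Quantiles put both detectors on a common [0, 1] scale before averaging.
ensemble = mean_ensemble(q_a, q_b)
```

Converting to quantiles first means that a detector with a large raw score range (here scores_b) does not dominate the ensemble.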
Functions
MultivariateAnomalies.getParameters — Function
getParameters(algorithms::Array{String,1} = ["REC", "KDE"], training_data::AbstractArray{tp, 2} = [NaN NaN])
Return an object of type PARAMS, given the algorithms and some training_data as a matrix.
Arguments
algorithms: subset of ["REC", "KDE", "KNN_Gamma", "KNN_Delta", "SVDD", "KNFST", "T2"]
training_data: data for training the algorithms / for getting the parameters.
dist::String = "Euclidean"
sigma_quantile::Float64 = 0.5 (median): quantile of the distance matrix, used to compute the weighting parameter for the kernel matrix (algorithms = ["SVDD", "KNFST", "KDE"])
varepsilon_quantile = sigma_quantile by default: quantile of the distance matrix used to compute the radius of the hyperball in which the number of recurrences is counted (algorithms = ["REC"])
k_perc::Float64 = 0.05: percentage of the first dimension of training_data used to estimate the number of nearest neighbors (algorithms = ["KNN_Gamma", "KNN_Delta"])
nu::Float64 = 0.2: maximal percentage of outliers, used for algorithms = ["SVDD"]
temp_excl::Int64 = 0: exclude temporally adjacent points from being counted as recurrences or k-nearest neighbors (algorithms = ["REC", "KNN_Gamma", "KNN_Delta"])
ensemble_method = "None": compute an ensemble of the used algorithms. Possible choices (given in compute_ensemble()) are "mean", "median", "max" and "min".
quantiles = false: convert the output scores of the algorithms into quantiles.
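As a rough illustration of what "quantile of the distance matrix" means for sigma_quantile (and, analogously, varepsilon_quantile), the sketch below builds the pairwise Euclidean distance matrix of a tiny made-up training set and takes its median. The helper pairwise_euclidean is a hypothetical name, not a package function. Treating rows as observations and using only distances between distinct points are assumptions made for this example; the package may handle both differently.

```julia
using Statistics, LinearAlgebra

# Hypothetical helper (not part of MultivariateAnomalies):
# Euclidean distances between all rows (assumed to be observations) of X.
function pairwise_euclidean(X::AbstractMatrix{<:Real})
    n = size(X, 1)
    D = zeros(Float64, n, n)
    for i in 1:n, j in 1:n
        D[i, j] = norm(X[i, :] .- X[j, :])
    end
    return D
end

# Tiny made-up training set: 4 observations, 2 variables.
training_data = [0.0 0.0; 3.0 4.0; 6.0 8.0; 0.0 1.0]
D = pairwise_euclidean(training_data)

# Assumption: use only distances between distinct points (upper triangle),
# so the zero diagonal does not pull the quantile down.
n = size(D, 1)
dists = [D[i, j] for i in 1:n for j in (i + 1):n]

# sigma_quantile = 0.5 corresponds to the median pairwise distance.
sigma = quantile(dists, 0.5)
```

A larger sigma_quantile would pick a larger distance and hence a wider kernel; a smaller one makes the kernel more local.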
Examples
julia> using MultivariateAnomalies
julia> training_data = randn(100, 2); testing_data = randn(100, 2);
julia> P = getParameters(["REC", "KDE", "SVDD"], training_data, quantiles = false);
julia> detectAnomalies(testing_data, P)
MultivariateAnomalies.detectAnomalies — Function
detectAnomalies(data::AbstractArray{tp, N}, P::PARAMS) where {tp, N}
detectAnomalies(data::AbstractArray{tp, N}, algorithms::Array{String,1} = ["REC", "KDE"]; mean = 0) where {tp, N}
Detect anomalies, given a parameter object P of type PARAMS. Train P with getParameters() beforehand on some training data; see getParameters(). Without training P beforehand, it is also possible to call detectAnomalies(data, algorithms) with a list of algorithms (except SVDD and KNFST). In this case, default parameters are used to initialize P internally.
Examples
julia> training_data = randn(100, 2); testing_data = randn(100, 2);
julia> # compute the anomaly scores of the algorithms "REC", "KDE", "T2" and "KNN_Gamma", their quantiles and return their ensemble scores
julia> P = getParameters(["REC", "KDE", "T2", "KNN_Gamma"], training_data, quantiles = true, ensemble_method = "mean");
julia> detectAnomalies(testing_data, P)
MultivariateAnomalies.detectAnomalies! — Function
detectAnomalies!{tp, N}(data::AbstractArray{tp, N}, P::PARAMS)
Mutating version of detectAnomalies(). Directly writes the output into P.
MultivariateAnomalies.init_detectAnomalies — Function
init_detectAnomalies{tp, N}(data::AbstractArray{tp, N}, P::PARAMS)
Initialize empty arrays in P for detecting the anomalies.