High Level Anomaly Detection Algorithms
We provide high-level convenience functions for detecting anomalies. The pair P = getParameters(algorithms, training_data) and detectAnomalies(testing_data, P) sets standard choices for the parameters P and then passes these parameters, together with the choice of algorithms, on to the anomaly detection.
Currently supported algorithms include Kernel Density Estimation (algorithms = ["KDE"]), Recurrences ("REC"), k-Nearest Neighbors algorithms ("KNN_Gamma", "KNN_Delta"), Hotelling's $T^2$ ("T2"), Support Vector Data Description ("SVDD") and Kernel Null Foley-Sammon Transform ("KNFST"). With getParameters() it is also possible to compute output scores of multiple algorithms at once (algorithms = ["KDE", "T2"]), quantiles of the output anomaly scores (quantiles = true) and ensembles of the selected algorithms (e.g. ensemble_method = "mean").
Functions
MultivariateAnomalies.getParameters (Function)
getParameters(algorithms::Array{String,1} = ["REC", "KDE"], training_data::AbstractArray{tp, 2} = [NaN NaN])
Returns an object of type PARAMS, given the algorithms and some training_data as a matrix.
Arguments
algorithms: subset of ["REC", "KDE", "KNN_Gamma", "KNN_Delta", "SVDD", "KNFST", "T2"]
training_data: data for training the algorithms, i.e. for deriving the parameters
dist::String = "Euclidean"
sigma_quantile::Float64 = 0.5 (median): quantile of the distance matrix, used to compute the weighting parameter for the kernel matrix (algorithms = ["SVDD", "KNFST", "KDE"])
varepsilon_quantile = sigma_quantile by default: quantile of the distance matrix, used to compute the radius of the hyperball in which the number of recurrences is counted (algorithms = ["REC"])
k_perc::Float64 = 0.05: percentage (given as a fraction) of the first dimension of training_data, used to estimate the number of nearest neighbors (algorithms = ["KNN_Gamma", "KNN_Delta"])
nu::Float64 = 0.2: maximal percentage (given as a fraction) of outliers (algorithms = ["SVDD"])
temp_excl::Int64 = 0: exclude temporally adjacent points from being counted as recurrences or k-nearest neighbors (algorithms = ["REC", "KNN_Gamma", "KNN_Delta"])
ensemble_method = "None": compute an ensemble of the used algorithms. Possible choices (given in compute_ensemble()) are "mean", "median", "max" and "min".
quantiles = false: convert the output scores of the algorithms into quantiles.
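To make the quantile-based defaults above more concrete, the following is a minimal sketch that uses only the Julia standard library. It is not the package's implementation: the helper pairwise_euclidean is hypothetical, and both the treatment of the zero diagonal of the distance matrix and the rounding of the neighbor count are assumptions made for illustration.

```julia
using Statistics  # standard library, provides quantile

# Hypothetical helper, not part of MultivariateAnomalies:
# pairwise Euclidean distances between the rows (observations) of X.
function pairwise_euclidean(X::AbstractMatrix)
    n = size(X, 1)
    D = zeros(n, n)
    for i in 1:n, j in 1:n
        D[i, j] = sqrt(sum(abs2, X[i, :] .- X[j, :]))
    end
    return D
end

# Four observations (rows) with two variables (columns).
training_data = [0.0 0.0; 3.0 4.0; 6.0 8.0; 0.0 1.0]
D = pairwise_euclidean(training_data)

# Assumed reading of sigma_quantile = 0.5: the median of all entries
# of the distance matrix (zero diagonal included here).
sigma = quantile(vec(D), 0.5)

# Assumed reading of k_perc = 0.05: 5% of the number of observations,
# rounded up so that at least one neighbor is used.
k = max(1, ceil(Int, 0.05 * size(training_data, 1)))
```

Deriving these values from quantiles of the data, rather than fixing them by hand, keeps the defaults on the scale of whatever training_data is passed in.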
Examples
julia> using MultivariateAnomalies
julia> training_data = randn(100, 2); testing_data = randn(100, 2);
julia> P = getParameters(["REC", "KDE", "SVDD"], training_data, quantiles = false);
julia> detectAnomalies(testing_data, P)
MultivariateAnomalies.detectAnomalies (Function)
detectAnomalies(data::AbstractArray{tp, N}, P::PARAMS) where {tp, N}
detectAnomalies(data::AbstractArray{tp, N}, algorithms::Array{String,1} = ["REC", "KDE"]; mean = 0) where {tp, N}
Detects anomalies, given a parameter object P of type PARAMS. Train P with getParameters() beforehand on some training data (see getParameters()). Without training P beforehand, it is also possible to call detectAnomalies(data, algorithms) with a choice of algorithms (except SVDD and KNFST). In this case, default parameters are used to initialize P internally.
Examples
julia> training_data = randn(100, 2); testing_data = randn(100, 2);
julia> # compute the anomaly scores of the algorithms "REC", "KDE", "T2" and "KNN_Gamma", their quantiles and return their ensemble scores
julia> P = getParameters(["REC", "KDE", "T2", "KNN_Gamma"], training_data, quantiles = true, ensemble_method = "mean");
julia> detectAnomalies(testing_data, P)
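The combination of quantiles = true and ensemble_method = "mean" can be pictured with a small standalone sketch. This is only an illustration under assumptions: the helper empirical_quantiles is hypothetical, the scores are made up, and the package may compute quantiles and ensembles differently in detail.

```julia
using Statistics  # standard library, provides mean

# Hypothetical helper, not part of MultivariateAnomalies:
# map each score to the fraction of scores that are less than or equal to it.
empirical_quantiles(scores::AbstractVector) =
    [count(<=(s), scores) / length(scores) for s in scores]

# Made-up raw scores of two detectors on very different scales.
scores_a = [0.1, 0.2, 0.9, 0.3]
scores_b = [10.0, 50.0, 400.0, 20.0]

# After the conversion both score vectors live on the same (0, 1] scale.
q_a = empirical_quantiles(scores_a)
q_b = empirical_quantiles(scores_b)

# "mean" ensemble: average the quantile scores per observation.
ensemble = vec(mean(hcat(q_a, q_b), dims = 2))
```

Converting to quantiles first is what makes averaging meaningful here: without it, the detector with the larger raw scale would dominate the ensemble.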
MultivariateAnomalies.detectAnomalies! (Function)
detectAnomalies!{tp, N}(data::AbstractArray{tp, N}, P::PARAMS)
Mutating version of detectAnomalies(). Writes the output directly into P.
MultivariateAnomalies.init_detectAnomalies (Function)
init_detectAnomalies{tp, N}(data::AbstractArray{tp, N}, P::PARAMS)
Initializes empty arrays in P for detecting the anomalies.
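The split into an initialization step and a mutating step follows a common Julia pattern: allocate the output arrays once, then fill them in place. The sketch below illustrates that pattern only. ScoreBuffer, init_buffer and fill_scores! are hypothetical names that stand in for the roles of PARAMS, init_detectAnomalies and detectAnomalies!, and the "score" is a placeholder rather than any of the package's algorithms.

```julia
# Hypothetical container, standing in for the role PARAMS plays above.
struct ScoreBuffer
    scores::Vector{Float64}
end

# Initialization step: allocate an output array with one entry per observation.
init_buffer(data::AbstractMatrix) = ScoreBuffer(zeros(size(data, 1)))

# Mutating step: the trailing `!` signals that `buf` is modified in place.
# Placeholder score: squared Euclidean distance of each row from the origin.
function fill_scores!(buf::ScoreBuffer, data::AbstractMatrix)
    for i in axes(data, 1)
        buf.scores[i] = sum(abs2, view(data, i, :))
    end
    return buf
end

data = [1.0 2.0; 3.0 4.0]
buf = init_buffer(data)   # allocate once
fill_scores!(buf, data)   # reuse the same arrays on every call
```

Reusing preallocated arrays avoids repeated allocation when the same detector is applied to many data chunks of identical size.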