Preprocessing

Feature Extraction Techniques

Extract the relevant information from your data and use it as input features for the anomaly detection algorithms.

Dimensionality Reduction

For dimensionality reduction, we would like to point to the package MultivariateStats.jl. Several techniques are implemented there, e.g. principal component analysis (PCA) and independent component analysis (ICA).
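
MultivariateStats.jl has its own fitting API. As a dependency-free illustration of the idea only, the sketch below computes principal-component scores with a singular value decomposition from the LinearAlgebra standard library. It assumes observations (time steps) along the first dimension, as elsewhere in this package; the name pca_scores is hypothetical.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper: project the rows of X (observations × variables)
# onto the first k principal components.
function pca_scores(X::AbstractMatrix{<:Real}, k::Integer)
    1 <= k <= min(size(X)...) || throw(ArgumentError("k out of range"))
    Xc = X .- mean(X, dims = 1)        # centre each variable (column)
    F = svd(Xc)                        # Xc = U * Diagonal(S) * V'
    return Xc * F.V[:, 1:k]            # scores: projection onto the leading right singular vectors
end

X = randn(200, 5)
Z = pca_scores(X, 2)                   # 200 × 2 matrix of reduced features
```

The score columns are uncorrelated and ordered by decreasing variance, so the first few can replace the original variables as input features.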

Seasonality

When dealing with time series, i.e. when the observations are time steps, it can be important to remove the mean seasonal cycle or to obtain robust (median-based) estimates of it. This is implemented by the following functions.

Functions

sMSC(datacube, cycle_length)

subtracts the median seasonal cycle from the datacube, given the length of the annual cycle, cycle_length.

Examples

julia> dc = hcat(rand(193) + 2* sin.(0:pi/24:8*pi), rand(193) + 2* sin.(0:pi/24:8*pi))
julia> sMSC_dc = sMSC(dc, 48)
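
As a rough sketch of what subtracting a median seasonal cycle involves (not the package's implementation): every time step is assigned a phase within the cycle, the median over all cycles is taken per phase and variable, and that median is subtracted. The sketch assumes time along the first dimension of a matrix, treats an incomplete final cycle like the others and does not handle NaN values; the name subtract_median_cycle is hypothetical.

```julia
using Statistics

# Hypothetical helper: remove the median seasonal cycle from each column of X,
# where rows are time steps and one cycle spans cycle_length rows.
function subtract_median_cycle(X::AbstractMatrix{<:Real}, cycle_length::Integer)
    T = size(X, 1)
    out = float.(X)                            # work on a floating-point copy
    for j in axes(X, 2), phase in 1:cycle_length
        idx = phase:cycle_length:T             # all time steps sharing this phase
        isempty(idx) && continue
        out[idx, j] .-= median(X[idx, j])      # subtract the per-phase median
    end
    return out
end

t = range(0, 8pi; length = 193)                # step pi/24, i.e. 48 steps per cycle
dc = hcat(rand(193) .+ 2 .* sin.(t), rand(193) .+ 2 .* sin.(t))
anomalies = subtract_median_cycle(dc, 48)
```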

get_MedianCycles(datacube, cycle_length::Int = 46)

returns the median annual cycle of a datacube, given the length of the annual cycle (default: cycle_length = 46). The datacube can be 2-, 3- or 4-dimensional; time is stored along the first dimension.

Examples

julia> using MultivariateAnomalies
julia> dc = hcat(rand(193) + 2* sin.(0:pi/24:8*pi), rand(193) + 2* sin.(0:pi/24:8*pi))
julia> cycles = get_MedianCycles(dc, 48)

get_MedianCycle(dat::Array{tp,1}, cycle_length::Int = 46)

returns the median annual cycle of a one-dimensional data array, given the length of the annual cycle (default: cycle_length = 46). Can handle some NaN values.

Examples

julia> using MultivariateAnomalies
julia> dat = randn(90) + sind.(0:8:719)
julia> cycles = get_MedianCycle(dat, 45)
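
One simple way to tolerate gaps, shown here as an assumption rather than the documented behaviour of get_MedianCycle, is to drop NaN values before taking each per-phase median. The name median_cycle_nan is hypothetical.

```julia
using Statistics

# Hypothetical helper: median cycle of a 1-D series, ignoring NaN values.
function median_cycle_nan(dat::AbstractVector{<:Real}, cycle_length::Integer)
    mc = fill(NaN, cycle_length)
    for phase in 1:cycle_length
        vals = filter(!isnan, dat[phase:cycle_length:end])   # drop missing observations
        isempty(vals) || (mc[phase] = median(vals))          # all-NaN phases stay NaN
    end
    return mc
end

dat = randn(90) + sind.(0:8:719)       # 45 steps per period of sind
dat[10] = NaN
cycle = median_cycle_nan(dat, 45)
```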

get_MedianCycle!(init_MC, dat::Array{tp,1})

Memory-efficient version of get_MedianCycle() that writes the median cycle to init_MC[3]. The init_MC object should be created with init_MedianCycle(). Can handle some NaN values.

Examples

julia> using MultivariateAnomalies
julia> dat = rand(193) + 2* sin.(0:pi/24:8*pi)
julia> dat[100] = NaN
julia> init_MC = init_MedianCycle(dat, 48)
julia> get_MedianCycle!(init_MC, dat)
julia> init_MC[3]
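
The purpose of an init/mutate pair like this is to allocate working memory once and reuse it when the same computation is repeated, e.g. for every pixel of a datacube. A minimal sketch of that pattern follows; median_cycle! and its buffer argument are hypothetical and do not mirror the layout of the init_MC object.

```julia
using Statistics

# Hypothetical in-place helper: writes the NaN-tolerant median cycle of dat into mc,
# using buf (length >= cld(length(dat), length(mc))) as scratch space.
function median_cycle!(mc::AbstractVector{Float64}, buf::Vector{Float64}, dat::AbstractVector{<:Real})
    L = length(mc)
    for phase in 1:L
        n = 0
        for t in phase:L:length(dat)           # collect the non-NaN values of this phase
            x = dat[t]
            isnan(x) && continue
            n += 1
            buf[n] = x
        end
        mc[phase] = n == 0 ? NaN : median!(view(buf, 1:n))   # median! may reorder buf
    end
    return mc
end

series = [randn(96) .+ sin.(2pi .* (0:95) ./ 48) for _ in 1:3]
mc  = Vector{Float64}(undef, 48)               # allocated once
buf = Vector{Float64}(undef, cld(96, 48))      # allocated once
for s in series
    median_cycle!(mc, buf, s)                  # reuses mc and buf; copy mc if every result is needed
end
```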

init_MedianCycle(dat::Array{tp}, cycle_length::Int = 46)
init_MedianCycle(temporal_length::Int[, cycle_length::Int = 46])

initialises an init_MC object to be used as input for get_MedianCycle!(). The input is either some sample data or the temporal length of the expected input vector, together with the length of the annual cycle (default: cycle_length = 46).


Exponentially Weighted Moving Average

One option to reduce the noise level in the data and detect more 'significant' anomalies is to compute an exponentially weighted moving average (EWMA).

Function

EWMA(dat, λ)

Compute the exponentially weighted moving average (EWMA) along the first dimension of dat, with the weighting parameter λ between 0 (full weighting) and 1 (no weighting). Supports N-dimensional arrays.

Lowry, C. A., & Woodall, W. H. (1992). A Multivariate Exponentially Weighted Moving Average Control Chart. Technometrics, 34, 46–53.
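
For reference, the standard EWMA recursion, as used in the Lowry & Woodall (1992) control chart, is given below for each series x_t along the first dimension of dat. Given the cited reference this is presumably what EWMA computes, but the starting value Z_0 used by the package is not stated here.

```latex
Z_t = \lambda\, x_t + (1 - \lambda)\, Z_{t-1}, \qquad 0 \le \lambda \le 1
```

With λ = 1 the output equals the input (no weighting); as λ decreases, past values receive more weight and the series is smoothed more strongly.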

Examples

julia> dc = rand(100,3,2)
julia> ewma_dc = EWMA(dc, 0.1)

EWMA!(Z, dat, λ)

mutating version of EWMA() that writes into a preallocated output Z. Use Z = similar(dat), or pass dat itself as Z to overwrite the input in place.

Examples

julia> dc = rand(100,3,2)
julia> EWMA!(dc, dc, 0.1)
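
A minimal sketch of the recursion Z_t = λ x_t + (1 - λ) Z_{t-1} along the first dimension of an N-dimensional array follows. It assumes Z_1 = x_1 as the starting value, which may differ from the package's choice; ewma_sketch! and ewma_sketch are hypothetical names. Because each Z_t only needs x_t and Z_{t-1}, writing the output over the input is safe, which is what makes an in-place call such as EWMA!(dc, dc, 0.1) possible.

```julia
# Hypothetical in-place EWMA along dimension 1; Z may be the same array as dat.
function ewma_sketch!(Z::AbstractArray, dat::AbstractArray, λ::Real)
    0 <= λ <= 1 || throw(ArgumentError("λ must lie in [0, 1]"))
    size(Z) == size(dat) || throw(DimensionMismatch("Z and dat must have the same size"))
    T = size(dat, 1)
    Zm = reshape(Z, T, :)                  # trailing dimensions become independent series
    Dm = reshape(dat, T, :)                # reshape shares memory with the original array
    for j in axes(Dm, 2)
        Zm[1, j] = Dm[1, j]                # assumed initialisation Z_1 = x_1
        for t in 2:T
            # x_t is read before Z_t is written, so aliasing Z === dat is safe
            Zm[t, j] = λ * Dm[t, j] + (1 - λ) * Zm[t - 1, j]
        end
    end
    return Z
end

ewma_sketch(dat::AbstractArray, λ::Real) = ewma_sketch!(similar(dat, float(eltype(dat))), dat, λ)

dc = rand(100, 3, 2)
smoothed = ewma_sketch(dc, 0.1)            # new array
ewma_sketch!(dc, dc, 0.1)                  # overwrites dc in place
```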

Time Delay Embedding

Extend the feature space (variables) with lagged observations.

Function

TDE(datacube::Array{tp, 4}, ΔT::Integer, DIM::Int = 3) where {tp}
TDE(datacube::Array{tp, 3}, ΔT::Integer, DIM::Int = 3) where {tp}

returns an embedded datacube by concatenating lagged versions of the 2-, 3- or 4-dimensional datacube, using a lag of ΔT time steps into the past, up to the embedding dimension DIM (default: DIM = 3).

Examples

julia> dc = randn(50,3)
julia> TDE(dc, 3, 2)
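
The embedding can be sketched for a (time × variables) matrix as below. The sketch assumes that copies lagged by 0, ΔT, ..., (DIM - 1)·ΔT steps are placed side by side and that the first (DIM - 1)·ΔT time steps, which lack a complete history, are dropped; the column order and the treatment of the start of the series in TDE itself may differ. tde_sketch is a hypothetical name.

```julia
# Hypothetical helper: time-delay embedding of a (time × variables) matrix.
# Column block k+1 holds the data lagged by k*ΔT steps, k = 0, ..., DIM-1.
function tde_sketch(X::AbstractMatrix, ΔT::Integer, DIM::Integer)
    ΔT >= 1 && DIM >= 1 || throw(ArgumentError("ΔT and DIM must be positive"))
    T, p = size(X)
    start = (DIM - 1) * ΔT + 1            # first time step with a complete history
    n = T - start + 1
    n > 0 || throw(ArgumentError("series too short for this embedding"))
    out = similar(X, n, p * DIM)
    for k in 0:DIM-1
        # rows start..T, shifted back by k*ΔT steps
        out[:, k*p+1:(k+1)*p] = X[start-k*ΔT:T-k*ΔT, :]
    end
    return out
end

E = tde_sketch(randn(50, 3), 3, 2)         # 47 × 6 matrix
```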

Moving Window Features

include the variance (mw_VAR), correlation (mw_COR) or average (mw_AVG) computed in a moving window along the first dimension of the data as additional features.

Functions

mw_VAR(datacube::Array{tp,N}, windowsize::Int = 10) where {tp,N}

compute the variance in a moving window along the first dimension of the datacube (default: windowsize = 10). Accepts N-dimensional datacubes.

Examples

julia> dc = randn(50,3,3,3)
julia> mw_VAR(dc, 15)
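
A straightforward, not memory-optimised sketch of a moving-window variance for a (time × variables) matrix follows. It assumes a trailing window covering the current and the windowsize - 1 preceding time steps, which shrinks at the start of the series, and uses the sample variance; the window alignment and normalisation of mw_VAR may differ. mw_var_sketch is a hypothetical name.

```julia
using Statistics

# Hypothetical helper: trailing moving-window sample variance along dimension 1.
function mw_var_sketch(X::AbstractMatrix{<:Real}, windowsize::Integer = 10)
    windowsize >= 2 || throw(ArgumentError("windowsize must be at least 2"))
    T = size(X, 1)
    out = fill(NaN, size(X))                   # row 1 stays NaN: a single value has no sample variance
    for j in axes(X, 2), t in 2:T
        lo = max(1, t - windowsize + 1)        # window shrinks near the start
        out[t, j] = var(view(X, lo:t, j))
    end
    return out
end

dc = randn(50, 3)
v = mw_var_sketch(dc, 15)
```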

mw_VAR!(out::Array{tp, N}, datacube0mean::Array{tp,N}, windowsize::Int = 10) where {tp,N}

mutating version of mw_VAR(). The input datacube0mean must have zero mean. Initialize out as a separate array: passing out = datacube0mean leads to wrong results.

mw_COR(datacube::Array{tp, 4}, windowsize::Int = 10) where {tp}

compute the correlation in a moving window along the first dimension of the datacube (default: windowsize = 10). Accepts 4-dimensional datacubes.
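
A moving-window correlation can be sketched in the same spirit with cor from the Statistics standard library. This sketch assumes a full trailing window over the rows of a matrix and returns one correlation series per pair of columns; it does not reproduce the 4-dimensional layout or output ordering of mw_COR. mw_cor_sketch is a hypothetical name.

```julia
using Statistics

# Hypothetical helper: moving-window correlation for every pair of columns.
# Returns a (T - windowsize + 1) × npairs matrix; column k belongs to pairs[k].
function mw_cor_sketch(X::AbstractMatrix{<:Real}, windowsize::Integer = 10)
    T, p = size(X)
    2 <= windowsize <= T || throw(ArgumentError("need 2 <= windowsize <= size(X, 1)"))
    pairs = [(a, b) for a in 1:p for b in a+1:p]
    out = Matrix{Float64}(undef, T - windowsize + 1, length(pairs))
    for (k, (a, b)) in enumerate(pairs), s in 1:T - windowsize + 1
        w = s:s + windowsize - 1               # rows in the current window
        out[s, k] = cor(view(X, w, a), view(X, w, b))
    end
    return out, pairs
end

X = randn(50, 3)
r, pairs = mw_cor_sketch(X, 15)                # pairs == [(1, 2), (1, 3), (2, 3)]
```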

mw_AVG(datacube::AbstractArray{tp,N}, windowsize::Int = 10) where {tp,N}

compute the average in a moving window along the first dimension of the datacube (default: windowsize = 10). Accepts N-dimensional datacubes.

Examples

julia> dc = randn(50,3,3,3)
julia> mw_AVG(dc, 15)
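
A moving average can be computed in O(T) per series with a running sum instead of re-summing every window. The sketch below assumes a trailing window that shrinks at the start of the series; the alignment used by mw_AVG may differ, and mw_avg_sketch is a hypothetical name. Over very long series a running sum can accumulate rounding error.

```julia
# Hypothetical helper: trailing moving-window mean along dimension 1 via a running sum.
function mw_avg_sketch(X::AbstractMatrix{<:Real}, windowsize::Integer = 10)
    windowsize >= 1 || throw(ArgumentError("windowsize must be positive"))
    T = size(X, 1)
    out = similar(X, Float64)
    for j in axes(X, 2)
        s = 0.0
        for t in 1:T
            s += X[t, j]                                    # value entering the window
            t > windowsize && (s -= X[t - windowsize, j])   # value leaving the window
            out[t, j] = s / min(t, windowsize)
        end
    end
    return out
end

dc = randn(50, 3)
avg = mw_avg_sketch(dc, 15)
```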

mw_AVG!(out::Array{tp, N}, datacube::Array{tp,N}, windowsize::Int = 10) where {tp,N}

internal, mutating version of mw_AVG().
