Feature Extraction Techniques

Extract the relevant information from your data and use it as input features for the anomaly detection algorithms.

Dimensionality Reduction

For dimensionality reduction, we would like to point to the package MultivariateStats.jl. Several techniques are implemented there, e.g.

  • Principal Component Analysis (PCA)
  • Independent Component Analysis (ICA)
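To illustrate what PCA does to a feature matrix, here is a minimal sketch that uses only Julia's standard libraries. It is not the MultivariateStats.jl API. The data, the variable names, and the choice of k = 2 components are assumptions made for the example.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(1)
X = randn(200, 5)                      # 200 observations (rows), 5 variables (columns)
Xc = X .- mean(X, dims = 1)            # center each variable
F = svd(Xc)                            # Xc = U * Diagonal(S) * V'
k = 2                                  # number of components to keep (hypothetical choice)
scores = Xc * F.V[:, 1:k]              # projection onto the first k principal axes
explained = F.S .^ 2 ./ sum(F.S .^ 2)  # fraction of variance per component
```

The columns of scores could then serve as lower-dimensional input features in place of the original variables.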

Seasonality

When dealing with time series, i.e. when the observations are time steps, it can be important to remove the mean seasonal cycle or to obtain a robust estimate of it. This is implemented by

  • subtracting the median seasonal cycle (sMSC) and
  • getting the median seasonal cycle (get_MedianCycles)
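The idea behind both steps can be sketched in plain Julia for a single time series. This is only an illustration under simple assumptions (one-dimensional input, no NaN handling). The helper names median_cycle_sketch and subtract_median_cycle_sketch are hypothetical and are not part of the package.

```julia
using Statistics

# Median over all values that share the same position within the cycle.
function median_cycle_sketch(x::AbstractVector, cycle_length::Integer)
    return [median(x[i:cycle_length:end]) for i in 1:cycle_length]
end

# Subtract the repeated median cycle from the series.
function subtract_median_cycle_sketch(x::AbstractVector, cycle_length::Integer)
    mc = median_cycle_sketch(x, cycle_length)
    return [x[t] - mc[mod1(t, cycle_length)] for t in 1:length(x)]
end

t = 0:191
x = 2 .* sin.(2pi .* t ./ 48) .+ 0.1 .* randn(192)  # 4 cycles of 48 steps plus noise
anomalies = subtract_median_cycle_sketch(x, 48)     # seasonal signal removed, noise remains
```

Using the median rather than the mean keeps single extreme values from distorting the estimated cycle.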

Functions

MultivariateAnomalies.sMSC - Function
sMSC(datacube, cycle_length)

subtract the median seasonal cycle from the datacube, given the length of the annual cycle cycle_length.

Examples

julia> using MultivariateAnomalies
julia> dc = hcat(rand(193) + 2* sin.(0:pi/24:8*pi), rand(193) + 2* sin.(0:pi/24:8*pi))
julia> sMSC_dc = sMSC(dc, 48)

MultivariateAnomalies.get_MedianCycles - Function
get_MedianCycles(datacube, cycle_length::Int = 46)

returns the median annual cycle of a datacube, given the length of the annual cycle (presetting: cycle_length = 46). The datacube can be 2-, 3- or 4-dimensional; time is stored along the first dimension.

Examples

julia> using MultivariateAnomalies
julia> dc = hcat(rand(193) + 2* sin.(0:pi/24:8*pi), rand(193) + 2* sin.(0:pi/24:8*pi))
julia> cycles = get_MedianCycles(dc, 48)

MultivariateAnomalies.get_MedianCycle - Function
get_MedianCycle(dat::Array{tp,1}, cycle_length::Int = 46)

returns the median annual cycle of a one dimensional data array, given the length of the annual cycle (presetting: cycle_length = 46). Can deal with some NaN values.

Examples

julia> using MultivariateAnomalies
julia> dat = randn(90) + sind.(0:8:719)
julia> cycles = get_MedianCycle(dat, 45)

MultivariateAnomalies.get_MedianCycle! - Function
get_MedianCycle!(init_MC, dat::Array{tp,1})

Memory efficient version of get_MedianCycle(), returning the median cycle in init_MC[3]. The init_MC object should be created with init_MedianCycle. Can deal with some NaN values.

Examples

julia> using MultivariateAnomalies
julia> dat = rand(193) + 2* sin.(0:pi/24:8*pi)
julia> dat[100] = NaN
julia> init_MC = init_MedianCycle(dat, 48)
julia> get_MedianCycle!(init_MC, dat)
julia> init_MC[3]

MultivariateAnomalies.init_MedianCycle - Function
init_MedianCycle(dat::Array{tp}, cycle_length::Int = 46)
init_MedianCycle(temporal_length::Int[, cycle_length::Int = 46])

initialises an init_MC object to be used as input for get_MedianCycle!(). Input is either some sample data or the temporal length of the expected input vector, together with the length of the annual cycle (presetting: cycle_length = 46).


Exponential Weighted Moving Average

One option to reduce the noise level in the data and detect more 'significant' anomalies is to compute an exponentially weighted moving average (EWMA).

Function

MultivariateAnomalies.EWMA - Function
EWMA(dat,  λ)

Compute the exponential weighted moving average (EWMA) with the weighting parameter λ between 0 (full weighting) and 1 (no weighting) along the first dimension of dat. Supports N-dimensional Arrays.

Lowry, C. A., Woodall, W. H., Champ, C. W., & Rigdon, S. E. (1992). A Multivariate Exponentially Weighted Moving Average Control Chart. Technometrics, 34, 46–53.

Examples

julia> dc = rand(100,3,2)
julia> ewma_dc = EWMA(dc, 0.1)
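The recursion behind an EWMA can be sketched for a single vector as follows. This is a minimal illustration assuming the common form z[t] = λ*x[t] + (1 - λ)*z[t-1] with a starting value z[0] = 0. The helper ewma_sketch is hypothetical and is not the package's EWMA, whose initialisation may differ.

```julia
# Sketch of the EWMA recursion z[t] = λ*x[t] + (1 - λ)*z[t-1] for one vector.
function ewma_sketch(x::AbstractVector{<:Real}, λ::Real)
    z = similar(x, Float64)
    z[1] = λ * x[1]                      # assumes a starting value z[0] = 0
    for t in 2:length(x)
        z[t] = λ * x[t] + (1 - λ) * z[t-1]
    end
    return z
end

noisy = sin.(range(0, 4pi, length = 200)) .+ 0.3 .* randn(200)
smooth = ewma_sketch(noisy, 0.1)         # small λ: strong smoothing
same = ewma_sketch(noisy, 1.0)           # λ = 1: the input is returned unchanged
```

In this form, small values of λ give more weight to the past and smooth strongly, while λ = 1 applies no smoothing at all.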

MultivariateAnomalies.EWMA! - Function
EWMA!(Z, dat,  λ)

use a preallocated output Z. Either pass Z = similar(dat), or pass dat itself as Z to overwrite it in place.

Examples

julia> dc = rand(100,3,2)
julia> EWMA!(dc, dc, 0.1)

Time Delay Embedding

Increase the feature space (variables) with lagged observations.

Function

MultivariateAnomalies.TDE - Function
TDE(datacube::Array{tp, 4}, ΔT::Integer, DIM::Int = 3) where {tp}
TDE(datacube::Array{tp, 3}, ΔT::Integer, DIM::Int = 3) where {tp}

returns an embedded datacube by concatenating lagged versions of the 2-, 3- or 4-dimensional datacube with ΔT time steps in the past up to dimension DIM (presetting: DIM = 3)

Examples

julia> dc = randn(50,3)
julia> TDE(dc, 3, 2)
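How lagged copies enlarge the feature space can be sketched for a 2-dimensional array (time × variables). This is an illustration only: tde_sketch is a hypothetical helper, and the column order and the handling of the first time steps are assumptions that may differ from the package's TDE.

```julia
# Concatenate DIM copies of X, each shifted ΔT further into the past.
function tde_sketch(X::AbstractMatrix, ΔT::Integer, DIM::Integer)
    T, V = size(X)
    start = (DIM - 1) * ΔT + 1            # first time step with a full lag history
    out = similar(X, T - start + 1, V * DIM)
    for d in 1:DIM
        lag = (d - 1) * ΔT
        out[:, (d-1)*V+1:d*V] = X[start-lag:T-lag, :]
    end
    return out
end

X = reshape(collect(1.0:20.0), 10, 2)     # 10 time steps, 2 variables
E = tde_sketch(X, 3, 2)                   # 7 × 4: each row holds x(t) and x(t - 3)
```

Each embedded observation carries its own recent history, so detection algorithms can pick up changes in temporal behaviour. The first (DIM - 1) * ΔT time steps are dropped because they lack a complete history.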

Moving Window Features

Include the variance (mw_VAR) and correlations (mw_COR) in a moving window along the first dimension of the data.

Functions

MultivariateAnomalies.mw_VAR - Function
mw_VAR(datacube::Array{tp,N}, windowsize::Int = 10) where {tp,N}

compute the variance in a moving window along the first dimension of the datacube (presetting: windowsize = 10). Accepts N dimensional datacubes.

Examples

julia> dc = randn(50,3,3,3)
julia> mw_VAR(dc, 15)
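For a single time series, a moving window variance can be sketched as below. The centred window that is truncated at the edges is an assumption of this sketch, and mw_var_sketch is a hypothetical helper. The package's mw_VAR may align its window and treat the boundaries differently.

```julia
using Statistics

# Variance in a centred window around each time step (window truncated at the edges).
function mw_var_sketch(x::AbstractVector, windowsize::Integer)
    n = length(x)
    half = windowsize ÷ 2
    return [var(x[max(1, t - half):min(n, t + half)]) for t in 1:n]
end

# Noise level increases in the second half of the series.
x = vcat(0.1 .* randn(100), 2.0 .* randn(100))
v = mw_var_sketch(x, 15)
```

The resulting series has the same length as the input and rises where the local variability increases, so it can be used as an additional feature.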

MultivariateAnomalies.mw_VAR! - Function
mw_VAR!(out::Array{tp, N}, datacube0mean::Array{tp,N}, windowsize::Int = 10) where {tp,N}

mutating version of mw_VAR(). The input data datacube0mean must have a mean of 0. out must be a separate, preallocated array: out = datacube0mean leads to wrong results.


MultivariateAnomalies.mw_COR - Function
mw_COR(datacube::Array{tp, 4}, windowsize::Int = 10) where {tp}

compute the correlation in a moving window along the first dimension of the datacube (presetting: windowsize = 10). Accepts 4-dimensional datacubes.
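The idea of a moving window correlation can be sketched for two time series. The sketch assumes a centred window truncated at the edges and uses the Pearson correlation from Julia's Statistics standard library. mw_cor_sketch is a hypothetical helper, and the package's mw_COR (which works on 4-dimensional datacubes) may arrange its windows and output differently.

```julia
using Statistics

# Pearson correlation of x and y in a centred window around each time step.
function mw_cor_sketch(x::AbstractVector, y::AbstractVector, windowsize::Integer)
    n = length(x)
    half = windowsize ÷ 2
    return [cor(x[max(1, t - half):min(n, t + half)],
                y[max(1, t - half):min(n, t + half)]) for t in 1:n]
end

a = randn(100)
b = vcat(a[1:50] .+ 0.1 .* randn(50), randn(50))  # coupled to a only in the first half
c = mw_cor_sketch(a, b, 20)
```

A drop in the windowed correlation marks the time steps where the relationship between the two variables breaks down, which a single global correlation would not reveal.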


MultivariateAnomalies.mw_AVG - Function
mw_AVG(datacube::AbstractArray{tp,N}, windowsize::Int = 10) where {tp,N}

compute the average in a moving window along the first dimension of the datacube (presetting: windowsize = 10). Accepts N dimensional datacubes.

Examples

julia> dc = randn(50,3,3,3)
julia> mw_AVG(dc, 15)

MultivariateAnomalies.mw_AVG! - Function
mw_AVG!(out::Array{tp, N}, datacube::Array{tp,N}, windowsize::Int = 10) where {tp,N}

internal and mutating version for mw_AVG().

