Feature Extraction Techniques
Extract the relevant information from your data and use it as input features for the anomaly detection algorithms.
Dimensionality Reduction
For dimensionality reduction, we would like to point to the package MultivariateStats.jl. Several techniques are implemented there, e.g.
- Principal Component Analysis (PCA)
- Independent Component Analysis (ICA)
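To illustrate what such a reduction does, the following is a minimal sketch of PCA computed through a singular value decomposition, using only Julia's standard libraries. It is not the MultivariateStats.jl API. The helper name `pca_scores` and the data layout (rows are observations, columns are variables) are assumptions made for this example.

```julia
using LinearAlgebra, Statistics

# PCA via SVD. Assumed layout: rows are observations, columns are variables.
# `pca_scores` is a hypothetical helper, not part of any package.
function pca_scores(X::AbstractMatrix{<:Real}, k::Integer)
    Xc = X .- mean(X, dims=1)                     # center each variable
    F = svd(Xc)                                   # Xc = U * Diagonal(S) * Vt
    scores = Xc * F.V[:, 1:k]                     # project onto the first k principal axes
    explained = F.S[1:k] .^ 2 ./ sum(F.S .^ 2)    # fraction of variance per component
    return scores, explained
end

X = randn(200, 5)
X[:, 2] .= X[:, 1] .+ 0.1 .* randn(200)           # make two variables strongly correlated
scores, explained = pca_scores(X, 2)
```

The columns of `scores` could then serve as a reduced feature set for an anomaly detection algorithm.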
Seasonality
When dealing with time series, i.e. when the observations are time steps, it can be important to remove the mean seasonal cycle or to obtain a robust estimate of it. This is implemented by
- subtracting the median seasonal cycle (sMSC) and
- getting the median seasonal cycle (get_MedianCycles)
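The idea behind both steps can be sketched for a single time series with the standard library alone. This is an illustration under stated assumptions, not the package implementation: the helper names `median_cycle` and `subtract_median_cycle` are hypothetical, the series is assumed to be at least one cycle long, and NaN values are not handled.

```julia
using Statistics

# Median seasonal cycle of a vector `x` with `cycle_length` time steps per cycle.
# Hypothetical helper, not the package implementation.
function median_cycle(x::AbstractVector{<:Real}, cycle_length::Integer)
    # position p collects x[p], x[p + cycle_length], x[p + 2 * cycle_length], ...
    return [median(x[p:cycle_length:end]) for p in 1:cycle_length]
end

# Subtract the median seasonal cycle from every time step.
function subtract_median_cycle(x::AbstractVector{<:Real}, cycle_length::Integer)
    mc = median_cycle(x, cycle_length)
    # mod1 maps the time index t to its position within the cycle (1..cycle_length)
    return [x[t] - mc[mod1(t, cycle_length)] for t in eachindex(x)]
end

t = 0:pi/24:8*pi                     # 193 time steps, 48 steps per cycle
x = rand(193) .+ 2 .* sin.(t)
anomalies = subtract_median_cycle(x, 48)
```

Using the median rather than the mean keeps the estimated cycle robust against a few extreme values at any position of the cycle.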
Functions
MultivariateAnomalies.sMSC
— Function
sMSC(datacube, cycle_length)
Subtract the median seasonal cycle from the datacube, given the length of the annual cycle cycle_length.
Examples
julia> dc = hcat(rand(193) + 2* sin.(0:pi/24:8*pi), rand(193) + 2* sin.(0:pi/24:8*pi))
julia> sMSC_dc = sMSC(dc, 48)
MultivariateAnomalies.get_MedianCycles
— Function
get_MedianCycles(datacube, cycle_length::Int = 46)
Returns the median annual cycle of a datacube, given the length of the annual cycle (presetting: cycle_length = 46). The datacube can be 2-, 3- or 4-dimensional; time is stored along the first dimension.
Examples
julia> using MultivariateAnomalies
julia> dc = hcat(rand(193) + 2* sin.(0:pi/24:8*pi), rand(193) + 2* sin.(0:pi/24:8*pi))
julia> cycles = get_MedianCycles(dc, 48)
MultivariateAnomalies.get_MedianCycle
— Function
get_MedianCycle(dat::Array{tp,1}, cycle_length::Int = 46)
Returns the median annual cycle of a one-dimensional data array, given the length of the annual cycle (presetting: cycle_length = 46). Can deal with some NaN values.
Examples
julia> using MultivariateAnomalies
julia> dat = randn(90) + sind.(0:8:719)
julia> cycles = get_MedianCycle(dat, 48)
MultivariateAnomalies.get_MedianCycle!
— Function
get_MedianCycle!(init_MC, dat::Array{tp,1})
Memory-efficient version of get_MedianCycle(), returning the median cycle in init_MC[3]. The init_MC object should be created with init_MedianCycle. Can deal with some NaN values.
Examples
julia> using MultivariateAnomalies
julia> dat = rand(193) + 2* sin.(0:pi/24:8*pi)
julia> dat[100] = NaN
julia> init_MC = init_MedianCycle(dat, 48)
julia> get_MedianCycle!(init_MC, dat)
julia> init_MC[3]
MultivariateAnomalies.init_MedianCycle
— Function
init_MedianCycle(dat::Array{tp}, cycle_length::Int = 46)
init_MedianCycle(temporal_length::Int[, cycle_length::Int = 46])
Initialises an init_MC object to be used as input for get_MedianCycle!(). Input is either some sample data or the temporal length of the expected input vector, together with the length of the annual cycle (presetting: cycle_length = 46).
Exponential Weighted Moving Average
One option to reduce the noise level in the data and detect more 'significant' anomalies is to compute an exponential weighted moving average (EWMA).
Function
MultivariateAnomalies.EWMA
— Function
EWMA(dat, λ)
Compute the exponential weighted moving average (EWMA) with the weighting parameter λ between 0 (full weighting) and 1 (no weighting) along the first dimension of dat. Supports N-dimensional Arrays.
Lowry, C. A., & Woodall, W. H. (1992). A Multivariate Exponentially Weighted Moving Average Control Chart. Technometrics, 34, 46–53.
Examples
julia> dc = rand(100,3,2)
julia> ewma_dc = EWMA(dc, 0.1)
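The role of λ can be illustrated with the standard EWMA recursion for a single vector. This is a minimal sketch, not the package code: the helper name `ewma_sketch` is hypothetical, and starting the recursion at the first observation is an assumption that the package may handle differently.

```julia
# EWMA along a vector: z[t] = λ * x[t] + (1 - λ) * z[t-1].
# Assumption: the recursion starts at z[1] = x[1]; the package may initialise differently.
function ewma_sketch(x::AbstractVector{<:Real}, λ::Real)
    0 <= λ <= 1 || throw(ArgumentError("λ must lie in [0, 1]"))
    z = similar(x, Float64)
    z[1] = x[1]
    for t in 2:length(x)
        z[t] = λ * x[t] + (1 - λ) * z[t-1]
    end
    return z
end

x = rand(100)
smooth = ewma_sketch(x, 0.1)   # small λ: strong smoothing
same = ewma_sketch(x, 1.0)     # λ = 1: the input is returned unchanged
```

With this recursion a small λ gives past values a large weight, whereas λ = 1 applies no smoothing at all, which matches the description of the parameter range above.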
MultivariateAnomalies.EWMA!
— Function
EWMA!(Z, dat, λ)
Use a preallocated output Z. Pass Z = similar(dat) for a separate output, or Z = dat to overwrite dat itself.
Examples
julia> dc = rand(100,3,2)
julia> EWMA!(dc, dc, 0.1)
Time Delay Embedding
Increase the feature space (variables) with lagged observations.
Function
MultivariateAnomalies.TDE
— Function
TDE(datacube::Array{tp, 4}, ΔT::Integer, DIM::Int = 3) where {tp}
TDE(datacube::Array{tp, 3}, ΔT::Integer, DIM::Int = 3) where {tp}
Returns an embedded datacube by concatenating lagged versions of the 2-, 3- or 4-dimensional datacube with ΔT time steps in the past up to dimension DIM (presetting: DIM = 3).
Examples
julia> dc = randn(50,3)
julia> TDE(dc, 3, 2)
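For a two-dimensional input (time × variables), the embedding can be sketched as follows. This is an illustration only: the helper name `tde_sketch`, the ordering of the lagged blocks, and the trimming of the first time steps are assumptions and may differ from what TDE returns.

```julia
# Embed a (time × variables) matrix with DIM - 1 lagged copies, each ΔT steps apart.
# Assumed column layout: [x(t), x(t - ΔT), ..., x(t - (DIM - 1) * ΔT)].
# `tde_sketch` is a hypothetical helper; the package may order or trim differently.
function tde_sketch(X::AbstractMatrix, ΔT::Integer, DIM::Integer)
    T, V = size(X)
    start = ΔT * (DIM - 1)                          # time steps lost at the beginning
    T > start || throw(ArgumentError("time series too short for this embedding"))
    out = similar(X, T - start, V * DIM)
    for d in 0:DIM-1
        rows = (start + 1 - d * ΔT):(T - d * ΔT)    # block lagged by d * ΔT steps
        out[:, d*V+1:(d+1)*V] = X[rows, :]
    end
    return out
end

dc = randn(50, 3)
emb = tde_sketch(dc, 3, 2)   # current values plus the values 3 steps earlier
```

In this sketch each row holds the current observation followed by its lagged copies, so the number of variables grows by the factor DIM while the first ΔT * (DIM - 1) time steps are dropped.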
Moving Window Features
Moving window features include the variance (mw_VAR), the correlations (mw_COR) and the average (mw_AVG) in a moving window along the first dimension of the data.
Functions
MultivariateAnomalies.mw_VAR
— Function
mw_VAR(datacube::Array{tp,N}, windowsize::Int = 10) where {tp,N}
Compute the variance in a moving window along the first dimension of the datacube (presetting: windowsize = 10). Accepts N-dimensional datacubes.
Examples
julia> dc = randn(50,3,3,3)
julia> mw_VAR(dc, 15)
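What a moving window variance computes can be sketched for a single vector with the standard library. This is not the package implementation: the helper name `mw_var_sketch`, the centred window and the truncation of windows at the edges are assumptions, and the window size is assumed to be at least 3.

```julia
using Statistics

# Variance in a centred moving window along a vector.
# Assumption: the window spans 2 * (windowsize ÷ 2) + 1 samples and is truncated at the edges;
# the package may treat window alignment and edges differently.
function mw_var_sketch(x::AbstractVector{<:Real}, windowsize::Integer)
    half = windowsize ÷ 2
    n = length(x)
    out = zeros(Float64, n)
    for t in 1:n
        lo, hi = max(1, t - half), min(n, t + half)
        out[t] = var(@view x[lo:hi])
    end
    return out
end

x = vcat(randn(50), 5 .* randn(50))   # the variance jumps at t = 51
v = mw_var_sketch(x, 15)
```

A change in variability, as in the second half of `x`, then shows up as a shift in the feature `v` even when the mean of the series stays the same.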
MultivariateAnomalies.mw_VAR!
— Function
mw_VAR!(out::Array{tp, N}, datacube0mean::Array{tp,N}, windowsize::Int = 10) where {tp,N}
Mutating version of mw_VAR(). The input data datacube0mean must have zero mean. Initialize out as a separate array: out = datacube0mean leads to wrong results.
MultivariateAnomalies.mw_COR
— Function
mw_COR(datacube::Array{tp, 4}, windowsize::Int = 10) where {tp}
Compute the correlation in a moving window along the first dimension of the datacube (presetting: windowsize = 10). Accepts 4-dimensional datacubes.
MultivariateAnomalies.mw_AVG
— Function
mw_AVG(datacube::AbstractArray{tp,N}, windowsize::Int = 10) where {tp,N}
Compute the average in a moving window along the first dimension of the datacube (presetting: windowsize = 10). Accepts N-dimensional datacubes.
Examples
julia> dc = randn(50,3,3,3)
julia> mw_AVG(dc, 15)
MultivariateAnomalies.mw_AVG!
— Function
mw_AVG!(out::Array{tp, N}, datacube::Array{tp,N}, windowsize::Int = 10) where {tp,N}
Internal and mutating version of mw_AVG().
Index
MultivariateAnomalies.EWMA
MultivariateAnomalies.EWMA!
MultivariateAnomalies.TDE
MultivariateAnomalies.get_MedianCycle
MultivariateAnomalies.get_MedianCycle!
MultivariateAnomalies.get_MedianCycles
MultivariateAnomalies.init_MedianCycle
MultivariateAnomalies.mw_AVG
MultivariateAnomalies.mw_AVG!
MultivariateAnomalies.mw_COR
MultivariateAnomalies.mw_VAR
MultivariateAnomalies.mw_VAR!
MultivariateAnomalies.sMSC