Distance, Kernel Matrices and k-Nearest Neighbours

Compute distance matrices (dissimilarity matrices) and convert them into kernel matrices or k-nearest neighbor objects.

Distance/Dissimilarity Matrices

A distance matrix D consists of pairwise distances $d(\cdot, \cdot)$ computed with some metric (e.g. Euclidean):

\[D_{ij} = d(X_{t_i}, X_{t_j})\]

i.e. the distance between the vectors $X$ of observations $t_i$ and $t_j$, for all pairs of observations $t_i, t_j = 1, \ldots, T$.
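As an illustration of this definition, the following is a minimal plain-Julia sketch of a pairwise Euclidean distance matrix. It is not the package's implementation: `pairwise_euclidean` is a hypothetical helper, and it assumes a simple two-dimensional layout with observations in rows and variables in columns.

```julia
# Plain-Julia sketch of a pairwise Euclidean distance matrix.
# `pairwise_euclidean` is a hypothetical helper, not part of MultivariateAnomalies.

# X holds T observations in rows and variables in columns.
function pairwise_euclidean(X::AbstractMatrix{<:Real})
    T = size(X, 1)
    D = zeros(Float64, T, T)
    for j in 1:T, i in 1:T
        # Euclidean distance between observation i and observation j
        D[i, j] = sqrt(sum(abs2, view(X, i, :) .- view(X, j, :)))
    end
    return D
end

X = [0.0 0.0;
     3.0 4.0;
     6.0 8.0]
D = pairwise_euclidean(X)
# D[1, 2] is 5.0 and D[1, 3] is 10.0; the diagonal is zero and D is symmetric
```

The result is a T x T matrix, so memory grows quadratically with the number of observations.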

Functions

MultivariateAnomalies.dist_matrix - Function
dist_matrix(data::AbstractArray{tp, N}; dist::String = "Euclidean", space::Int = 0, lat::Int = 0, lon::Int = 0, Q = 0) where {tp, N}
dist_matrix(data::AbstractArray{tp, N}, training_data; dist::String = "Euclidean", space::Int = 0, lat::Int = 0, lon::Int = 0, Q = 0) where {tp, N}

Compute the distance matrix of data, or the distance matrix between data and training data, i.e. the pairwise distances along the first dimension of data, using the last dimension as variables. dist is a distance metric. Currently supported are Euclidean (default), SqEuclidean, Chebyshev, Cityblock, JSDivergence, Mahalanobis and SqMahalanobis. The latter two need a covariance matrix Q as input argument.

Examples

julia> using MultivariateAnomalies
julia> dc = randn(10, 4, 3)
julia> D = dist_matrix(dc, space = 2)

MultivariateAnomalies.dist_matrix! - Function
dist_matrix!(D_out, data, ...)

Compute the distance matrix of data, similar to dist_matrix(). The D_out object has to be preallocated, e.g. with init_dist_matrix.

Examples

julia> dc = randn(10, 4, 4, 3)
julia> D_out = init_dist_matrix(dc)
julia> dist_matrix!(D_out, dc, lat = 2, lon = 2)
julia> D_out[1]

k-Nearest Neighbor Objects

k-nearest neighbor objects hold the k nearest points and their distances, extracted from a distance matrix D.
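The idea can be sketched in plain Julia as follows. This is an illustrative sketch under stated assumptions, not the package's code: `knn_from_distances` and its `excl` argument are hypothetical names, and the choice that a point is never its own neighbor is an assumption of this example.

```julia
# Plain-Julia sketch of a k-nearest-neighbor lookup on a distance matrix.
# `knn_from_distances` and its `excl` argument are hypothetical names.
function knn_from_distances(D::AbstractMatrix{<:Real}, k::Int, excl::Int = 0)
    T = size(D, 1)
    indices = zeros(Int, T, k)
    dists = zeros(Float64, T, k)
    for i in 1:T
        # Candidates are all points more than `excl` steps away from i,
        # so the point itself (offset 0) is never its own neighbor.
        candidates = [j for j in 1:T if abs(i - j) > excl]
        # Pick the k candidates with the smallest distance to point i.
        nearest = partialsort(candidates, 1:k, by = j -> D[i, j])
        indices[i, :] = nearest
        dists[i, :] = D[i, nearest]
    end
    return indices, dists
end

# Five points on a line, with D[i, j] = |x_i - x_j|
x = [0.0, 1.0, 3.0, 6.0, 10.0]
D = [abs(a - b) for a in x, b in x]

indices, dists = knn_from_distances(D, 2)          # no extra exclusion
indices_ex, dists_ex = knn_from_distances(D, 2, 1) # also skip direct neighbors in time
```

Each row of `indices` and `dists` belongs to one observation. The sketch requires that at least k candidates remain after the exclusion.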

Functions

MultivariateAnomalies.knn_dists - Function
knn_dists(D, k::Int, temp_excl::Int = 0)

Return the k nearest neighbors from a distance matrix D. The temp_excl distances next to the main diagonal of D (default: temp_excl = 0) are excluded from the nearest-neighbor candidates.

Examples

julia> dc = randn(20, 4, 3)
julia> D = dist_matrix(dc, space = 2)
julia> knn_dists_out = knn_dists(D, 3, 1)
julia> knn_dists_out[5] # distances
julia> knn_dists_out[4] # indices

MultivariateAnomalies.knn_dists! - Function
knn_dists!(knn_dists_out, D, temp_excl::Int = 0)

Return the k nearest neighbors from a distance matrix D. Similar to knn_dists(), but uses the preallocated object knn_dists_out, initialized with init_knn_dists(). Note that the number of nearest neighbors k is not passed as an argument, as it is already determined by the knn_dists_out object.

Examples

julia> dc = randn(20, 4, 3)
julia> D = dist_matrix(dc, space = 2)
julia> knn_dists_out = init_knn_dists(dc, 3)
julia> knn_dists!(knn_dists_out, D)
julia> knn_dists_out[5] # distances
julia> knn_dists_out[4] # indices

MultivariateAnomalies.init_knn_dists - Function
init_knn_dists(T::Int, k::Int)
init_knn_dists(datacube::AbstractArray, k::Int)

Initialize a preallocated knn_dists_out object. k is the number of nearest neighbors. The first argument is either T, the number of time steps (i.e. the size of the first dimension), or a multidimensional datacube.


Kernel Matrices (Similarities)

A distance matrix D can be converted into a kernel matrix K, e.g. by computing pairwise similarities using Gaussian kernels centered on each data point.

\[K = \exp(-0.5 \cdot D \cdot \sigma^{-2})\]
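The formula above can be sketched in plain Julia as an elementwise operation on D. This is a minimal sketch, not the package's implementation: `gauss_kernel` and `gauss_kernel!` are hypothetical names, and the formula is applied exactly as written above, without any normalization.

```julia
# Plain-Julia sketch of the Gaussian kernel formula.
# `gauss_kernel` and `gauss_kernel!` are hypothetical names.
gauss_kernel(D::AbstractArray{<:Real}, σ::Real = 1.0) = exp.(-0.5 .* D ./ σ^2)

# In-place variant: write the result into a preallocated array K.
function gauss_kernel!(K::AbstractArray, D::AbstractArray{<:Real}, σ::Real = 1.0)
    K .= exp.(-0.5 .* D ./ σ^2)
    return K
end

D = [0.0 2.0;
     2.0 0.0]
K = gauss_kernel(D, 1.0)    # diagonal is exp(0) = 1, off-diagonal is exp(-1)
gauss_kernel!(D, D, 1.0)    # overwrites D with the kernel values
```

Passing the same array as input and output, as in the last line, avoids allocating a second T x T matrix when the distances are no longer needed.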

Functions

MultivariateAnomalies.kernel_matrix - Function
kernel_matrix(D::AbstractArray, σ::Float64 = 1.0[, kernel::String = "gauss", dimension::Int64 = 1])

Compute a kernel matrix from the distance matrix D, given σ. The kernel is optionally normalized by the dimension if kernel = "normalized_gauss". Compute D with dist_matrix().

Examples

julia> dc = randn(20, 4, 3)
julia> D = dist_matrix(dc, space = 2)
julia> K = kernel_matrix(D, 2.0)

MultivariateAnomalies.kernel_matrix! - Function
kernel_matrix!(K, D::AbstractArray, σ::Float64 = 1.0[, kernel::String = "gauss", dimension::Int64 = 1])

Compute a kernel matrix from the distance matrix D. Similar to kernel_matrix(), but writes the output into a preallocated array K (e.g. K = similar(D)).

Examples

julia> dc = randn(20, 4, 3)
julia> D = dist_matrix(dc, space = 2)
julia> kernel_matrix!(D, D, 2.0) # overwrites distance matrix
