Distance, Kernel Matrices and k-Nearest Neighbours
Compute distance matrices (dissimilarity matrices) and convert them into kernel matrices or k-nearest neighbor objects.
Distance/Dissimilarity Matrices
A distance matrix D consists of pairwise distances $d()$ computed with some metric (e.g. Euclidean):
$D(t_i, t_j) = d(X_{t_i}, X_{t_j})$
i.e. the distance between the vectors $X$ of observations $t_i$ and $t_j$, for all observations $t_i, t_j = 1 \ldots T$.
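To make the definition concrete, here is a minimal sketch in plain Julia that builds a Euclidean distance matrix for a small data set. The helper name pairwise_euclidean is hypothetical and not part of MultivariateAnomalies; it assumes observations are stored along the first dimension and variables along the second.

```julia
# Hypothetical helper (not part of MultivariateAnomalies): pairwise Euclidean
# distances between the rows of X, where rows are observations t_1 ... t_T
# and columns are variables.
function pairwise_euclidean(X::AbstractMatrix{<:Real})
    T = size(X, 1)
    D = zeros(Float64, T, T)
    for j in 1:T, i in 1:T
        # d(X_{t_i}, X_{t_j}) = sqrt(sum over variables of squared differences)
        D[i, j] = sqrt(sum(abs2, view(X, i, :) .- view(X, j, :)))
    end
    return D
end

# Three observations of two variables
X = [0.0 0.0;
     3.0 4.0;
     6.0 8.0]
D = pairwise_euclidean(X)
```

The result is a symmetric T x T matrix with zeros on the main diagonal, which is the shape of input the k-nearest neighbor and kernel functions described on this page operate on.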
Functions
MultivariateAnomalies.dist_matrix - Function
dist_matrix(data::AbstractArray{tp, N}; dist::String = "Euclidean", space::Int = 0, lat::Int = 0, lon::Int = 0, Q = 0) where {tp, N}
dist_matrix(data::AbstractArray{tp, N}, training_data; dist::String = "Euclidean", space::Int = 0, lat::Int = 0, lon::Int = 0, Q = 0) where {tp, N}
Compute the distance matrix of data, or the distance matrix between data and training_data, i.e. the pairwise distances along the first dimension of data, using the last dimension as variables. dist is the distance metric; currently Euclidean (default), SqEuclidean, Chebyshev, Cityblock, JSDivergence, Mahalanobis and SqMahalanobis are supported. The latter two require a covariance matrix Q as input argument.
Examples
julia> using MultivariateAnomalies
julia> dc = randn(10, 4, 3)
julia> D = dist_matrix(dc, space = 2)
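The Mahalanobis variants need a covariance matrix Q because they rescale differences by the covariance of the variables. The following sketch shows the textbook definition in plain Julia; the helper mahalanobis is hypothetical, and how the package uses Q internally is not stated here, so treat this as an illustration of the concept only.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not part of MultivariateAnomalies): textbook Mahalanobis
# distance between vectors x and y, given a covariance matrix Q.
mahalanobis(x, y, Q) = sqrt(dot(x - y, Q \ (x - y)))

# Four observations (rows) of two variables (columns)
X = [1.0 2.0;
     2.0 1.0;
     3.0 5.0;
     4.0 3.0]
Q = cov(X)                       # 2 x 2 covariance matrix of the variables
d = mahalanobis(X[1, :], X[3, :], Q)

# With an identity covariance the Mahalanobis distance equals the Euclidean one.
d_id = mahalanobis([0.0, 0.0], [3.0, 4.0], Matrix(1.0I, 2, 2))
```

Estimating Q from the same data that is being compared, as done above with cov, is one common choice; a covariance estimated from separate reference data would work the same way.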
MultivariateAnomalies.dist_matrix! - Function
dist_matrix!(D_out, data, ...)
Compute the distance matrix of data, similar to dist_matrix(). The D_out object has to be preallocated, i.e. with init_dist_matrix.
Examples
julia> dc = randn(10, 4, 4, 3)
julia> D_out = init_dist_matrix(dc)
julia> dist_matrix!(D_out, dc, lat = 2, lon = 2)
julia> D_out[1]
MultivariateAnomalies.init_dist_matrix - Function
init_dist_matrix(data)
init_dist_matrix(data, training_data)
Initialize a D_out object for dist_matrix!().
k-Nearest Neighbor Objects
k-nearest neighbor objects hold the k nearest points and their distances, extracted from a distance matrix D.
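As an illustration of what such an object contains, the sketch below extracts neighbor indices and distances from a distance matrix in plain Julia. The helper knn_from_distances is hypothetical and not part of MultivariateAnomalies; in particular, treating temp_excl as a band of |i - j| <= temp_excl around the main diagonal is an assumption about the intended behaviour.

```julia
# Hypothetical helper (not part of MultivariateAnomalies): for every point i of
# a square distance matrix D, return the indices and distances of the k
# closest points, skipping all j with |i - j| <= temp_excl.
function knn_from_distances(D::AbstractMatrix{<:Real}, k::Int, temp_excl::Int = 0)
    T = size(D, 1)
    indices = zeros(Int, T, k)
    nndists = zeros(Float64, T, k)
    for i in 1:T
        # candidate neighbors outside the excluded band around the diagonal
        candidates = [j for j in 1:T if abs(i - j) > temp_excl]
        length(candidates) >= k || error("not enough candidates for k = $k")
        order = sortperm(D[candidates, i])
        for n in 1:k
            indices[i, n] = candidates[order[n]]
            nndists[i, n] = D[candidates[order[n]], i]
        end
    end
    return indices, nndists
end

# Five points on a line at positions 0, 1, 3, 6 and 10
pos = [0.0, 1.0, 3.0, 6.0, 10.0]
D = [abs(a - b) for a in pos, b in pos]
indices, nndists = knn_from_distances(D, 2)
```

With temp_excl = 0 only the point itself is skipped; larger values also skip its immediate temporal neighbors, which is useful when consecutive time steps are strongly autocorrelated.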
Functions
MultivariateAnomalies.knn_dists - Function
knn_dists(D, k::Int, temp_excl::Int = 0)
Returns the k nearest neighbors for a distance matrix D. The temp_excl (default: temp_excl = 0) distances adjacent to the main diagonal of D are excluded from being selected as nearest neighbors.
Examples
julia> dc = randn(20, 4,3)
julia> D = dist_matrix(dc, space = 2)
julia> knn_dists_out = knn_dists(D, 3, 1)
julia> knn_dists_out[5] # distances
julia> knn_dists_out[4] # indices
MultivariateAnomalies.knn_dists! - Function
knn_dists!(knn_dists_out, D, temp_excl::Int = 0)
Returns the k nearest neighbors for a distance matrix D. Similar to knn_dists(), but uses a preallocated object knn_dists_out, initialized with init_knn_dists(). Note that the number of nearest neighbors k does not need to be passed, as it is already determined by the knn_dists_out object.
Examples
julia> dc = randn(20, 4,3)
julia> D = dist_matrix(dc, space = 2)
julia> knn_dists_out = init_knn_dists(dc, 3)
julia> knn_dists!(knn_dists_out, D)
julia> knn_dists_out[5] # distances
julia> knn_dists_out[4] # indices
MultivariateAnomalies.init_knn_dists - Function
init_knn_dists(T::Int, k::Int)
init_knn_dists(datacube::AbstractArray, k::Int)
Initialize a preallocated knn_dists_out object. k is the number of nearest neighbors. The first argument is either T, the number of time steps (i.e. the size of the first dimension), or a multidimensional datacube.
Kernel Matrices (Similarities)
A distance matrix D can be converted into a kernel matrix K, e.g. by computing pairwise similarities using Gaussian kernels centered on each datapoint.
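A minimal sketch of this conversion in plain Julia follows. The helper gaussian_kernel is hypothetical and not part of MultivariateAnomalies, and it assumes the common Gaussian form exp(-d^2 / (2σ^2)); the exact expression used by the package (for example, whether the distances are squared) is not stated here.

```julia
# Hypothetical helper (not part of MultivariateAnomalies): map distances to
# Gaussian kernel values, assuming the common form exp(-d^2 / (2σ^2)).
gaussian_kernel(D::AbstractArray{<:Real}, σ::Real = 1.0) = exp.(-0.5 .* D .^ 2 ./ σ^2)

D = [0.0 1.0 2.0;
     1.0 0.0 1.0;
     2.0 1.0 0.0]
K = gaussian_kernel(D, 2.0)
```

Zero distance maps to a kernel value of 1, and the values decay towards 0 as the distance grows; σ controls how quickly that decay happens.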
Functions
MultivariateAnomalies.kernel_matrix - Function
kernel_matrix(D::AbstractArray, σ::Float64 = 1.0[, kernel::String = "gauss", dimension::Int64 = 1])
Compute a kernel matrix from the distance matrix D, given σ. The result is optionally normalized by the dimension if kernel = "normalized_gauss". Compute D with dist_matrix().
Examples
julia> dc = randn(20, 4,3)
julia> D = dist_matrix(dc, space = 2)
julia> K = kernel_matrix(D, 2.0)
MultivariateAnomalies.kernel_matrix! - Function
kernel_matrix!(K, D::AbstractArray, σ::Float64 = 1.0[, kernel::String = "gauss", dimension::Int64 = 1])
Compute a kernel matrix from the distance matrix D. Similar to kernel_matrix(), but writes the output into a preallocated array K (K = similar(D)).
Examples
julia> dc = randn(20, 4,3)
julia> D = dist_matrix(dc, space = 2)
julia> kernel_matrix!(D, D, 2.0) # overwrites distance matrix
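The in-place pattern used above can be sketched in plain Julia as follows. The helper gaussian_kernel! is hypothetical and not part of MultivariateAnomalies, and it again assumes the form exp(-d^2 / (2σ^2)). Because the computation is elementwise, the output array may be the distance matrix itself.

```julia
# Hypothetical in-place helper (not part of MultivariateAnomalies): writes
# Gaussian kernel values into the preallocated array K.
function gaussian_kernel!(K::AbstractArray, D::AbstractArray, σ::Real = 1.0)
    K .= exp.(-0.5 .* D .^ 2 ./ σ^2)   # elementwise, so K may alias D
    return K
end

D = [0.0 2.0;
     2.0 0.0]
K = similar(D)                # preallocated output with the same shape as D
gaussian_kernel!(K, D, 2.0)   # D is left untouched
gaussian_kernel!(D, D, 2.0)   # D now holds the kernel values
```

Reusing K across many calls avoids allocating a new T x T array each time; overwriting D saves memory when the distances are no longer needed.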