
Use generalized dot product in Mahalanobis distance. #242

Open
@bvdmitri

It would be better to replace dot(z, Q * z) (here https://github.com/JuliaStats/Distances.jl/blob/master/src/mahalanobis.jl#L83 and in other places) with the three-argument dot(z, Q, z), which does not store the intermediate result of Q * z. The problem is that Q * z allocates, and this allocation can sometimes slow down the overall execution considerably.

Here is a small benchmark:

julia> @btime dot($x, $Q*$x)
  87.190 ns (1 allocation: 80 bytes)
0.850287710619438

julia> @btime dot($x, $Q, $x)
  13.774 ns (0 allocations: 0 bytes)
0.8502877106194378
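
The benchmark above does not show how x and Q were created. A minimal sketch of the comparison, assuming small dense Float64 inputs (the values of Q and x and the helper names quadform_alloc and quadform_fused are made up for illustration and are not from the original benchmark):

```julia
using LinearAlgebra

# Hypothetical helper names, used only for this illustration.
quadform_alloc(x, Q) = dot(x, Q * x)  # materializes the temporary vector Q * x
quadform_fused(x, Q) = dot(x, Q, x)   # three-argument dot (Julia 1.4+), no temporary

# Assumed example inputs; the inputs used in the issue are not shown.
Q = [2.0 0.5; 0.5 1.0]
x = [1.0, -1.0]

# Both forms compute the same quadratic form, up to floating-point rounding.
same = quadform_alloc(x, Q) ≈ quadform_fused(x, Q)
```

The two results can differ in the last digit, as in the benchmark output, because the operations are summed in a different order.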

Another nice addition would be to provide temporary storage for the in-place z = a - b operation (which also allocates). That would help if someone evaluates the Mahalanobis distance in a tight for loop.
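
One possible shape for this, sketched here with a caller-owned buffer passed as the first argument (sqmahalanobis_buffered! is a hypothetical name, not an existing Distances.jl function, and the example data is made up):

```julia
using LinearAlgebra

# Hypothetical function, not part of Distances.jl: the caller owns the buffer `z`.
function sqmahalanobis_buffered!(z::AbstractVector, a::AbstractVector,
                                 b::AbstractVector, Q::AbstractMatrix)
    length(a) == length(b) == length(z) ||
        throw(DimensionMismatch("a, b and z must have equal length"))
    z .= a .- b          # in-place difference, reuses z instead of allocating
    return dot(z, Q, z)  # three-argument dot avoids the temporary Q * z
end

# Assumed example data.
Q = [2.0 0.0; 0.0 0.5]
center = [1.0, 0.0]
points = [[1.0, 2.0], [3.0, 0.0], [0.0, 0.0]]

z = zeros(2)  # allocated once, outside the loop
dists = [sqmahalanobis_buffered!(z, p, center, Q) for p in points]
```

Passing the buffer explicitly keeps the function stateless, at the cost of a longer call signature; storing the buffer inside the distance object is the alternative.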

In our application, simply by changing the dot call and providing temporary storage, we reduced the runtime and allocations from 1.107791 seconds (20.21 M allocations: 1.141 GiB, 18.80% gc time) to 0.532219 seconds (10.12 M allocations: 393.585 MiB, 17.05% gc time).

This is a sketch of our implementation (without checks):

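using LinearAlgebra  # provides the three-argument dot(x, A, y)

# Squared Mahalanobis distance with a preallocated buffer `z` for a - b.
struct FastSqMahalanobis{M <: AbstractMatrix, V <: AbstractVector}
    Q :: M
    z :: V
end

# Convenience constructor: allocate the buffer once, sized to match Q.
function FastSqMahalanobis(Q::AbstractMatrix)
    return FastSqMahalanobis(Q, zeros(eltype(Q), size(Q, 1)))
end

function (distance::FastSqMahalanobis)(a::AbstractVector, b::AbstractVector)
    if length(a) != length(b)
        throw(DimensionMismatch("first array has length $(length(a)) which does not match the length of the second, $(length(b))."))
    end

    Q = distance.Q
    z = distance.z

    # in-place z = a - b (reuses the shared buffer, so one instance must not be called concurrently)
    map!(-, z, a, b)

    # same value as dot(z, Q * z), but without materializing Q * z
    return dot(z, Q, z)
end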
