Description
It would be better to replace dot(z, Q * z)
(here https://github.com/JuliaStats/Distances.jl/blob/master/src/mahalanobis.jl#L83 and in other places) with the three-argument dot(z, Q, z),
which does not store the intermediate result of Q * z. The problem is that Q * z
allocates a temporary vector, and this allocation can sometimes slow down the overall execution considerably.
Here is a small benchmark:
julia> @btime dot($x, $Q*$x)
87.190 ns (1 allocation: 80 bytes)
0.850287710619438
julia> @btime dot($x, $Q, $x)
13.774 ns (0 allocations: 0 bytes)
0.8502877106194378
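To illustrate why the three-argument form needs no temporary, here is a minimal sketch of the same quadratic form written as an explicit loop. The name quadform is hypothetical and used only for illustration. This is not necessarily how LinearAlgebra implements dot(x, Q, y), only a demonstration that the result can be accumulated in a scalar.

```julia
using LinearAlgebra

# Hypothetical helper (illustration only): accumulates x' * Q * y
# in a scalar, without materializing the vector Q * y.
function quadform(x::AbstractVector, Q::AbstractMatrix, y::AbstractVector)
    axes(Q, 1) == axes(x, 1) && axes(Q, 2) == axes(y, 1) ||
        throw(DimensionMismatch("Q must have size (length(x), length(y))"))
    s = zero(promote_type(eltype(x), eltype(Q), eltype(y)))
    for j in axes(Q, 2)
        t = zero(s)
        for i in axes(Q, 1)
            t += conj(x[i]) * Q[i, j]   # column-wise access, matches Julia's memory layout
        end
        s += t * y[j]
    end
    return s
end

x = rand(10)
Q = rand(10, 10)

# All three expressions compute the same value up to floating-point rounding.
quadform(x, Q, x) ≈ dot(x, Q * x) ≈ dot(x, Q, x)
```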
Another nice addition would be to provide tmp
storage for the in-place z = a - b
operation (which also allocates). That would help if someone evaluates the Mahalanobis distance in a tight for loop.
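The difference between the allocating and the in-place subtraction can be sketched as follows. The helper names diff_alloc and diff_inplace! are hypothetical and exist only for this comparison. Measuring both with @btime, as in the benchmark above, should show whether the in-place version avoids the allocation on a given setup.

```julia
# Hypothetical helpers (illustration only).
diff_alloc(a, b) = a - b                     # creates a new vector on every call
diff_inplace!(z, a, b) = map!(-, z, a, b)    # writes a - b into the existing buffer z

a = rand(10)
b = rand(10)
z = similar(a)        # buffer allocated once, reused afterwards

diff_inplace!(z, a, b)
z == diff_alloc(a, b) # same values, but z was not reallocated
```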
In our application, simply changing the dot call
and providing tmp storage reduced the run time and allocations from 1.107791 seconds (20.21 M allocations: 1.141 GiB, 18.80% gc time)
to 0.532219 seconds (10.12 M allocations: 393.585 MiB, 17.05% gc time).
This is a sketch of our implementation (with only a basic length check):
using LinearAlgebra  # provides the three-argument dot

struct FastSqMahalanobis{ M <: AbstractMatrix, V <: AbstractVector }
    Q :: M
    z :: V  # preallocated buffer for a - b
end

function FastSqMahalanobis(Q::AbstractMatrix)
    return FastSqMahalanobis(Q, zeros(eltype(Q), size(Q, 1)))
end

function (distance::FastSqMahalanobis)(a::AbstractVector, b::AbstractVector)
    if length(a) != length(b)
        throw(DimensionMismatch("first array has length $(length(a)) which does not match the length of the second, $(length(b))."))
    end
    Q = distance.Q
    z = distance.z
    # inplace z = a - b
    map!(-, z, a, b)
    return dot(z, Q, z)
end
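One possible variation, shown here only as a sketch, is to let the caller own the buffer instead of storing it in the struct. The function name sqmahalanobis! and the test data below are hypothetical and not part of Distances.jl. Passing the buffer explicitly makes it easier to give each task or thread its own z, since a buffer shared through the struct would be overwritten by concurrent calls.

```julia
using LinearAlgebra

# Hypothetical helper (not part of Distances.jl): the caller supplies the buffer z.
function sqmahalanobis!(z::AbstractVector, Q::AbstractMatrix,
                        a::AbstractVector, b::AbstractVector)
    length(a) == length(b) == length(z) ||
        throw(DimensionMismatch("z, a and b must have the same length"))
    map!(-, z, a, b)       # in-place z = a - b
    return dot(z, Q, z)    # three-argument dot, no temporary Q * z
end

# Tight loop: squared distances from every column of X to a reference point y.
n, m = 5, 1000
A = rand(n, n)
Q = A' * A + I             # symmetric positive definite by construction
X = rand(n, m)
y = rand(n)
z = zeros(n)               # allocated once, reused in every iteration

d = [sqmahalanobis!(z, Q, view(X, :, j), y) for j in 1:m]
```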