Description
Currently the pipeline for MDS is X -> D -> G -> M, where X is a coordinate representation of the data, D is the pairwise distance matrix, G is the Gram matrix, and M is the final MDS model. One can alternatively go D -> G -> M by supplying the distances=true keyword. However, there are applications where the natural thing to supply is G. For example, if you are working with objects that have a natural inner product and you want to visualize them in a lower-dimensional space, you will compute G first, and it would be quite silly to call fit(MDS, gram2dmat(G); distances=true) when the first thing fit does is convert the distances back into the Gram matrix.
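For reference, the round trip being avoided can be written out as below. This is the standard classical-MDS relation, not a transcription of the package's code, and the symbols J, D^{(2)} and G_c are notation introduced here.

```latex
% Gram matrix -> squared distances
D_{ij}^{2} = G_{ii} + G_{jj} - 2\,G_{ij}

% Squared distances -> (centered) Gram matrix, by double centering
G_c = -\tfrac{1}{2}\, J\, D^{(2)} J,
\qquad J = I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top},
\qquad D^{(2)}_{ij} = D_{ij}^{2}

% Combining the two, since J\mathbf{1} = 0:
G_c = J\, G\, J
```

So going G -> D -> G only re-centers G (and returns G itself when the data are already centered), which is why the detour through distances is wasted work.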
I know there isn't a ton of stuff inside fit that happens after you have G, but from the standpoint of keeping things in sync I think it would be best to expose this to the user instead of forcing them to create their own private version of fit.
The only hard problem is deciding the API. I think a good design would be:

struct Distances{M<:AbstractMatrix}
    D::M
end

struct Gramian{M<:AbstractMatrix}
    G::M
end

fit(::Type{MDS}, X::AbstractMatrix) = fit(MDS, Distances(L2distance(X)))
fit(::Type{MDS}, D::Distances) = fit(MDS, Gramian(dmat2gram(D.D)))

function fit(::Type{MDS}, G::Gramian)
    # the "main" implementation
end
Then users call one of:
fit(MDS, X)
fit(MDS, Distances(D))
fit(MDS, Gramian(G))
depending on what kind of starting point they have.
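To make the dispatch chain concrete, here is a minimal self-contained sketch that runs without MultivariateStats. It is only an illustration of the idea: ToyMDS, pairwise_l2 and dist_to_gram are hypothetical stand-ins invented for this example (not the package's MDS type or helpers), and the "main" method is a bare-bones eigendecomposition rather than the package's implementation.

```julia
using LinearAlgebra

# Wrapper types that tag how a matrix should be interpreted (as proposed above).
struct Distances{M<:AbstractMatrix}
    D::M
end
struct Gramian{M<:AbstractMatrix}
    G::M
end

# Hypothetical stand-in for the real model type.
struct ToyMDS
    Y::Matrix{Float64}   # embedding, one column per point
end

# Hypothetical helper: pairwise Euclidean distances between the columns of X.
pairwise_l2(X::AbstractMatrix) =
    [norm(X[:, i] - X[:, j]) for i in axes(X, 2), j in axes(X, 2)]

# Hypothetical helper: double centering, G = -1/2 * J * D.^2 * J.
function dist_to_gram(D::AbstractMatrix)
    n = size(D, 1)
    J = I - fill(1 / n, n, n)
    return -0.5 * J * (D .^ 2) * J
end

# Coordinates -> distances -> Gram matrix: each method only forwards.
fit(::Type{ToyMDS}, X::AbstractMatrix; maxoutdim::Int=2) =
    fit(ToyMDS, Distances(pairwise_l2(X)); maxoutdim=maxoutdim)
fit(::Type{ToyMDS}, D::Distances; maxoutdim::Int=2) =
    fit(ToyMDS, Gramian(dist_to_gram(D.D)); maxoutdim=maxoutdim)

# The "main" implementation: embed from the leading eigenpairs of G.
function fit(::Type{ToyMDS}, G::Gramian; maxoutdim::Int=2)
    E = eigen(Symmetric(Matrix(G.G)))
    idx = sortperm(E.values; rev=true)[1:maxoutdim]
    λ = max.(E.values[idx], 0.0)          # clamp tiny negative eigenvalues
    Y = sqrt.(λ) .* permutedims(E.vectors[:, idx])
    return ToyMDS(Y)
end

# Four already-centered points in the plane, one per column.
X = [0.0 1.0 0.0 -1.0; 1.0 0.0 -1.0 0.0]
m_coords = fit(ToyMDS, X)
m_dists  = fit(ToyMDS, Distances(pairwise_l2(X)))
m_gram   = fit(ToyMDS, Gramian(X' * X))
```

All three calls end up in the same Gramian method, so a user who already has G skips both conversions.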
One alternative approach is to use a keyword argument, but here we would run into a compatibility problem: if we add gramian::Bool=false as a keyword, then we have to check for the case where distances == gramian == true and throw an error. This is not a big deal, but it feels a bit ugly. Alternatively, we could add interpretation=:coordinates and support the settings :distances and :gramian, but that would be a breaking change. We can do a breaking change, but it seems a bit silly for such a minor feature. With the "type tagging" of the input data, fit(::Type{MDS}, X::AbstractMatrix) could still support the distances kwarg but give a deprecation warning, and then we can remove that API whenever other more substantive changes force a breaking release.
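The deprecation path could look roughly like the sketch below. It is a toy under stated assumptions: legacy_fit and describe are hypothetical names used so the snippet runs on its own, and the real method would forward to the Distances-tagged fit instead of returning a symbol. Base.depwarn is the only library call involved.

```julia
# Tag type from the proposal above.
struct Distances{M<:AbstractMatrix}
    D::M
end

# Hypothetical placeholders for the real fit methods; they only report
# which interpretation was selected.
describe(X::AbstractMatrix) = :coordinates
describe(D::Distances) = :distances

# Hypothetical compatibility shim: the old keyword still works, but warns
# and forwards to the type-tagged method.
function legacy_fit(X::AbstractMatrix; distances::Union{Nothing,Bool}=nothing)
    distances === nothing && return describe(X)
    Base.depwarn(
        "the `distances` keyword is deprecated; wrap the matrix in `Distances` instead",
        :legacy_fit,
    )
    return distances ? describe(Distances(X)) : describe(X)
end

A = [0.0 1.0; 1.0 0.0]
new_style = legacy_fit(A)                   # no keyword, no warning
old_style = legacy_fit(A; distances=true)   # deprecated keyword, forwarded
```

Defaulting the keyword to nothing lets the shim distinguish "not passed" from an explicit distances=false, so only callers who actually use the old keyword see the warning.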