MDS: supply Gram matrix directly #232

@timholy

Currently the pipeline for MDS is X -> D -> G -> M, where X is a coordinate representation of the data, D is the pairwise distance matrix, G is the Gram matrix, and M is the final MDS model. One can alternatively go D -> G -> M by supplying the distances=true keyword. However, there are applications where the natural thing to supply is G. For example, if you are working with objects that have a natural inner product and you want to visualize them in a lower-dimensional space, you will compute G first. It would be quite silly to then call fit(MDS, gram2dmat(G); distances=true), since the first thing fit does is convert the distances back into a Gram matrix.
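
To spell out why that round trip is redundant, here is the relationship between D and G, assuming the usual classical-MDS convention (Euclidean distances, double centering with J = I - (1/n) 1 1^T). The symbols J and n are introduced here for illustration:

```latex
D_{ij}^2 = G_{ii} + G_{jj} - 2\,G_{ij},
\qquad
-\tfrac{1}{2}\, J\, D^{(2)} J = J\, G\, J,
\qquad
J = I - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}
```

Here D^{(2)} is the matrix of squared distances. Converting G to distances and back therefore returns only the centered Gram matrix J G J, which equals G itself when G is already centered.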

I know there isn't a ton of stuff inside fit that happens after you have G, but from the standpoint of keeping things in sync I think it would be best to expose this to the user instead of forcing them to create their own private version of fit.

The only hard problem is deciding the API. I think a good design would be

struct Distances{M<:AbstractMatrix}
    D::M
end

struct Gramian{M<:AbstractMatrix}
    G::M
end

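# Coordinates: compute pairwise Euclidean distances first
# (L2distance stands for whatever routine computes them)
fit(::Type{MDS}, X::AbstractMatrix) = fit(MDS, Distances(L2distance(X)))
# Distances: convert to a Gram matrix
fit(::Type{MDS}, D::Distances) = fit(MDS, Gramian(dmat2gram(D.D)))
# Gram matrix: no conversion needed
function fit(::Type{MDS}, G::Gramian)
    # the "main" implementation
end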

Then users call one of:

  • fit(MDS, X)
  • fit(MDS, Distances(D))
  • fit(MDS, Gramian(G))

depending on what kind of starting point they have.
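
To make the dispatch chain concrete, here is a minimal runnable sketch that uses only the standard library. The names toyfit, ToyMDS, pairwise_l2 and dist_to_gram are hypothetical stand-ins, not MultivariateStats API, and the eigendecomposition step is textbook classical MDS rather than a copy of what fit does internally.

```julia
using LinearAlgebra

# Hypothetical wrapper types mirroring the proposal above.
struct Distances{M<:AbstractMatrix}
    D::M
end

struct Gramian{M<:AbstractMatrix}
    G::M
end

# Stand-in for the MDS model so the sketch runs without MultivariateStats.
struct ToyMDS
    Y::Matrix{Float64}   # embedding, one column per observation
end

# Pairwise Euclidean distances between the columns of X.
pairwise_l2(X::AbstractMatrix) =
    [norm(view(X, :, i) - view(X, :, j)) for i in axes(X, 2), j in axes(X, 2)]

# Double centering: G = -1/2 * J * D.^2 * J with J = I - ones(n, n) / n.
function dist_to_gram(D::AbstractMatrix)
    n = size(D, 1)
    J = I - fill(1 / n, n, n)
    return -0.5 .* (J * (D .^ 2) * J)
end

# Coordinates and distances both funnel into the Gramian method.
toyfit(X::AbstractMatrix; maxoutdim::Int = 2) =
    toyfit(Distances(pairwise_l2(X)); maxoutdim = maxoutdim)
toyfit(D::Distances; maxoutdim::Int = 2) =
    toyfit(Gramian(dist_to_gram(D.D)); maxoutdim = maxoutdim)

# The "main" implementation: top eigenpairs of the Gram matrix.
function toyfit(G::Gramian; maxoutdim::Int = 2)
    E = eigen(Symmetric(Matrix(G.G)))
    order = sortperm(E.values; rev = true)[1:maxoutdim]
    λ = max.(E.values[order], 0.0)            # clamp tiny negative eigenvalues
    Y = Diagonal(sqrt.(λ)) * E.vectors[:, order]'
    return ToyMDS(Y)
end

# Three starting points, one implementation.
X  = [0.0 1.0 0.0 1.0; 0.0 0.0 1.0 1.0]
Xc = X .- sum(X; dims = 2) ./ size(X, 2)      # centered coordinates
m_coords = toyfit(X)
m_dists  = toyfit(Distances(pairwise_l2(X)))
m_gram   = toyfit(Gramian(Xc' * Xc))
```

All three models embed the same four points, so their embeddings agree up to rotation and reflection. Someone who already has G skips both conversions.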

One alternative approach is to use a keyword argument, but here we would run into a compatibility problem: if we add gramian::Bool=false as a keyword, then we have to look for the case where distances == gramian == true and throw an error. This is not a big deal but it feels a bit ugly. Alternatively we could add interpretation=:coordinates and support the settings :distances and :gramian but that would be a breaking change. We can do a breaking change, but it seems to be a bit silly for a very minor change. With the "type tagging" of the input data, we could have fit(::Type{MDS}, X::AbstractMatrix) still support the distances kwarg but give a deprecation warning, and then we can remove that API whenever other more substantive changes force a breaking release.
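
The deprecation path described above could look roughly like this. It is a sketch under assumptions: legacy_fit is a hypothetical stub that only reports which code path was taken, and the warning text is illustrative.

```julia
# Hypothetical tag type, as in the proposal.
struct Distances{M<:AbstractMatrix}
    D::M
end

# New-style entry point for distance input (stub).
legacy_fit(D::Distances) = :from_distances

# Old-style entry point: still accepts `distances`, but forwards to the tagged method.
function legacy_fit(X::AbstractMatrix; distances::Bool = false)
    if distances
        Base.depwarn(
            "`distances=true` is deprecated; wrap the matrix as `Distances(D)` instead.",
            :legacy_fit,
        )
        return legacy_fit(Distances(X))
    end
    return :from_coordinates   # stub for the coordinate path
end
```

Existing calls with distances=true keep working and reach the same method as Distances(D), so the keyword can be dropped in a later breaking release without touching the main implementation.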
