API design #13

Open
@Fil

With the current API, projecting to d = 3 requires knowing the exact number n of optional arguments, because 3 has to be passed as the (n+1)th argument. This feels awkward, and it means that we can't add a hyperparameter to any method without it being a breaking change.
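To make the problem concrete, here is a minimal sketch. Both functions and their parameter names (`nNeighbors`, `minDist`, `d`) and defaults are hypothetical and only illustrate the calling pattern; they are not the library's actual signatures.

```javascript
// Hypothetical positional signature: the output dimension comes last,
// after every optional hyperparameter.
function projectPositional(data, nNeighbors = 15, minDist = 0.1, d = 2) {
  return { nNeighbors, minDist, d, rows: data.length };
}

// To ask for d = 3, the caller has to skip each optional argument
// that precedes it, so the call depends on how many there are.
const positional = projectPositional([[1, 2], [3, 4]], undefined, undefined, 3);

// Hypothetical named-options signature: each hyperparameter is set
// individually, and new ones can be added without breaking callers.
function projectNamed(data, { nNeighbors = 15, minDist = 0.1, d = 2 } = {}) {
  return { nNeighbors, minDist, d, rows: data.length };
}

const named = projectNamed([[1, 2], [3, 4]], { d: 3 });
```

Inserting a new optional parameter before `d` in the positional version would silently change what the fourth argument means, which is the breaking change described above.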

It seems to me that it would be nice to rethink the API "à la D3", so that:

  • all the algorithms can be called interchangeably
  • we could separate the training and transform phases (learn then transform? #11)
  • we could specify hyperparameters individually
  • we could serialize the model (in and out: save and load)

I would imagine that this could be structured as:

  • new Druid([method or model]) — create a druid
  • druid.values([accessor]) — sets the values accessor if specified and returns the druid; returns the values accessor if not specified
  • druid.dimensions([number]) — sets or returns the dimensions (default: 2)
  • druid.class([accessor]) — sets or returns the class accessor (for LDA)
  • druid.method([name or class]) — sets the current method (UMAP, FASTMAP, etc.) if specified and returns the druid; if not specified, returns the method (as a class or function)
  • druid.fit(data) — trains the model on the data and returns the druid
  • druid.transform([data]) — transforms the data if specified; if data is not specified, returns the transformed train set
  • druid.model([model]) — returns the serialized model (JSON) if a model is not specified, loads the model if specified

And for each hyperparameter, for example UMAP's min_dist:

  • druid.min_dist([min_dist]) — if specified, sets the min_dist hyperparameter and returns the druid; if not specified, returns its current value
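The getter/setter convention in the lists above could be sketched roughly as follows. `DruidSketch`, the `_accessor` helper, and every default value except `dimensions = 2` are assumptions made for illustration; this is not the library's implementation.

```javascript
// Sketch only: DruidSketch is a hypothetical stand-in for the proposed class.
class DruidSketch {
  constructor(method = "UMAP") {
    this._method = method;
    this._dimensions = 2;     // default proposed above
    this._values = (d) => d;  // identity accessor (assumed default)
    this._class = null;       // no class accessor by default (assumption)
    this._min_dist = 0.1;     // placeholder default (assumption)
  }

  // Shared D3-style logic: with no argument, return the stored value;
  // with an argument, store it and return the instance for chaining.
  _accessor(key, args) {
    if (args.length === 0) return this[key];
    this[key] = args[0];
    return this;
  }

  method(...args) { return this._accessor("_method", args); }
  dimensions(...args) { return this._accessor("_dimensions", args); }
  values(...args) { return this._accessor("_values", args); }
  class(...args) { return this._accessor("_class", args); }
  min_dist(...args) { return this._accessor("_min_dist", args); }
}

// Setters chain, getters read back the current configuration.
const sketch = new DruidSketch("UMAP").dimensions(3).min_dist(0.5);
const currentDimensions = sketch.dimensions();
```

With this shape, adding a new hyperparameter means adding one more method, and existing calls keep working unchanged.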

With this we could say for example:

const dr = new Druid("LDA"); // dr
dr.dimensions(2).class(d => d.species).values(d => [+d.sepal_length, +d.petal_length]).fit(data); // dr
dr.transform(); // transformed data
const model = dr.model(); // JSON {}

const dr2 = new Druid(model); // a second druid, restored from the saved model
dr2.transform(newData); // apply the model to new data…
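The fit / transform / model round trip above could behave roughly like this sketch. `ToyDruid` is hypothetical, and its "method" is a deliberately trivial placeholder (centering on the training mean and keeping the first columns), not LDA or any of the library's algorithms. One assumption worth flagging: an accessor function cannot be stored in JSON, so the restored instance here falls back to the identity accessor.

```javascript
// Toy stand-in for a real method: centers rows on the training mean and
// keeps the first `dimensions` columns. Not LDA, not the library's code.
class ToyDruid {
  constructor(model) {
    this._dimensions = 2;
    this._values = (d) => d; // identity accessor (assumed default)
    this._mean = null;
    this._train = null;
    if (model !== undefined) this.model(model);
  }

  values(accessor) {
    if (accessor === undefined) return this._values;
    this._values = accessor;
    return this;
  }

  // Training phase: learn the column means from the accessed values.
  fit(data) {
    const X = data.map(this._values);
    this._mean = X[0].map((_, j) => X.reduce((s, row) => s + row[j], 0) / X.length);
    this._train = X;
    return this;
  }

  // Transform phase: new data if given, otherwise the training set.
  transform(data) {
    const X = data === undefined ? this._train : data.map(this._values);
    return X.map((row) =>
      row.slice(0, this._dimensions).map((v, j) => v - this._mean[j])
    );
  }

  // No argument: serialize to JSON. With an argument: load and return this.
  model(json) {
    if (json === undefined) {
      return JSON.stringify({ dimensions: this._dimensions, mean: this._mean });
    }
    const m = JSON.parse(json);
    this._dimensions = m.dimensions;
    this._mean = m.mean;
    return this;
  }
}

const trained = new ToyDruid()
  .values((d) => [d.x, d.y])
  .fit([{ x: 1, y: 2 }, { x: 3, y: 6 }]);
const saved = trained.model();         // JSON string
const restored = new ToyDruid(saved);  // no training data needed
const projected = restored.transform([[2, 4]]);
```

Only the learned parameters travel in the serialized model, so whoever loads it has to supply data in the same shape the accessor produced at training time, or set the accessor again.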

I wonder what should be done about NaN. I suppose rows should be automatically ignored if the values accessor returns any NaN for them.
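One possible reading of that NaN rule, as a sketch: drop every row for which the values accessor yields at least one NaN. The helper name `dropIncomplete` and the sample rows are made up for illustration.

```javascript
// Hypothetical helper: keep only the rows whose accessed values
// contain no NaN at all.
function dropIncomplete(data, values) {
  return data.filter((d) => values(d).every((v) => !Number.isNaN(v)));
}

const rows = [
  { a: "1", b: "2" },
  { a: "x", b: "3" }, // +"x" is NaN, so this row is dropped
  { a: "4", b: "5" },
];
const kept = dropIncomplete(rows, (d) => [+d.a, +d.b]);
```

A real implementation would probably also need to remember which input rows were dropped, so that the transformed output can be matched back to the original data.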

Note also that some methods such as UMAP can accept a distance matrix instead of a data array.

PS: Sorry for spamming your project :) The potential is very exciting.
