Thanks for the great presentation today, @benfulcher!
Inspired, I looked in greater detail into the repository, to my shame perhaps for the first time at that level of detail.
What I understood is that your SPIs are not all implemented from scratch: there is a wealth of them, and some in turn rely on external dependencies. As such, pyspi is, morally, very similar to sktime, being a mix of de-novo implementations, direct interfaces to external algorithms, and implementations that use components with soft dependencies.
I also noticed that you have tags for the different SPI, which again is very similar to sktime.
Further, when trying to use SPIs individually, I noticed that this does not currently seem to be intended: only batch feature sets can be obtained? That seems a shame, given how many useful pairwise transformations you have collected! The exception, of course, is going through the yaml config, but discovering SPIs that way is tedious and currently cannot be automated, so composability with other frameworks is severely limited.
Based on this, I had a number of ideas if you would like to hear me out:
- move the SPIs to a strategy object orientation pattern, using the tag system provided by `scikit-base`. This would give you runtime discoverability for free - no need to use the config yaml, or the webpage.
  - at runtime, it would look like `all_estimators` in `sktime`: https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.registry.all_estimators.html
  - on the webpage, you can use this as a backend for selection with tags: https://www.sktime.net/en/latest/estimator_overview.html
- change SPIs from private API to public API. Define a public API, and add a test suite for testing individual SPIs for interface conformance.
- isolate SPI-specific dependencies to the particular SPI. Then you can treat, I believe, all dependencies as soft dependencies.
  - users could then say: compute all SPIs for which I have all dependencies installed. Or: get me all SPIs for this dependency set. SPIs would also tell the user directly which dependency to install.
  - `pyproject` would look like this in `sktime`: https://github.com/sktime/sktime/blob/main/pyproject.toml - minimal core dependency set; and dependencies are managed via tags like this: https://www.sktime.net/en/latest/api_reference/tags.html#general-tags-packaging
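
To make the ideas above concrete, here is a rough sketch of what tagged SPI objects with runtime lookup and per-SPI soft dependencies could look like. It uses only the standard library and is not the `scikit-base` or `pyspi` API: `BaseSPI`, `all_spis`, the two example SPIs, the tag names, and the package name `hypothetical_backend_not_installed` are all made up for illustration.

```python
import importlib.util


class BaseSPI:
    """Hypothetical base class: each SPI is one strategy object with tags."""

    # Class-level tags; subclasses override the entries they need.
    _tags = {"directed": False, "python_dependencies": []}

    @classmethod
    def get_class_tag(cls, name, default=None):
        # Merge tags along the MRO so that subclass values win.
        merged = {}
        for klass in reversed(cls.__mro__):
            merged.update(getattr(klass, "_tags", {}))
        return merged.get(name, default)

    @classmethod
    def missing_dependencies(cls):
        # A soft dependency counts as missing if it cannot be located.
        deps = cls.get_class_tag("python_dependencies", [])
        return [d for d in deps if importlib.util.find_spec(d) is None]

    def compute(self, x, y):
        raise NotImplementedError


class Covariance(BaseSPI):
    """De-novo SPI with no extra dependencies (sample covariance)."""

    _tags = {"directed": False, "python_dependencies": []}

    def compute(self, x, y):
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


class ExternalBackendSPI(BaseSPI):
    """SPI wrapping an external package (the package name is invented)."""

    _tags = {
        "directed": True,
        "python_dependencies": ["hypothetical_backend_not_installed"],
    }

    def compute(self, x, y):
        missing = self.missing_dependencies()
        if missing:
            # Tell the user directly which dependency to install.
            raise ModuleNotFoundError(
                f"{type(self).__name__} requires: {', '.join(missing)}"
            )
        raise NotImplementedError("backend call omitted in this sketch")


def all_spis(filter_tags=None, only_installed=False):
    """Return SPI classes matching the tags (direct subclasses only)."""
    filter_tags = filter_tags or {}
    found = []
    for cls in BaseSPI.__subclasses__():
        if any(cls.get_class_tag(k) != v for k, v in filter_tags.items()):
            continue
        if only_installed and cls.missing_dependencies():
            continue
        found.append(cls)
    return found


# "Compute all SPIs for which I have all dependencies installed."
x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
results = {c.__name__: c().compute(x, y) for c in all_spis(only_installed=True)}
```

With something like this, the config yaml would no longer be the only entry point: a user, or another framework, could query by tag, instantiate a single SPI, and get a precise message about what to install when a backend is missing.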
What do you think? I'd be happy to devote some time to shifting the code base gradually towards this schema. As a side effect, it would also make it easy to interface all SPIs as time series distances in sktime, and easier to add SPIs for multivariate or unequal-length time series.
FYI @jmoo2880