First of all, thank you @ubauer for the amazing work done with ripser! (Opening an issue gives me a good excuse to finally say this directly!)
It seems to me that the quantum leap in computational runtimes made possible by ripser is an important piece in the history of making TDA more appealing and accessible to non-topologists. As you know, this is true in particular in the Python community (the one I am most familiar with) thanks to the great effort by you, @ctralie, @sauln (and maybe others?) in making solid bindings in ripser.py. It is now much less esoteric to suggest that feature extraction derived from persistent homology can be made an integral part of machine learning pipelines. Several projects are now exploring how to provide "regular data scientists" with plug-in topological components which can be used alongside more conventional machine learning toolkits. scikit-tda, GUDHI and giotto-tda (the last of which I am involved with) are some such examples.
When interacting with non-topologists who are data science practitioners, I find that arguing in favour of TDA based on its ability to describe geometric structures that are in principle "there to see" tends to be successful. It brings the field very close to, say, clustering and other data-viz techniques which everyone would agree are useful!
Persistence diagrams/barcodes are great of course, but they are to the full persistent homology calculation a bit (or a lot, for H_0) as returning a cut dendrogram in single-linkage clustering would be to returning an actual clustering of the data. Representative (persistent) cycles, if visualized, could make (Vietoris-Rips) persistent homology a much more "immediate" concept to grapple with for many, not to mention the actual insight into the data that they would bring.
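To make the H_0 analogy concrete for other readers, here is a minimal pure-Python sketch. It is not how ripser computes anything; it is a plain Kruskal-style union-find over pairwise distances, and the function name `h0_with_representatives` and its `threshold` parameter are hypothetical names chosen for illustration. It returns both the finite H_0 bars (which coincide with the single-linkage merge heights) and the connected components at a chosen scale, which is the "actual clustering" that a barcode alone does not show.

```python
from itertools import combinations
from math import dist


def h0_with_representatives(points, threshold):
    """Return (bars, labels) for a small Euclidean point cloud.

    bars   -- finite H_0 intervals (birth, death); the single infinite
              bar for the last surviving component is omitted.
    labels -- a component label per point, using only edges of length
              <= threshold (i.e. a single-linkage clustering at that scale).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Walk up to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges of the Vietoris-Rips filtration, in order of increasing length.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))

    bars = []
    labels = None
    for length, i, j in edges:
        # Freeze the components just before the first edge longer than threshold.
        if labels is None and length > threshold:
            labels = [find(k) for k in range(n)]
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one H_0 class dies at this scale.
            parent[ri] = rj
            bars.append((0.0, length))
    if labels is None:
        labels = [find(k) for k in range(n)]
    return bars, labels


# Two well-separated pairs of points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
bars, labels = h0_with_representatives(points, threshold=2.0)
print(bars)    # finite H_0 bars, i.e. the merge heights
print(labels)  # which points belong together at scale 2.0
```

The bars alone say that two classes die early and one dies late; only the labels say which points form the two clusters. Representative cycles would play the role of the labels in higher dimensions.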
Sorry for the long spiel which contains little new information to you (mostly there to provide context for other readers), but I hope it helps me segue effectively into the real questions. I noticed that you mention the experimental representative_cycles branch, which is great! I am wondering: What would be your quick assessment on the status of progress there? Is there a hoped-for release date, or is work there still in a research phase? Do you expect that final runtimes will compare favourably with Eirene?
I doubt I could contribute much to the C++ codebase, but would love to eventually help the Python community integrate these developments.
Thanks for the patience in reading!