Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Extended persistence and lower star filtrations #561

Draft
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

ulupo
Copy link
Collaborator

@ulupo ulupo commented Feb 3, 2021

Reference issues/PRs
#337 #546

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description
Begins adding support for extended persistence via a new class LowerStarFlagPersistence and a new plotting function plot_extended_diagram. Does not yet address downstream processing of extended diagrams. There is also a new data structure for extended persistence diagrams. An extended persistence diagram is a 2D ndarray of shape (n_features, 4) where the first 3 columns are as for ordinary persistence (birth-death-dimension), and the fourth is either 1 or -1: 1 when the feature was born and died during the same "sweep", -1 otherwise. This allows to partition the extended diagram into the usual 4 portions:

  • birth < death and same sweep
  • birth < death and different sweep
  • birth > death and same sweep
  • birth > death and different sweep

The extended persistence diagrams are obtained by "coning".

Numerical stability issues have not yet been addressed.

Checklist

  • I have read the guidelines for contributing.
  • My code follows the code style of this project. I used flake8 to check my Python changes.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed. I used pytest to check this on Python tests.

@ulupo ulupo requested a review from wreise February 3, 2021 09:45
Copy link
Collaborator

@wreise wreise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Let's think about what comes next !

gtda/homology/_utils.py Outdated Show resolved Hide resolved
gtda/homology/simplicial.py Outdated Show resolved Hide resolved
gtda/plotting/persistence_diagrams.py Show resolved Hide resolved
@ulupo
Copy link
Collaborator Author

ulupo commented Feb 12, 2021

Following @wreise's demonstration that numerical errors can be introduced in the downard sweep if the input has 64-bit float precision (because ripser internally converts to 32-bit precision), I have now added a conversion to float32 at the very beginning of the computation in the extended case.

@ulupo
Copy link
Collaborator Author

ulupo commented Feb 12, 2021

Note: In discussion with @wreise, we noted that in the case of extended persistence one cannot dispose of all zero-persistence pairs. Zero-persistence pairs created in "different sweeps" are essential bars in regular persistence! An example is an isolated point.

gtda/homology/simplicial.py Outdated Show resolved Hide resolved
@wreise
Copy link
Collaborator

wreise commented Feb 16, 2021

I had a pretty thorough look, and imo, it does what it claims to. :) The new additions should be covered by the tests, apart from the plotting method.

@ulupo
Copy link
Collaborator Author

ulupo commented Feb 16, 2021

@wreise thanks for the thorough look! Of course there are still docs to write but more importantly we can't really ship this without intervening in some way on the downstream vectorization methods, even if it's just by saying "this is not supported with extended persistence" in some cases.

@ulupo
Copy link
Collaborator Author

ulupo commented Feb 16, 2021

Another point is that the input validation is inadequate still (my fault) because really this transformer should only work with sparse input, and yet we don't throw a useful error if the input is dense.

@wreise
Copy link
Collaborator

wreise commented Feb 16, 2021

(...) we can't really ship this without intervening in some way on the downstream vectorization methods, even if it's just by saying "this is not supported with extended persistence" in some cases.

I agree - the pr is far from ready :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants