Create CoreSubset and KernelHerding classes #210

Merged
pc532627 merged 72 commits into feature/oop_106 from feature/coreset_oop on Jan 3, 2024

Contributor

@tp832944 tp832944 commented Oct 26, 2023

PR Type

  • Refactoring (no functional changes)
  • Documentation content changes

Description

Create Coreset ABC and KernelHerding classes as part of transitioning the codebase to OOP.

How Has This Been Tested?

Test A: TODO
Test B: (Write your answer here.)
Test C: (Write your answer here.)

Does this PR introduce a breaking change?

Yes, kernel_herding.py moved to coreset.py and switched from functional to OOP.

Checklist before requesting a review

  • I have made sure that my PR is not a duplicate.
  • My code follows the style guidelines of this project.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have performed a self-review of my code.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.

@tp832944 tp832944 linked an issue Oct 26, 2023 that may be closed by this pull request
@tp832944 tp832944 self-assigned this Oct 26, 2023
@bk958178 bk958178 assigned bk958178 and unassigned tp832944 Dec 19, 2023
@tp832944 tp832944 changed the title Create Coreset and KernelHerding classes Create CoreSubset and KernelHerding classes Dec 20, 2023
@pc532627 pc532627 requested a review from bk958178 December 21, 2023 16:57
@pc532627
Contributor

@tp832944 @bk958178 - Updated to match the expected OOP interface as closely as reasonably possible. Needs merging into the OOP branch and updating in line with that.

@pc532627
Contributor

Just a note: once we are confident this works with the rest of the codebase, we need to delete the legacy .py scripts, such as kernel_herding.py and kernel_herding_refine.py, along with any associated tests.

Contributor Author

@tp832944 tp832944 left a comment

@pc532627 I have merged oop_106, resolved conflicts, cleared out several files and reconciled changes so that I can see what this branch entails. If I have deleted one of your changes, please revert it.

I have reviewed coresubset.py with some detailed comments primarily to harmonise the interface. Functionally it looks fine. However, I haven't had a chance to review the tests yet.

coreax/coresubset.py: 10 resolved review threads (8 marked outdated)
@pc532627 pc532627 self-requested a review January 2, 2024 13:16
@tp832944
Contributor Author

tp832944 commented Jan 2, 2024

@pc532627 I've been through my previous review and highlighted a couple of points that aren't quite right yet. I have yet to review the tests.

@pc532627 pc532627 removed the request for review from bk958178 January 2, 2024 15:11
@pc532627
Contributor

pc532627 commented Jan 2, 2024

> @pc532627 I've been through my previous review and highlighted a couple of points that aren't quite right yet. I have yet to review the tests.

@tp832944 I've pushed some updates that hopefully address all of the outstanding comments.

@tp832944
Contributor Author

tp832944 commented Jan 2, 2024

> > @pc532627 I've been through my previous review and highlighted a couple of points that aren't quite right yet. I have yet to review the tests.
>
> @tp832944 I've pushed some updates that hopefully address all of the outstanding comments.

@pc532627 All done. I'll get onto reviewing the tests later.

Contributor Author

@tp832944 tp832944 left a comment

@pc532627 I have now completed the full review, including the tests and functionality of kernel herding. Your tests are well documented. My remaining corrections are mostly stylistic.

:returns: Updated loop variables ``current_coreset_indices`` and
``current_kernel_similarity_penalty``
"""
# Unpack the components of the updatable
Contributor Author

updatable is not a noun. Change to loop variables.

Contributor

Updated
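
For readers without the diff to hand, here is a minimal sketch of the kind of loop body this thread is about. It is not the code from this PR: `herd` and `squared_exponential_kernel` are hypothetical helpers written for illustration, and only the variable names are borrowed from the snippets quoted in this review. It assumes a greedy kernel herding step run with `jax.lax.fori_loop`, where the "loop variables" are the tuple carried between iterations.

```python
import jax
import jax.numpy as jnp


def squared_exponential_kernel(x, y, length_scale=1.0):
    """Hypothetical helper: pairwise squared exponential kernel between rows."""
    sq_dists = jnp.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-sq_dists / (2.0 * length_scale**2))


def herd(x, coreset_size):
    """Illustrative greedy kernel herding; returns indices into the rows of x."""
    # Mean of each row of the kernel matrix, computed once up front
    kernel_matrix_row_mean = squared_exponential_kernel(x, x).mean(axis=1)

    def body(i, val):
        # Unpack the loop variables
        current_coreset_indices, current_kernel_similarity_penalty = val

        # Pick the point that maximises the herding objective
        objective = kernel_matrix_row_mean - current_kernel_similarity_penalty / (i + 1)
        index_to_include_in_coreset = jnp.argmax(objective)

        # Record the choice and accumulate its similarity to every data point
        current_coreset_indices = current_coreset_indices.at[i].set(
            index_to_include_in_coreset
        )
        penalty_update = squared_exponential_kernel(
            x, jnp.atleast_2d(x[index_to_include_in_coreset])
        )[:, 0]
        current_kernel_similarity_penalty += penalty_update

        # Return the updated loop variables in the same order and with the same types
        return current_coreset_indices, current_kernel_similarity_penalty

    initial_loop_variables = (
        jnp.zeros(coreset_size, dtype=jnp.int32),
        jnp.zeros(x.shape[0]),
    )
    indices, _ = jax.lax.fori_loop(0, coreset_size, body, initial_loop_variables)
    return indices
```

Calling `herd(x, 5)` on an `(n, d)` array returns five row indices. The carried tuple must keep the same structure, shapes and dtypes on every iteration, which is why the index array is pre-allocated rather than appended to.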

penalty_update = kernel_vectorised(
x, jnp.atleast_2d(x[index_to_include_in_coreset])
)[:, 0]
current_kernel_similarity_penalty = (
Contributor Author

Use += operator. Easier to read. (Sorry in advance if this isn't valid syntax for Jax Arrays.)

Contributor

After a quick check, I think this syntax is fine and have updated it.
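
A small standalone check of why `+=` is safe here (illustrative values only, not taken from the PR): JAX arrays are immutable, so `a += b` builds a new array and rebinds the name rather than mutating in place. Indexed in-place assignment is the case that is not supported, and the functional `.at[...]` syntax is the usual replacement.

```python
import jax.numpy as jnp

penalty = jnp.zeros(3)
alias = penalty  # second name bound to the same array object
update = jnp.array([1.0, 2.0, 3.0])

# Equivalent to penalty = penalty + update: a new array is created and rebound
penalty += update

# The array that alias refers to is untouched, because nothing was mutated
untouched = alias

# Indexed updates go through .at[...], which returns an updated copy
bumped = penalty.at[0].add(10.0)
```

After this runs, `penalty` holds the summed values, `alias` still holds zeros, and `bumped` differs from `penalty` only in its first element.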

``current_kernel_similarity_penalty``
"""
# Unpack the components of the updatable
(current_coreset_indices, current_kernel_similarity_penalty) = val
Contributor Author

Remove unnecessary brackets.

Contributor

Removed

self.coreset_size = 20

def test_fit_comparison_to_random_and_refined(self) -> None:
r"""
Contributor Author

No need for the r on most of these docstrings.

Contributor

Removed
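
As a quick illustration of the point about the `r` prefix (hypothetical functions, not from the test file): a raw docstring only matters when the text contains backslashes, for example Sphinx maths markup. Without backslashes, the raw and plain forms are the same string.

```python
def documented_with_maths():
    r"""Return the mean :math:`\frac{1}{n} \sum_i x_i` of the inputs."""


def documented_plainly():
    """Test the fit method of the KernelHerding class with a simple example."""


# No backslashes: the r prefix changes nothing
same_text = r"""plain text""" == """plain text"""

# With backslashes it does: in a non-raw literal, \f is a form feed character
raw_length = len(r"\frac")
escaped_length = len("\frac")
```

Here `raw_length` is 5 and `escaped_length` is 4, which is the silent corruption the `r` prefix guards against in maths-heavy docstrings.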

Test the fit method of the KernelHerding class with a simple example.

The test checks that a coreset generated via kernel herding has an improved
quality (measured my maximum mean discrepancy) that one generated by random
Contributor Author

my -> by
that -> than

Contributor

Updated
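
For context on the quality measure named in that docstring, here is a rough sketch of a maximum mean discrepancy (MMD) calculation. It is an assumption-laden illustration, not the `coreax.metrics.MMD` class: `squared_exponential_kernel` and `mmd` are hypothetical helpers, and the biased estimate with a fixed length scale is just one common choice.

```python
import jax
import jax.numpy as jnp


def squared_exponential_kernel(x, y, length_scale=1.0):
    """Hypothetical helper: pairwise squared exponential kernel between rows."""
    sq_dists = jnp.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-sq_dists / (2.0 * length_scale**2))


def mmd(x, y):
    """Biased MMD estimate between the empirical distributions of x and y."""
    k_xx = squared_exponential_kernel(x, x).mean()
    k_yy = squared_exponential_kernel(y, y).mean()
    k_xy = squared_exponential_kernel(x, y).mean()
    # Clip at zero so rounding error cannot produce a negative argument to sqrt
    return jnp.sqrt(jnp.maximum(k_xx + k_yy - 2.0 * k_xy, 0.0))


data_key, sample_key = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.normal(data_key, (100, 2))

# A random coreset of 20 points, the baseline the test compares herding against
random_indices = jax.random.choice(sample_key, 100, shape=(20,), replace=False)
random_coreset_mmd = mmd(x, x[random_indices])

# Sanity checks: identical sets give zero, a shifted copy gives a large value
identical_mmd = mmd(x, x)
shifted_mmd = mmd(x, x + 3.0)
```

A lower MMD means the coreset's empirical distribution is closer to the original data, so a comparison test would compute this value for each candidate coreset and compare them.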

from coreax.coresubset import KernelHerding, RandomSample
from coreax.metrics import MMD
from coreax.reduction import SizeReduce
from tests.unit.test_data import DataReaderConcrete
Contributor Author

Delete this import. See related comment below for better alternative.

Contributor

Removed

import coreax.coresubset as cs
import coreax.kernel as ck
from coreax.util import jit_test
from tests.unit.test_data import DataReaderConcrete
Contributor Author

Delete. See comment below for justification.

Contributor

Removed


# Create a kernel herding object
herding_object = cs.KernelHerding(
original_data=data,
Contributor Author

Delete

Contributor

Removed to align with new interface

original_data=data,
weights_optimiser=None,
kernel=kernel,
coreset_size=self.coreset_size,
Contributor Author

Delete

Contributor

Removed to align with new interface

)
)
kernel = ck.SquaredExponentialKernel()
data = DataReaderConcrete(original_data=x, pre_coreset_array=x)
Contributor Author

I don't think data is required anymore after adjustments below to match updated interface. The method being tested effectively reads from x directly.

Contributor

Correct - thanks for spotting, I've removed that and the now redundant import.

Contributor

@bk958178 bk958178 left a comment

Suggesting a very minor change, @pc532627, but otherwise it looks good to me

weights_optimiser=None,
kernel=kernel,
)
data_reduction_object_herding = KernelHerding(kernel=kernel)
Contributor

Consider renaming to coresubset_object_herding or similar since DataReduction is outdated

Contributor

Updated

data_reduction_object_herding = KernelHerding(
weights_optimiser=None, kernel=kernel
)
data_reduction_object_herding = KernelHerding(kernel=kernel)
Contributor

Consider renaming - see above comment

Contributor

Updated

data_reduction_object_herding = KernelHerding(
weights_optimiser=None, kernel=kernel
)
data_reduction_object_herding = KernelHerding(kernel=kernel)
Contributor

Consider renaming - see above

Contributor

Updated

@pc532627 pc532627 merged commit 7af3297 into feature/oop_106 Jan 3, 2024
1 of 4 checks passed
@pc532627 pc532627 deleted the feature/coreset_oop branch January 3, 2024 13:47
Development

Successfully merging this pull request may close these issues.

Create CoreSubset and KernelHerding classes
5 participants