-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create CoreSubset and KernelHerding classes #210
Conversation
Methods not populated. refs: #164
…create_data_reduction_abc
Just a note that when we are confident this functions with the rest of the codebase, we need to delete the legacy .py scripts such as kernel_herding.py and kernel_herding_refine.py and any associated tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pc532627 I have merged oop_106, resolved conflicts, cleared out several files and reconciled changes so that I can see what this branch entails. If I have deleted one of your changes, please revert it.
I have reviewed coresubset.py with some detailed comments primarily to harmonise the interface. Functionally it looks fine. However, I haven't had a chance to review the tests yet.
@pc532627 I've been through my previous review and highlighted a couple of points that aren't quite right yet. I have yet to review the tests. |
Refs: #164
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pc532627 I have now completed the full review, including the tests and functionality of kernel herding. Your tests are well documented. My remaining corrections are mostly stylistic.
coreax/coresubset.py
Outdated
:returns: Updated loop variables ``current_coreset_indices`` and | ||
``current_kernel_similarity_penalty`` | ||
""" | ||
# Unpack the components of the updatable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updatable
is not a noun. Change to loop variables
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
coreax/coresubset.py
Outdated
penalty_update = kernel_vectorised( | ||
x, jnp.atleast_2d(x[index_to_include_in_coreset]) | ||
)[:, 0] | ||
current_kernel_similarity_penalty = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use +=
operator. Easier to read. (Sorry in advance if this isn't valid syntax for Jax Arrays.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a quick check, I think this syntax is fine and have updated it.
coreax/coresubset.py
Outdated
``current_kernel_similarity_penalty`` | ||
""" | ||
# Unpack the components of the updatable | ||
(current_coreset_indices, current_kernel_similarity_penalty) = val |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unnecessary brackets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
tests/unit/test_coresubset.py
Outdated
self.coreset_size = 20 | ||
|
||
def test_fit_comparison_to_random_and_refined(self) -> None: | ||
r""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the r
on most of these docstrings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
tests/unit/test_coresubset.py
Outdated
Test the fit method of the KernelHerding class with a simple example. | ||
|
||
The test checks that a coreset generated via kernel herding has an improved | ||
quality (measured my maximum mean discrepancy) that one generated by random |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my
-> by
that
-> than
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
tests/unit/test_coresubset.py
Outdated
from coreax.coresubset import KernelHerding, RandomSample | ||
from coreax.metrics import MMD | ||
from coreax.reduction import SizeReduce | ||
from tests.unit.test_data import DataReaderConcrete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete this import. See related comment below for better alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
tests/performance/test_coresubset.py
Outdated
import coreax.coresubset as cs | ||
import coreax.kernel as ck | ||
from coreax.util import jit_test | ||
from tests.unit.test_data import DataReaderConcrete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete. See comment below for justification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
tests/performance/test_coresubset.py
Outdated
|
||
# Create a kernel herding object | ||
herding_object = cs.KernelHerding( | ||
original_data=data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed to align with new interface
tests/performance/test_coresubset.py
Outdated
original_data=data, | ||
weights_optimiser=None, | ||
kernel=kernel, | ||
coreset_size=self.coreset_size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed to align with new interface
tests/performance/test_coresubset.py
Outdated
) | ||
) | ||
kernel = ck.SquaredExponentialKernel() | ||
data = DataReaderConcrete(original_data=x, pre_coreset_array=x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think data
is required anymore after adjustments below to match updated interface. The method being tested effectively reads from x
directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct - thanks for spotting, I've removed that and the now redundant import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggesting a very minor change, @pc532627, but otherwise it looks good to me
tests/unit/test_coresubset.py
Outdated
weights_optimiser=None, | ||
kernel=kernel, | ||
) | ||
data_reduction_object_herding = KernelHerding(kernel=kernel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider renaming to coresubset_object_herding
or similar since DataReduction is outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
tests/unit/test_coresubset.py
Outdated
data_reduction_object_herding = KernelHerding( | ||
weights_optimiser=None, kernel=kernel | ||
) | ||
data_reduction_object_herding = KernelHerding(kernel=kernel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider renaming - see above comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
tests/unit/test_coresubset.py
Outdated
data_reduction_object_herding = KernelHerding( | ||
weights_optimiser=None, kernel=kernel | ||
) | ||
data_reduction_object_herding = KernelHerding(kernel=kernel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider renaming - see above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
PR Type
Description
Create Coreset ABC and KernelHerding classes as part of transitioning the codebase to OOP.
How Has This Been Tested?
Test A: TODO
Test B: (Write your answer here.)
Test C: (Write your answer here.)
Does this PR introduce a breaking change?
Yes,
kernel_herding.py
moved tocoreset.py
and switched from functional to OOP.Checklist before requesting a review