Refactor sklearn models for experimental #415

edwardchalstrey1 · 2025-04-22T14:29:25Z

Based on "Add refactored LGBM model to experimental emulators" (#399); merge that first.
Closes "Refactor: Create SklearnBackend class inheriting from Emulator base class & add first example (svm)" (#405).

Questions:

- What is the full set of sklearn models that we want to keep?
  - Just SVM and Random Forest for now.
- Do we actually need the SklearnBackend?
- If we do keep SklearnBackend, should LightGBM use it?
  - Probably not, though maybe if it is sufficiently similar; have a look at the docs. If it did, SklearnBackend should be called something different.
- Why are the checks in each model's fit slightly different, and do they need to be? If they can be standardised, we can move them into the fit method of SklearnBackend; if not, perhaps keep them in the specific model's fit function.
- Add type hints to model args.

github-actions · 2025-04-22T14:40:41Z

Coverage report

File	Statements	Missing	Coverage	Coverage (new stmts)	Lines missing
autoemulate/emulators/gaussian_process.py
autoemulate/experimental/data/utils.py					176-178, 184
autoemulate/experimental/emulators/__init__.py
autoemulate/experimental/emulators/base.py					77-81, 193-195, 208-209, 217
autoemulate/experimental/emulators/lightgbm.py
autoemulate/experimental/emulators/random_forest.py
autoemulate/experimental/emulators/svm.py
autoemulate/experimental/emulators/gaussian_process/exact.py
tests/test_compare.py
tests/experimental/test_experimental_base.py
tests/experimental/test_experimental_compare.py
tests/experimental/test_experimental_random_forest.py
tests/experimental/test_experimental_svm.py
Project Total

This report was generated by python-coverage-comment-action.

codecov-commenter · 2025-04-22T14:43:50Z

Codecov Report

Attention: Patch coverage is 93.10345% with 12 lines in your changes missing coverage. Please review.

Project coverage is 80.37%. Comparing base (25d69e7) to head (93c3856).

Files with missing lines	Patch %	Lines
autoemulate/experimental/emulators/base.py	80.48%	8 Missing ⚠️
autoemulate/experimental/data/utils.py	77.77%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #415      +/-   ##
==========================================
+ Coverage   80.09%   80.37%   +0.28%     
==========================================
  Files         100      104       +4     
  Lines        6902     7057     +155     
==========================================
+ Hits         5528     5672     +144     
- Misses       1374     1385      +11

autoemulate/experimental/emulators/base.py

sgreenbury · 2025-04-23T14:51:36Z

Adding the code from the discussion with @edwardchalstrey1 and @radka-j on a possible revision to the base class (relating to #322):

class Emulator(ABC, InputTypeMixin):
    """
    The interface containing methods on emulators that are
    expected by downstream dependents. This includes:
    - `AutoEmulate`
    """

    @abstractmethod
    def __init__(
        self, x: InputLike | None = None, y: InputLike | None = None, **kwargs
    ): ...

    @classmethod
    def model_name(cls) -> str:
        return cls.__name__

    @abstractmethod
    def _fit(self, x: InputLike, y: InputLike | None): ...

    @abstractmethod
    def check(self, x: InputLike, y: InputLike | None): ...

    @abstractmethod
    def convert(
        self, x: InputLike, y: InputLike | None
    ) -> tuple[InputLike, InputLike | None]: ...

    def fit(self, x: InputLike, y: InputLike | None):
        """
        Fit the model to the data.

        Parameters
        ----------
            x: InputLike
                Input features as numpy array, PyTorch tensor, or DataLoader.
            y: OutputLike or None
                Target values (not needed if x is a DataLoader).

        Returns
        -------
            None
        """
        x, y = self.convert(x, y)
        self.check(x, y)
        self._fit(x, y)

    @abstractmethod
    def predict(self, x: InputLike) -> OutputLike: ...

    @staticmethod
    @abstractmethod
    def get_tune_config() -> TuneConfig: ...
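
To make the proposed flow concrete, here is a minimal, self-contained sketch of the same convert, check, then _fit pipeline. It is not the AutoEmulate implementation: `EmulatorSketch` and `MeanEmulator` are hypothetical names, and plain Python lists stand in for `InputLike` so the example runs without torch or sklearn.

```python
from abc import ABC, abstractmethod


class EmulatorSketch(ABC):
    """Hypothetical stand-in for the proposed base class (lists, not tensors)."""

    def fit(self, x, y):
        # The public entry point fixes the order: convert -> check -> _fit.
        x, y = self.convert(x, y)
        self.check(x, y)
        self._fit(x, y)

    @abstractmethod
    def convert(self, x, y): ...

    @abstractmethod
    def check(self, x, y): ...

    @abstractmethod
    def _fit(self, x, y): ...

    @abstractmethod
    def predict(self, x): ...


class MeanEmulator(EmulatorSketch):
    """Toy model that always predicts the mean of the training targets."""

    def convert(self, x, y):
        # Coerce every value to float so later steps see one input type.
        return [[float(v) for v in row] for row in x], [float(v) for v in y]

    def check(self, x, y):
        # Validation runs on the converted data, before any fitting happens.
        if len(x) != len(y):
            raise ValueError(f"x has {len(x)} rows but y has {len(y)} values")

    def _fit(self, x, y):
        self.mean_ = sum(y) / len(y)

    def predict(self, x):
        return [self.mean_ for _ in x]


model = MeanEmulator()
model.fit([[1, 2], [3, 4]], [10, 20])
print(model.predict([[5, 6]]))  # [15.0]
```

With this shape, a subclass only supplies the three hooks, and mismatched inputs are rejected inside `fit` before `_fit` is ever called.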

sgreenbury · 2025-05-09T13:42:07Z

autoemulate/experimental/data/utils.py

+    @staticmethod
+    def _normalize(x: TensorLike) -> tuple[TensorLike, TensorLike, TensorLike]:
+        x_mean = x.mean(0, keepdim=True)
+        x_std = x.std(0, keepdim=True)
+        return (x - x_mean) / x_std, x_mean, x_std
+
+    @staticmethod
+    def _denormalize(
+        x: TensorLike, x_mean: TensorLike, x_std: TensorLike
+    ) -> TensorLike:
+        return (x * x_std) + x_mean


Added this here as initial functionality for preprocessing outputs. This would be good to revisit in #348, where it might be better as distinct functionality that can be combined with the model and/or updated to match the API used there.

sgreenbury · 2025-05-14

Also relates to #437

edwardchalstrey1 · 2025-05-14

Note @sgreenbury I have added tests for these in #464
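
As a rough illustration of what the two helpers in the diff above compute, the sketch below does the same per-column standardisation and its inverse using only the standard library, so it runs without torch. The function names, the list-of-rows input and the zero-variance guard are assumptions added for this example; the guard in particular is not in the diff.

```python
from statistics import mean, stdev


def normalize(rows):
    """Standardise each column; return scaled rows plus column means and stds."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    # Sample standard deviation (n - 1), which is also torch.std's default.
    stds = [stdev(c) for c in cols]
    if any(s == 0 for s in stds):
        # Assumption for this sketch: fail loudly rather than divide by zero.
        raise ValueError("cannot normalize a constant column (std is zero)")
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def denormalize(rows, means, stds):
    """Invert normalize() using the stored column means and stds."""
    return [[v * s + m for v, m, s in zip(row, means, stds)] for row in rows]


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled, col_means, col_stds = normalize(data)
restored = denormalize(scaled, col_means, col_stds)
```

Keeping the mean and standard deviation returned by `normalize` is what lets predictions made in the scaled space be mapped back to the original units.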

sgreenbury

As discussed, this looks good to merge, with subsequent changes to checks/validation to follow in #422. Thanks @edwardchalstrey1!

edwardchalstrey1 added 15 commits April 17, 2025 13:12

Merge branch '351-refactor-lgbm' into 405-refactor-sklearn

d66e8f2

Merge branch '351-refactor-lgbm' into 405-refactor-sklearn

3eb90bd

Merge branch '351-refactor-lgbm' into 405-refactor-sklearn

b30f4e5

Merge branch '351-refactor-lgbm' into 405-refactor-sklearn

509513b

Merge branch '351-refactor-lgbm' into 405-refactor-sklearn

6d96ed4

basic scaffolding for SklearnBackend class

ceb1123

add space

a7b3e0e

add tests

63ed72a

add svm class based on Emulator

377cdbf

update svm to use SklearnBackend base class

374dd3c

fix get_tune_config

3199e37

refactor SklearnBackend __init__ method for improved readability; remove unnecessary newline in SupportVectorMachines

f5c416f

ruff

95d9d63

remove pointless line

2db6509

add error handling for non-None NumPy array input in fit method

9f6b7a8

Merge branch '351-refactor-lgbm' into 405-refactor-sklearn

8e6ba5a

sgreenbury changed the base branch from main to 351-refactor-lgbm April 23, 2025 10:16

sgreenbury reviewed Apr 23, 2025

autoemulate/experimental/emulators/base.py (outdated, resolved)

sgreenbury reviewed Apr 23, 2025

autoemulate/experimental/emulators/base.py (resolved)

sgreenbury mentioned this pull request Apr 23, 2025

Revise base class following MVP #420

Closed

edwardchalstrey1 added 7 commits April 23, 2025 16:02

draft commit (REVERT ME)

1b9aea0

Revert "Revert "Refactor LightGBM test setup to use fixture for improved readability""

2798ec1

This reverts commit 8e84a95.

Revert "Revert "Revert "Refactor LightGBM test setup to use fixture for improved readability"""

e569eeb

This reverts commit 2798ec1.

Revert "draft commit (REVERT ME)"

6244d27

This reverts commit 1b9aea0.

refactor to create a common fit func for sklearn

b6da7b6

add back fit method

e35f11d

create check_and_convert func and add InputTypeMixin to Emulator

0cb9d16

edwardchalstrey1 added 16 commits May 9, 2025 11:17

add check_X_y import to SklearnBackend for input validation

5bba24f

refactor: rename fit method to _fit

d46adf6

move normalize fun to new utils

bd402b5

mobe sklearn checks to backend

001ccdc

refactor: move normalization methods to SklearnBackend

40aeedb

refactor: remove commented-out _predict method from SupportVectorMachines

ac90366

move dim checks into _convert_to_numpy

349a31f

Co-authored-by: Sam Greenbury

refactor: remove unused reshape parameter from _convert_to_numpy method

fdf9541

refactor: format _denormalize method parameters for improved readability

1748719

remove unused imports

4fe4848

remove prints

3283778

feat: add RandomForest emulator and update ALL_EMULATORS list

719cdfb

fix: correct SupportVectorMachine class name in tests and other changes

feat: add tests for converting to numpy func

25eeefc

add failing test for y_pred.shape on test_predict_rf_2d

04c6020

fix: reshape y_pred only if it is 1-dimensional in SklearnBackend

6f3c53d

style: format code for better readability in SVM and update test for numpy conversion

d43f305

edwardchalstrey1 requested a review from sgreenbury May 9, 2025 13:32

edwardchalstrey1 marked this pull request as ready for review May 9, 2025 13:32

edwardchalstrey1 added 2 commits May 9, 2025 14:36

refactor: remove debug print statement from test_predict_svm

c0e0374

fix: update _convert_to_numpy to name vars accoirding to their type

71957cd

sgreenbury reviewed May 9, 2025

edwardchalstrey1 changed the title from "Refactor sklearn models" to "Refactor sklearn models for experimental" May 9, 2025

edwardchalstrey1 added 4 commits May 9, 2025 14:46

style: update parameter annotations in RandomForest constructor for consistency

005c661

fix: remove redundant type ignore comment for Tensor conversion in SklearnBackend

cc2f13c

spelling

3b27b90

move get_tune_config to emulator

93c3856

sgreenbury approved these changes May 9, 2025

edwardchalstrey1 merged commit 8b50621 into main May 9, 2025
4 checks passed

edwardchalstrey1 deleted the 405-refactor-sklearn branch May 9, 2025 14:06

sgreenbury mentioned this pull request May 14, 2025

Add normalization to Gaussian Process #437

Closed
