

@edwardchalstrey1 (Member) commented Apr 22, 2025

Questions:

  • What is the full set of sklearn models that we want to keep?
    • Just SVM and Random Forest for now
  • Do we actually need the SklearnBackend?
  • If we do keep SklearnBackend, should LightGBM use this?
    • Probably not. If it is sufficiently similar, then maybe; have a look at the LightGBM docs. If we did, we should give SklearnBackend a different name
  • Why are the checks in each model's fit slightly different, and do they need to be? If they can be standardised, we can move them into the fit method of SklearnBackend; if not, perhaps keep them in each model's own fit function
  • Add type hints to model args

github-actions bot (Contributor) commented Apr 22, 2025

Coverage report


File | Lines missing
autoemulate/emulators/gaussian_process.py |
autoemulate/experimental/data/utils.py | 176-178, 184
autoemulate/experimental/emulators/__init__.py |
autoemulate/experimental/emulators/base.py | 77-81, 193-195, 208-209, 217
autoemulate/experimental/emulators/lightgbm.py |
autoemulate/experimental/emulators/random_forest.py |
autoemulate/experimental/emulators/svm.py |
autoemulate/experimental/emulators/gaussian_process/exact.py |
tests/test_compare.py |
tests/experimental/test_experimental_base.py |
tests/experimental/test_experimental_compare.py |
tests/experimental/test_experimental_random_forest.py |
tests/experimental/test_experimental_svm.py |
Project Total |

This report was generated by python-coverage-comment-action

@codecov-commenter commented Apr 22, 2025

Codecov Report

Attention: Patch coverage is 93.10345% with 12 lines in your changes missing coverage. Please review.

Project coverage is 80.37%. Comparing base (25d69e7) to head (93c3856).

Files with missing lines | Patch % | Lines
autoemulate/experimental/emulators/base.py | 80.48% | 8 missing ⚠️
autoemulate/experimental/data/utils.py | 77.77% | 4 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #415      +/-   ##
==========================================
+ Coverage   80.09%   80.37%   +0.28%     
==========================================
  Files         100      104       +4     
  Lines        6902     7057     +155     
==========================================
+ Hits         5528     5672     +144     
- Misses       1374     1385      +11     
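The percentages in the diff follow directly from its counts: coverage is hits divided by lines, and every tracked line is either a hit or a miss. A quick Python check of the reported figures (`coverage_pct` is just an illustrative helper, not part of Codecov):

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage, rounded to two decimals like the report."""
    return round(100 * hits / lines, 2)


# (hits, misses, lines) taken from the Codecov diff for main and for this PR's head.
base_counts = (5528, 1374, 6902)
head_counts = (5672, 1385, 7057)

for hits, misses, lines in (base_counts, head_counts):
    # Every tracked line is either hit or missed.
    assert hits + misses == lines

base_cov = coverage_pct(base_counts[0], base_counts[2])
head_cov = coverage_pct(head_counts[0], head_counts[2])
print(f"{base_cov}% -> {head_cov}% ({head_cov - base_cov:+.2f})")
# prints "80.09% -> 80.37% (+0.28)"
```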


@sgreenbury sgreenbury changed the base branch from main to 351-refactor-lgbm April 23, 2025 10:16
@sgreenbury (Collaborator)

Adding the code from a discussion with @edwardchalstrey1 and @radka-j on a possible revision to the base class (relates to #322):

```python
from abc import ABC, abstractmethod

# InputLike, OutputLike, TuneConfig and InputTypeMixin are assumed to be
# defined elsewhere in autoemulate.experimental.


class Emulator(ABC, InputTypeMixin):
    """
    The interface containing methods on emulators that are
    expected by downstream dependents. This includes:
    - `AutoEmulate`
    """

    @abstractmethod
    def __init__(
        self, x: InputLike | None = None, y: InputLike | None = None, **kwargs
    ): ...

    @classmethod
    def model_name(cls) -> str:
        return cls.__name__

    @abstractmethod
    def _fit(self, x: InputLike, y: InputLike | None): ...

    @abstractmethod
    def check(self, x: InputLike, y: InputLike | None): ...

    @abstractmethod
    def convert(
        self, x: InputLike, y: InputLike | None
    ) -> tuple[InputLike, InputLike | None]: ...

    def fit(self, x: InputLike, y: InputLike | None = None):
        """
        Fit the model to the data.

        Parameters
        ----------
        x : InputLike
            Input features as numpy array, PyTorch tensor, or DataLoader.
        y : InputLike or None
            Target values (not needed if x is a DataLoader).

        Returns
        -------
        None
        """
        # Convert input types first, then validate, then run the model-specific fit.
        x, y = self.convert(x, y)
        self.check(x, y)
        self._fit(x, y)

    @abstractmethod
    def predict(self, x: InputLike) -> OutputLike: ...

    @staticmethod
    @abstractmethod
    def get_tune_config() -> TuneConfig: ...
```
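To make the proposed `fit` flow concrete, here is a self-contained sketch that mirrors the ordering (convert, then check, then `_fit`) using NumPy arrays only. `MiniEmulator` and `LinearEmulator` are hypothetical names; `InputLike`, `TuneConfig`, `InputTypeMixin` and DataLoader support are deliberately left out.

```python
from abc import ABC, abstractmethod

import numpy as np


class MiniEmulator(ABC):
    """Simplified stand-in for the proposed Emulator: fit = convert, check, _fit."""

    def fit(self, x, y=None) -> None:
        x, y = self.convert(x, y)  # normalise input types first
        self.check(x, y)           # then validate the converted data
        self._fit(x, y)            # finally run the model-specific training

    @abstractmethod
    def convert(self, x, y): ...

    @abstractmethod
    def check(self, x, y) -> None: ...

    @abstractmethod
    def _fit(self, x, y) -> None: ...

    @abstractmethod
    def predict(self, x) -> np.ndarray: ...


class LinearEmulator(MiniEmulator):
    """Hypothetical subclass: ordinary least squares with an intercept."""

    def convert(self, x, y):
        # Accept lists or arrays; hand 2D float64 arrays on to check/_fit.
        x = np.asarray(x, dtype=np.float64)
        if x.ndim == 1:
            x = x[:, None]
        if y is not None:
            y = np.asarray(y, dtype=np.float64)
            if y.ndim == 1:
                y = y[:, None]
        return x, y

    def check(self, x, y) -> None:
        if y is None:
            raise ValueError("y is required for array inputs.")
        if x.shape[0] != y.shape[0]:
            raise ValueError(f"x has {x.shape[0]} rows but y has {y.shape[0]}.")
        if not (np.isfinite(x).all() and np.isfinite(y).all()):
            raise ValueError("x and y must be finite.")

    def _fit(self, x, y) -> None:
        # Prepend a column of ones so the first coefficient is the intercept.
        design = np.hstack([np.ones((x.shape[0], 1)), x])
        self.coef_, *_ = np.linalg.lstsq(design, y, rcond=None)

    def predict(self, x) -> np.ndarray:
        x, _ = self.convert(x, None)
        design = np.hstack([np.ones((x.shape[0], 1)), x])
        return design @ self.coef_
```

Because `fit` is concrete and only the three hooks are abstract, a subclass cannot skip validation by accident; it can only change how conversion and checking are done.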

@edwardchalstrey1 edwardchalstrey1 marked this pull request as ready for review May 9, 2025 13:32
Comment on lines +174 to +184
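```python
# Excerpt (enclosing class omitted); TensorLike is the module's tensor type alias.
@staticmethod
def _normalize(x: TensorLike) -> tuple[TensorLike, TensorLike, TensorLike]:
    # Column-wise mean and std over dim 0, kept 2D so they broadcast over rows.
    x_mean = x.mean(0, keepdim=True)
    x_std = x.std(0, keepdim=True)
    return (x - x_mean) / x_std, x_mean, x_std

@staticmethod
def _denormalize(
    x: TensorLike, x_mean: TensorLike, x_std: TensorLike
) -> TensorLike:
    return (x * x_std) + x_mean
```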
Collaborator

Added this here as initial functionality for preprocessing outputs. It would be good to revisit this in #348, where it might be better as distinct functionality that can be combined with the model and/or updated to match the API used there.
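A NumPy sketch of the same column-wise standardisation and its inverse, useful for checking the round trip. Two choices here are assumptions rather than what the excerpt does: `ddof=1` is passed so NumPy matches the default Bessel correction of `torch.Tensor.std`, and a hypothetical guard replaces a zero standard deviation with 1 so constant output columns do not produce `inf`/`nan`.

```python
import numpy as np


def normalize(x: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    # Column-wise statistics, kept 2D so they broadcast back over rows.
    x_mean = x.mean(axis=0, keepdims=True)
    # ddof=1 gives the unbiased estimator, like torch's default std.
    x_std = x.std(axis=0, ddof=1, keepdims=True)
    # Hypothetical guard (not in the PR): constant columns would have std 0.
    x_std = np.where(x_std == 0, 1.0, x_std)
    return (x - x_mean) / x_std, x_mean, x_std


def denormalize(x: np.ndarray, x_mean: np.ndarray, x_std: np.ndarray) -> np.ndarray:
    # Exact inverse of normalize, because the (guarded) std is returned and reused.
    return x * x_std + x_mean
```

Returning the guarded standard deviation, rather than the raw one, is what keeps `denormalize` an exact inverse even for constant columns.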

Collaborator

Also relates to #437

Member Author

Note, @sgreenbury: I have added tests for these in #464.

@edwardchalstrey1 edwardchalstrey1 changed the title from Refactor sklearn models to Refactor sklearn models for experimental May 9, 2025
Collaborator

@sgreenbury left a comment

As discussed, this looks good to merge, with subsequent changes to checks/validation to follow in #422. Thanks @edwardchalstrey1!

@edwardchalstrey1 edwardchalstrey1 merged commit 8b50621 into main May 9, 2025
4 checks passed
@edwardchalstrey1 edwardchalstrey1 deleted the 405-refactor-sklearn branch May 9, 2025 14:06

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor: Create SklearnBackend class inheriting from Emulator base class & add first example (svm)

4 participants