Add refactored LGBM model to experimental emulators #399
Merged
Changes from all commits (68 commits):
- 2379ee2 copy existing lightgm class
- 1ace801 add base classes and type hints
- ed9a3eb convert to tensors in Tuner
- 211f836 add lightgbm get_tune_config staticmethod
- 619b554 add test
- 6e63257 import InputLike and OutputLike types in LightGBM emulator
- 20549f9 Merge branch '349-refactor-gp' into 351-refactor-lgbm
- 81ccb76 fix func name
- 40eff87 use numpy not tensor
- 5e5da2e separate tunder logic for lightgbm
- 68053ed Merge branch 'main' into 351-refactor-lgbm
- 77feda8 add _convert_to_numpy function
- 97740f2 spelling mistake
- d77e5a0 change input to numpy array and output to tensor
- 43f4871 remove lightgbm specific code from tuner
- ae75b39 use np instead of scipy for get_tune_config
- 3cd7da6 Merge branch '349-refactor-gp' into 351-refactor-lgbm
- ce5a0e2 remove torch import
- 45c7080 Ensure the output of predict is a 2D tensor array with shape (n_sampl…
- be4f285 remove commented out param
- 4de6c9e remove unused LightGBM import from tuner.py
- 3dfe87a change example data to be tensor to show numpy conversion worked
- d7ac488 add test_predict_lightgbm
- b47f62c Ensure y is 1-dimensional in _convert_to_numpy method
- 5f58648 Merge branch 'main' into 351-refactor-lgbm
- 0f8fb51 remove sample data now in conftest.py
- 844ca9b remove whitespace
- 6d771e8 update docstring and type hints _convert_to_numpy
- 98be151 fix formatting and type annotation
- 7cd5d46 undo change introduced by merge
- 98038d9 remove whitespace
- 8282273 fit method does not need to return self
- 519d841 add new line at end of file
- ae435d0 remove unused imports
- ed056f1 update _convert_to_numpy method to allow y to be None
- 2a91f08 refactor imports for consistency and clarity
- ca4bff4 add noqa comment to __init__ method to allow too many arguments
- 474e7e0 refactor so that lightgbm does not need training data on init
- 17f926e fix: standardize string quotes in model initialization check
- a85ddee fix: improve readability of note in get_tune_config method
- 1778e25 fix: correct order of feature count assignment in fit method
- 8a17f2b fix: add type ignore comment for Tensor reshaping in predict method
- 61f65e9 fix: update type check in predict test from OutputLike to TensorLike
- 68e9008 update comment
- 1d9cf5a Merge branch 'main' into 351-refactor-lgbm
- ec2f45d move lightgbm up one dir
- 159c106 Merge branch 'main' into 351-refactor-lgbm
- 245a33c remove unused imports from LightGBM class definition
- 2c0d715 Revert "remove unused imports from LightGBM class definition"
- 77d2703 Add handling for tuple conversion of numpy arrays in _convert_to_nump…
- 121ccc9 Fix _convert_to_numpy method to handle optional second input and ensu…
- 30157e3 again remove unused imports from LightGBM class definition
- b257955 remove kwargs and sample_weight
- 4c9f2a7 Remove multi_output parameter from check_X_y and delete unused _more_…
- caa2fef Refactor check_X_y call for improved readability (ruff-format)
- 134090e refactor model_name to base class
- 83a239f Refactor LightGBM test setup to use fixture for improved readability
- 8e84a95 Revert "Refactor LightGBM test setup to use fixture for improved read…
- dd28231 Remove unnecessary blank lines in LightGBM class
- 7f3e73b Refactor LightGBM initialization to havebut not use x and y arguments…
- 5b09b41 Fix formatting of unused arguments in LightGBM initializer
- a90abbf todo commit
- abe18fd Update n_jobs parameter in LightGBM initializer to allow None value
- 93f6e09 add link to LGM docs in docstring
- a827cad Ensure y is 1-dimensional after tensor conversion in _convert_to_numpy
- e33385b update docstring
- ef8b272 Fix indentation in fit method docstring for clarity
- cf2042b handle y dimensionality check inside lightgbm
New file (134 lines), the refactored LightGBM emulator:

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y
from torch import Tensor

from autoemulate.experimental.emulators.base import (
    Emulator,
    InputTypeMixin,
)
from autoemulate.experimental.types import InputLike, OutputLike


class LightGBM(Emulator, InputTypeMixin):
    """LightGBM Emulator.

    Wraps LightGBM regression from LightGBM.
    See https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html
    for more details.
    """

    def __init__(  # noqa: PLR0913 allow too many arguments since all currently required
        self,
        x: InputLike | None = None,
        y: InputLike | None = None,
        boosting_type: str = "gbdt",
        num_leaves: int = 31,
        max_depth: int = -1,
        learning_rate: float = 0.1,
        n_estimators: int = 100,
        subsample_for_bin: int = 200000,
        objective: str | None = None,
        class_weight: dict | str | None = None,
        min_split_gain: float = 0.0,
        min_child_weight: float = 0.001,
        min_child_samples: int = 20,
        subsample: float = 1.0,
        colsample_bytree: float = 1.0,
        reg_alpha: float = 0.0,
        reg_lambda: float = 0.0,
        random_state: int | None = None,
        n_jobs: int | None = 1,
        importance_type: str = "split",
        verbose: int = -1,
    ):
        """Initializes a LightGBM object."""
        _, _ = x, y  # ignore unused arguments
        self.boosting_type = boosting_type
        self.num_leaves = num_leaves
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        self.n_estimators = n_estimators
        self.subsample_for_bin = subsample_for_bin
        self.objective = objective
        self.class_weight = class_weight
        self.min_split_gain = min_split_gain
        self.min_child_weight = min_child_weight
        self.min_child_samples = min_child_samples
        self.subsample = subsample
        self.colsample_bytree = colsample_bytree
        self.reg_alpha = reg_alpha
        self.reg_lambda = reg_lambda
        self.random_state = random_state
        self.n_jobs = n_jobs
        self.importance_type = importance_type
        self.verbose = verbose

    def fit(self, x: InputLike, y: InputLike | None):
        """
        Fits the emulator to the data.

        The model expects the input data to be:
            x (features): 2D array
            y (target): 1D array
        """
        x, y = self._convert_to_numpy(x, y)

        if y is None:
            msg = "y must be provided."
            raise ValueError(msg)
        if y.ndim > 2:
            msg = f"y must be 1D or 2D array. Found {y.ndim}D array."
            raise ValueError(msg)
        if y.ndim == 2:  # _convert_to_numpy may return 2D y
            y = y.ravel()  # Ensure y is 1-dimensional

        self.n_features_in_ = x.shape[1]

        x, y = check_X_y(x, y, y_numeric=True)

        self.model_ = LGBMRegressor(
            boosting_type=self.boosting_type,
            num_leaves=self.num_leaves,
            max_depth=self.max_depth,
            learning_rate=self.learning_rate,
            n_estimators=self.n_estimators,
            subsample_for_bin=self.subsample_for_bin,
            objective=self.objective,
            class_weight=self.class_weight,
            min_split_gain=self.min_split_gain,
            min_child_weight=self.min_child_weight,
            min_child_samples=self.min_child_samples,
            subsample=self.subsample,
            colsample_bytree=self.colsample_bytree,
            reg_alpha=self.reg_alpha,
            reg_lambda=self.reg_lambda,
            random_state=self.random_state,
            n_jobs=self.n_jobs,
            importance_type=self.importance_type,
            verbose=self.verbose,
        )

        self.model_.fit(x, y)
        self.is_fitted_ = True

    def predict(self, x: InputLike) -> OutputLike:
        """Predicts the output of the emulator for a given input."""
        x = check_array(x)
        check_is_fitted(self, "is_fitted_")
        y_pred = self.model_.predict(x)
        # Ensure the output is a 2D tensor array with shape (n_samples, 1)
        return Tensor(y_pred.reshape(-1, 1))  # type: ignore PGH003

    @staticmethod
    def get_tune_config():
        # Note: 10 ** np.random.uniform(-3, 0)
        # is equivalent to scipy.stats.loguniform(0.001, 1)
        return {
            "num_leaves": [np.random.randint(10, 100)],
            "max_depth": [np.random.randint(-1, 12)],
            "learning_rate": [10 ** np.random.uniform(-3, -1)],
            "n_estimators": [np.random.randint(50, 1000)],
            "reg_alpha": [10 ** np.random.uniform(-3, 0)],
            "reg_lambda": [10 ** np.random.uniform(-3, 0)],
        }
```

Review comment from a maintainer on the `check_is_fitted` call (marked resolved): "If using this check function is dependent on this object inheriting the sklearn base objects I'd be in favour of not doing the inheritance and just getting rid of this (and replacing it with our own check if we think that's necessary)"
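The sampling trick used by `get_tune_config` (draw uniformly in log10-space, then exponentiate) can be checked numerically with numpy alone: `10 ** np.random.uniform(-3, 0)` is log-uniform on [1e-3, 1], i.e. each decade receives roughly an equal share of draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 10_000 samples the way get_tune_config does: uniform in
# log10-space, then exponentiate. This is log-uniform on [1e-3, 1].
samples = 10 ** rng.uniform(-3, 0, size=10_000)

# All samples fall in the expected range...
assert samples.min() >= 1e-3 and samples.max() <= 1.0

# ...and log10 of the samples is (approximately) uniform on [-3, 0]:
# each decade [1e-3, 1e-2), [1e-2, 1e-1), [1e-1, 1] holds roughly a
# third of the draws.
decade_counts, _ = np.histogram(np.log10(samples), bins=3, range=(-3, 0))
assert all(abs(c - len(samples) / 3) < 500 for c in decade_counts)
```

The same reasoning applies to `learning_rate`, where `10 ** np.random.uniform(-3, -1)` is log-uniform on [1e-3, 1e-1].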
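The reviewer's suggestion of replacing sklearn's `check_is_fitted` with "our own check" could look like the sketch below. It assumes only what the PR already establishes, that `fit()` sets `is_fitted_ = True`; the `check_fitted` helper and local `NotFittedError` class are hypothetical names, not part of the PR.

```python
class NotFittedError(RuntimeError):
    """Raised when predict() is called before fit()."""


def check_fitted(emulator) -> None:
    # Mirrors sklearn's check_is_fitted(self, "is_fitted_") without
    # requiring inheritance from sklearn base classes: fit() sets
    # is_fitted_ = True, so a missing or falsy attribute means unfitted.
    if not getattr(emulator, "is_fitted_", False):
        msg = f"{type(emulator).__name__} is not fitted yet. Call fit() first."
        raise NotFittedError(msg)


class _DemoEmulator:
    """Stand-in with the same fitted-flag convention as the PR's LightGBM."""

    def fit(self):
        self.is_fitted_ = True


demo = _DemoEmulator()
try:
    check_fitted(demo)  # not fitted yet
    raised = False
except NotFittedError:
    raised = True
assert raised  # unfitted emulator is rejected

demo.fit()
check_fitted(demo)  # no exception after fit
```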
New test file (27 lines):

```python
from autoemulate.experimental.emulators.lightgbm import (
    LightGBM,
)
from autoemulate.experimental.tuner import Tuner
from autoemulate.experimental.types import TensorLike


def test_predict_lightgbm(sample_data_y1d, new_data_y1d):
    x, y = sample_data_y1d
    lgbm = LightGBM()
    lgbm.fit(x, y)
    x2, _ = new_data_y1d
    y_pred = lgbm.predict(x2)
    assert isinstance(y_pred, TensorLike)


def test_tune_lightgbm(sample_data_y1d):
    x, y = sample_data_y1d
    tuner = Tuner(x, y, n_iter=5)
    scores, configs = tuner.run(LightGBM)
    assert len(scores) == 5
    assert len(configs) == 5


def test_lightgm_class_name_returned():
    lgbm = LightGBM()
    assert lgbm.model_name() == "LightGBM"
```
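The shape contract exercised by these tests can be sketched with numpy alone: `y` is flattened to 1-D before it reaches LightGBM, and predictions come back as a `(n_samples, 1)` column. Here `fake_predict` is a stand-in for `LGBMRegressor.predict`, which returns a 1-D array.

```python
import numpy as np

# y may arrive as an (n_samples, 1) column (e.g. after tensor
# conversion); LightGBM wants it 1-D, as in the fit() method above.
y = np.arange(6.0).reshape(-1, 1)
if y.ndim == 2:
    y = y.ravel()
assert y.shape == (6,)


def fake_predict(x: np.ndarray) -> np.ndarray:
    # Stand-in for model_.predict: returns one value per row of x.
    return np.zeros(x.shape[0])


# The emulator reshapes the 1-D prediction to an (n_samples, 1)
# column before wrapping it in a Tensor.
x2 = np.ones((4, 3))
y_pred = fake_predict(x2).reshape(-1, 1)
assert y_pred.shape == (4, 1)
```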