Merged
Changes from 55 commits
68 commits
2379ee2
copy existing lightgbm class
edwardchalstrey1 Apr 10, 2025
1ace801
add base classes and type hints
edwardchalstrey1 Apr 10, 2025
ed9a3eb
convert to tensors in Tuner
edwardchalstrey1 Apr 11, 2025
211f836
add lightgbm get_tune_config staticmethod
edwardchalstrey1 Apr 11, 2025
619b554
add test
edwardchalstrey1 Apr 11, 2025
6e63257
import InputLike and OutputLike types in LightGBM emulator
edwardchalstrey1 Apr 11, 2025
20549f9
Merge branch '349-refactor-gp' into 351-refactor-lgbm
edwardchalstrey1 Apr 11, 2025
81ccb76
fix func name
edwardchalstrey1 Apr 11, 2025
40eff87
use numpy not tensor
edwardchalstrey1 Apr 11, 2025
5e5da2e
separate tuner logic for lightgbm
edwardchalstrey1 Apr 11, 2025
68053ed
Merge branch 'main' into 351-refactor-lgbm
edwardchalstrey1 Apr 11, 2025
77feda8
add _convert_to_numpy function
edwardchalstrey1 Apr 14, 2025
97740f2
spelling mistake
edwardchalstrey1 Apr 14, 2025
d77e5a0
change input to numpy array and output to tensor
edwardchalstrey1 Apr 14, 2025
43f4871
remove lightgbm specific code from tuner
edwardchalstrey1 Apr 14, 2025
ae75b39
use np instead of scipy for get_tune_config
edwardchalstrey1 Apr 14, 2025
3cd7da6
Merge branch '349-refactor-gp' into 351-refactor-lgbm
edwardchalstrey1 Apr 14, 2025
ce5a0e2
remove torch import
edwardchalstrey1 Apr 14, 2025
45c7080
Ensure the output of predict is a 2D tensor array with shape (n_sampl…
edwardchalstrey1 Apr 14, 2025
be4f285
remove commented out param
edwardchalstrey1 Apr 15, 2025
4de6c9e
remove unused LightGBM import from tuner.py
edwardchalstrey1 Apr 15, 2025
3dfe87a
change example data to be tensor to show numpy conversion worked
edwardchalstrey1 Apr 15, 2025
d7ac488
add test_predict_lightgbm
edwardchalstrey1 Apr 15, 2025
b47f62c
Ensure y is 1-dimensional in _convert_to_numpy method
edwardchalstrey1 Apr 15, 2025
5f58648
Merge branch 'main' into 351-refactor-lgbm
edwardchalstrey1 Apr 16, 2025
0f8fb51
remove sample data now in conftest.py
edwardchalstrey1 Apr 16, 2025
844ca9b
remove whitespace
edwardchalstrey1 Apr 16, 2025
6d771e8
update docstring and type hints _convert_to_numpy
edwardchalstrey1 Apr 16, 2025
98be151
fix formatting and type annotation
edwardchalstrey1 Apr 16, 2025
7cd5d46
undo change introduced by merge
edwardchalstrey1 Apr 16, 2025
98038d9
remove whitespace
edwardchalstrey1 Apr 16, 2025
8282273
fit method does not need to return self
edwardchalstrey1 Apr 16, 2025
519d841
add new line at end of file
edwardchalstrey1 Apr 16, 2025
ae435d0
remove unused imports
edwardchalstrey1 Apr 16, 2025
ed056f1
update _convert_to_numpy method to allow y to be None
edwardchalstrey1 Apr 16, 2025
2a91f08
refactor imports for consistency and clarity
edwardchalstrey1 Apr 16, 2025
ca4bff4
add noqa comment to __init__ method to allow too many arguments
edwardchalstrey1 Apr 16, 2025
474e7e0
refactor so that lightgbm does not need training data on init
edwardchalstrey1 Apr 16, 2025
17f926e
fix: standardize string quotes in model initialization check
edwardchalstrey1 Apr 16, 2025
a85ddee
fix: improve readability of note in get_tune_config method
edwardchalstrey1 Apr 16, 2025
1778e25
fix: correct order of feature count assignment in fit method
edwardchalstrey1 Apr 16, 2025
8a17f2b
fix: add type ignore comment for Tensor reshaping in predict method
edwardchalstrey1 Apr 16, 2025
61f65e9
fix: update type check in predict test from OutputLike to TensorLike
edwardchalstrey1 Apr 16, 2025
68e9008
update comment
edwardchalstrey1 Apr 16, 2025
1d9cf5a
Merge branch 'main' into 351-refactor-lgbm
edwardchalstrey1 Apr 17, 2025
ec2f45d
move lightgbm up one dir
edwardchalstrey1 Apr 17, 2025
159c106
Merge branch 'main' into 351-refactor-lgbm
edwardchalstrey1 Apr 17, 2025
245a33c
remove unused imports from LightGBM class definition
edwardchalstrey1 Apr 17, 2025
2c0d715
Revert "remove unused imports from LightGBM class definition"
edwardchalstrey1 Apr 17, 2025
77d2703
Add handling for tuple conversion of numpy arrays in _convert_to_nump…
edwardchalstrey1 Apr 17, 2025
121ccc9
Fix _convert_to_numpy method to handle optional second input and ensu…
edwardchalstrey1 Apr 17, 2025
30157e3
again remove unused imports from LightGBM class definition
edwardchalstrey1 Apr 17, 2025
b257955
remove kwargs and sample_weight
edwardchalstrey1 Apr 17, 2025
4c9f2a7
Remove multi_output parameter from check_X_y and delete unused _more_…
edwardchalstrey1 Apr 17, 2025
caa2fef
Refactor check_X_y call for improved readability (ruff-format)
edwardchalstrey1 Apr 17, 2025
134090e
refactor model_name to base class
edwardchalstrey1 Apr 23, 2025
83a239f
Refactor LightGBM test setup to use fixture for improved readability
edwardchalstrey1 Apr 23, 2025
8e84a95
Revert "Refactor LightGBM test setup to use fixture for improved read…
edwardchalstrey1 Apr 23, 2025
dd28231
Remove unnecessary blank lines in LightGBM class
edwardchalstrey1 Apr 23, 2025
7f3e73b
Refactor LightGBM initialization to have but not use x and y arguments…
edwardchalstrey1 Apr 23, 2025
5b09b41
Fix formatting of unused arguments in LightGBM initializer
edwardchalstrey1 Apr 23, 2025
a90abbf
todo commit
edwardchalstrey1 Apr 23, 2025
abe18fd
Update n_jobs parameter in LightGBM initializer to allow None value
edwardchalstrey1 Apr 24, 2025
93f6e09
add link to LGM docs in docstring
edwardchalstrey1 Apr 24, 2025
a827cad
Ensure y is 1-dimensional after tensor conversion in _convert_to_numpy
edwardchalstrey1 Apr 24, 2025
e33385b
update docstring
edwardchalstrey1 Apr 24, 2025
ef8b272
Fix indentation in fit method docstring for clarity
edwardchalstrey1 Apr 24, 2025
cf2042b
handle y dimensionality check inside lightgbm
edwardchalstrey1 Apr 24, 2025
19 changes: 19 additions & 0 deletions autoemulate/experimental/data/utils.py
@@ -93,6 +93,25 @@ def _convert_to_tensors(
f"Unsupported type for dataset ({type(dataset)}). Must be TensorDataset."
)

    def _convert_to_numpy(
        self,
        x: InputLike,
        y: InputLike | None = None,
    ) -> tuple[np.ndarray, np.ndarray | None]:
        """
        Convert InputLike x, y to a tuple of numpy arrays.
        """
        if isinstance(x, np.ndarray) and (y is None or isinstance(y, np.ndarray)):
            return x, y

        result = self._convert_to_tensors(x, y)
        if isinstance(result, tuple):
            x, y = result
            y = y.ravel()  # Ensure y is 1-dimensional
            return x.numpy(), y.numpy()
        x = result
        return x.numpy(), None

    def _random_split(
        self,
        dataset: Dataset,
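The contract of `_convert_to_numpy` (NumPy arrays pass through, other inputs are converted, `y` is flattened to 1-D) can be illustrated without the autoemulate types. This is a minimal sketch under stated assumptions, not the method in the diff: `to_numpy_pair` is a hypothetical helper that relies on `np.asarray`, so it assumes inputs implement the NumPy array protocol (lists, arrays, or CPU torch tensors that do not require gradients). Unlike the diff's fast path for arrays, it also flattens a NumPy `y`.

```python
import numpy as np


def to_numpy_pair(x, y=None):
    """Hypothetical helper: coerce x (and optionally y) to NumPy arrays.

    y is flattened to 1-D because sklearn-style regressors expect targets
    of shape (n_samples,), not (n_samples, 1).
    """
    x_np = np.asarray(x)
    if y is None:
        return x_np, None
    y_np = np.asarray(y).ravel()  # (n, 1) -> (n,)
    if y_np.shape[0] != x_np.shape[0]:
        raise ValueError(
            f"x has {x_np.shape[0]} rows but y has {y_np.shape[0]} values"
        )
    return x_np, y_np


x, y = to_numpy_pair([[0.0, 1.0], [2.0, 3.0]], [[1.0], [2.0]])
print(x.shape, y.shape)  # (2, 2) (2,)
```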
119 changes: 119 additions & 0 deletions autoemulate/experimental/emulators/lightgbm.py
@@ -0,0 +1,119 @@
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y
from torch import Tensor

from autoemulate.experimental.emulators.base import (
    Emulator,
    InputTypeMixin,
)
from autoemulate.experimental.types import InputLike, OutputLike


class LightGBM(Emulator, InputTypeMixin):
    """LightGBM Emulator.

    Wraps the LGBMRegressor class from the lightgbm package.
    """

    def __init__(  # noqa: PLR0913 allow too many arguments since all currently required
        self,
        boosting_type="gbdt",
        num_leaves=31,
        max_depth=-1,
        learning_rate=0.1,
        n_estimators=100,
        subsample_for_bin=200000,
        objective=None,
        class_weight=None,
        min_split_gain=0.0,
        min_child_weight=0.001,
        min_child_samples=20,
        subsample=1.0,
        colsample_bytree=1.0,
        reg_alpha=0.0,
        reg_lambda=0.0,
        random_state=None,
        n_jobs=1,
        importance_type="split",
        verbose=-1,
    ):
        """Initializes a LightGBM object."""
        self.boosting_type = boosting_type
        self.num_leaves = num_leaves
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        self.n_estimators = n_estimators
        self.subsample_for_bin = subsample_for_bin
        self.objective = objective
        self.class_weight = class_weight
        self.min_split_gain = min_split_gain
        self.min_child_weight = min_child_weight
        self.min_child_samples = min_child_samples
        self.subsample = subsample
        self.colsample_bytree = colsample_bytree
        self.reg_alpha = reg_alpha
        self.reg_lambda = reg_lambda
        self.random_state = random_state
        self.n_jobs = n_jobs
        self.importance_type = importance_type
        self.verbose = verbose

    def fit(self, x: InputLike, y: InputLike | None):
        """Fits the emulator to the data."""

        x, y = self._convert_to_numpy(x, y)

        self.n_features_in_ = x.shape[1]

        x, y = check_X_y(x, y, y_numeric=True)

        self.model_ = LGBMRegressor(
            boosting_type=self.boosting_type,
            num_leaves=self.num_leaves,
            max_depth=self.max_depth,
            learning_rate=self.learning_rate,
            n_estimators=self.n_estimators,
            subsample_for_bin=self.subsample_for_bin,
            objective=self.objective,
            class_weight=self.class_weight,
            min_split_gain=self.min_split_gain,
            min_child_weight=self.min_child_weight,
            min_child_samples=self.min_child_samples,
            subsample=self.subsample,
            colsample_bytree=self.colsample_bytree,
            reg_alpha=self.reg_alpha,
            reg_lambda=self.reg_lambda,
            random_state=self.random_state,
            n_jobs=self.n_jobs,
            importance_type=self.importance_type,
            verbose=self.verbose,
        )

        self.model_.fit(x, y)
        self.is_fitted_ = True

    def predict(self, x: InputLike) -> OutputLike:
        """Predicts the output of the emulator for a given input."""
        x = check_array(x)
        check_is_fitted(self, "is_fitted_")
Review comment (Member): If using this check function depends on this object inheriting from the sklearn base objects, I'd be in favour of not doing the inheritance and just getting rid of this (and replacing it with our own check if we think that's necessary).

        y_pred = self.model_.predict(x)
        # Ensure the output is a 2D tensor with shape (n_samples, 1)
        return Tensor(y_pred.reshape(-1, 1))  # type: ignore  # noqa: PGH003

    @staticmethod
    def get_tune_config():
        # Note: 10 ** np.random.uniform(-3, -1) is distributed as
        # scipy.stats.loguniform(0.001, 0.1); likewise,
        # 10 ** np.random.uniform(-3, 0) is distributed as
        # scipy.stats.loguniform(0.001, 1)
        return {
            "num_leaves": [np.random.randint(10, 100)],
            "max_depth": [np.random.randint(-1, 12)],
            "learning_rate": [10 ** np.random.uniform(-3, -1)],
            "n_estimators": [np.random.randint(50, 1000)],
            "reg_alpha": [10 ** np.random.uniform(-3, 0)],
            "reg_lambda": [10 ** np.random.uniform(-3, 0)],
        }

    @property
    def model_name(self):
        return self.__class__.__name__
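The review comment on `check_is_fitted` in `predict` suggests dropping the sklearn dependency in favour of a project-specific check. A minimal sketch of what that could look like, assuming the same `is_fitted_` flag that `fit` sets in the diff. `NotFittedError` (a local class, not sklearn's exception of the same name), `FitCheckMixin`, and `ToyEmulator` are hypothetical names that are not part of autoemulate.

```python
class NotFittedError(RuntimeError):
    """Hypothetical local error type (distinct from sklearn's NotFittedError)."""


class FitCheckMixin:
    """Hypothetical mixin providing a fitted-state check without sklearn."""

    def _check_fitted(self) -> None:
        # Same convention as the diff: fit() sets `self.is_fitted_ = True`.
        if not getattr(self, "is_fitted_", False):
            raise NotFittedError(
                f"{type(self).__name__} is not fitted yet; call fit() first."
            )


class ToyEmulator(FitCheckMixin):
    """Hypothetical emulator that predicts the mean of the training targets."""

    def fit(self, x, y) -> None:
        self.mean_ = sum(y) / len(y)
        self.is_fitted_ = True

    def predict(self, x) -> list[list[float]]:
        self._check_fitted()
        # One row per input sample, one output column: shape (n_samples, 1).
        return [[self.mean_] for _ in x]


emulator = ToyEmulator()
try:
    emulator.predict([[0.0]])
except NotFittedError as err:
    print(err)  # ToyEmulator is not fitted yet; call fit() first.

emulator.fit([[0.0], [1.0]], [1.0, 3.0])
print(emulator.predict([[0.5], [2.0]]))  # [[2.0], [2.0]]
```

Keeping the check as a mixin method means each emulator only has to set the flag in `fit`, with no base-class requirements from sklearn.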
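`get_tune_config` samples `learning_rate` as `10 ** np.random.uniform(-3, -1)`, i.e. uniformly in log space: if u is uniform on [a, b), then 10**u is log-uniform on [10**a, 10**b). A quick empirical check with NumPy follows; it uses a seeded `np.random.default_rng` generator instead of the legacy `np.random` functions called in the diff, and the seed and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Sample the exponent uniformly on [-3, -1), then exponentiate: the same
# pattern used for learning_rate in get_tune_config.
samples = 10 ** rng.uniform(-3, -1, size=100_000)

# All draws fall inside [0.001, 0.1).
print(f"min={samples.min():.5f}, max={samples.max():.5f}")

# A log-uniform distribution puts equal mass in each decade, so roughly
# half the draws should land in [0.001, 0.01) and half in [0.01, 0.1).
frac_low_decade = np.mean(samples < 1e-2)
print(f"fraction below 0.01: {frac_low_decade:.3f}")
```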
7 changes: 6 additions & 1 deletion autoemulate/experimental/tuner.py
@@ -64,7 +64,12 @@ def run(self, model_class: type[Emulator]) -> tuple[list[float], list[ModelConfi
 }
 
 # TODO: consider whether to pass as tensors or dataloader
-m = model_class(train_x, train_y, **model_config)
+# TODO: is there a better way to distinguish between models that
+# require training data for initialisation as well as fitting?
+if "x" in model_class.__init__.__code__.co_varnames:
+    m = model_class(train_x, train_y, **model_config)
+else:
+    m = model_class(**model_config)
 m.fit(train_x, train_y)
 
 # evaluate
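On the TODO in `Tuner.run`: one caveat with the current check is that `__code__.co_varnames` lists local variables as well as parameters, so an `__init__` that merely assigns a local named `x` is treated as requiring training data. `inspect.signature` looks only at declared parameters. The classes below are hypothetical, written only to show the difference between the two checks.

```python
import inspect


class NeedsData:
    def __init__(self, x, y, alpha=1.0):
        self.x, self.y, self.alpha = x, y, alpha


class NoData:
    def __init__(self, alpha=1.0):
        x = alpha * 2  # a *local* named x, not a parameter
        self.alpha = x


def via_co_varnames(cls) -> bool:
    # The check used in the diff: co_varnames lists parameters AND locals.
    return "x" in cls.__init__.__code__.co_varnames


def via_signature(cls) -> bool:
    # Only inspects the declared parameters of __init__.
    return "x" in inspect.signature(cls.__init__).parameters


for cls in (NeedsData, NoData):
    print(cls.__name__, via_co_varnames(cls), via_signature(cls))
# NeedsData True True
# NoData True False
```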
22 changes: 22 additions & 0 deletions tests/experimental/test_experimental_lightgbm.py
@@ -0,0 +1,22 @@
from autoemulate.experimental.emulators.lightgbm import (
    LightGBM,
)
from autoemulate.experimental.tuner import Tuner
from autoemulate.experimental.types import TensorLike


def test_predict_lightgbm(sample_data_y1d, new_data_y1d):
    x, y = sample_data_y1d
    lgbm = LightGBM()
    lgbm.fit(x, y)
    x2, _ = new_data_y1d
    y_pred = lgbm.predict(x2)
    assert isinstance(y_pred, TensorLike)


def test_tune_lightgbm(sample_data_y1d):
    x, y = sample_data_y1d
    tuner = Tuner(x, y, n_iter=5)
    scores, configs = tuner.run(LightGBM)
    assert len(scores) == 5
    assert len(configs) == 5