Conversation

@giovannimicaroni

This PR is an attempt at solving issues #798 and #423. We added new parameters to the _predict function of the base_automl class and adapted the parts of the code that we found necessary. We did not implement unit tests, since we did not find existing unit tests that directly exercise the predict method. It would be helpful to get some direction on that, so that this implementation can be merged. Thanks!

Contributor
@pplonski left a comment

Thank you for the PR!

X : array-like, pandas.DataFrame
Input data to generate predictions for.
prediction_mode : str, default='best'
Contributor

Thank you for the PR! You assumed that the user already has a list of loaded models. Do you have example code for such a use?

Author

The use for this case would be something like: predictions = automl.predict(X_test, prediction_mode='custom', custom_models=["2_DecisionTree", "3_Linear"]). I was not sure how to change the names of the models, since they are objects of the ModelFramework class, and I didn't want to change that. To mitigate this, I thought about adding the available model names to the documentation and raising an error that shows those names as well, in case the user enters invalid names.
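The error handling described above could be sketched like this; `validate_custom_models` and its arguments are hypothetical names for illustration, not part of the current AutoML API:

```python
def validate_custom_models(custom_models, available_names):
    """Check user-supplied model names against the trained models.

    Raises a ValueError that lists the valid names, so the user can see
    exactly which identifiers (e.g. "2_DecisionTree") are allowed.
    """
    invalid = [name for name in custom_models if name not in available_names]
    if invalid:
        raise ValueError(
            f"Unknown model name(s): {invalid}. "
            f"Available models: {sorted(available_names)}"
        )
    return list(custom_models)
```

The point of listing the available names in the exception is that the user never has to inspect ModelFramework objects to discover valid identifiers.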

prediction_mode : str, default='best'
Model selection strategy:
- 'best': selects the top `n_models` models ranked by performance.
Contributor

I think these are too many options. Can the options be set based on a list of models?

Author

Sure! I think I can leave the default mode as 'best', so that it doesn't affect previously working code, and keep the option to select all models as a prediction mode, but change the custom mode to be driven by the list of models the user provides. What do you think?

- 'custom': selects only the models explicitly listed in `custom_models`.
- 'all': uses all trained models.
n_models : int, default=1
Contributor

Why do we need n_models? Can it be set from the list of models?

Author

The idea behind n_models was that if a user is in 'best' mode but wants, for example, the predictions of the top 3 performing models, n_models could be set to 3, so that the user gets the 3 best predictions. Should I keep it?
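The top-n idea described here can be sketched as a simple selection over a leaderboard; the (name, metric) tuples and the assumption that a lower metric is better are illustrative only, not the actual AutoML internals:

```python
def select_best_models(leaderboard, n_models=1):
    """Return the names of the top `n_models` entries.

    leaderboard: list of (model_name, metric_value) pairs,
    where a lower metric value means a better model.
    """
    ranked = sorted(leaderboard, key=lambda item: item[1])
    return [name for name, _ in ranked[:n_models]]
```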

Contributor

I suggest keeping only the list of models as an additional argument and using all models provided in the list.

The first case would be:

# use the best model to compute predictions
preds = automl.predict(X_test)

the second case:

# use the provided models to compute predictions
preds = automl.predict(X_test, models=["model_1", "model_2"])
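The suggested fallback behavior (an empty list keeps the old best-model default) might be resolved with a small helper; the function and argument names here are assumptions, not the actual internals:

```python
def resolve_models(best_model_name, trained_names, models=None):
    """Decide which models to predict with.

    An empty or missing `models` argument keeps the old behavior and
    uses only the best model; otherwise every listed model is used.
    """
    if not models:
        return [best_model_name]
    missing = [m for m in models if m not in trained_names]
    if missing:
        raise ValueError(f"Unknown model name(s): {missing}")
    return list(models)
```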

selected_models = []

# Model selection logic
match prediction_mode:
Contributor

By using match we would require Python 3.10+.

Author

True! I'll change the logic to use if statements.
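An if/elif chain equivalent to the match statement, runnable on Python versions before 3.10 (the mode names mirror the ones discussed in this thread; the returned strings are placeholders for the real selection branches):

```python
def dispatch_prediction_mode(prediction_mode):
    # Equivalent to `match prediction_mode: case "best": ...`
    # but without requiring Python 3.10+.
    if prediction_mode == "best":
        return "top n_models by rank"
    elif prediction_mode == "custom":
        return "models listed in custom_models"
    elif prediction_mode == "all":
        return "all trained models"
    else:
        raise ValueError(f"Unknown prediction_mode: {prediction_mode!r}")
```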

model_list_str = ", ".join(selected_model_names)

if n_selected == 1:
print(
Contributor

Is it for debug only?

Author

Yes, the idea of this part of the code is to serve as an explanation for the user of which models are being used for prediction and how the output matrix is structured.

Contributor

I suggest removing the prints.

Contributor
@pplonski left a comment

Thank you for the update, let's create test cases to check how it is working.



-def predict(self, X: Union[List, numpy.ndarray, pandas.DataFrame]) -> numpy.ndarray:
+def predict(self, X: Union[List, numpy.ndarray, pandas.DataFrame], models=[]) -> numpy.ndarray:
Contributor

I think a comment describing how the function works should go here. Docs are generated from this comment: https://supervised.mljar.com/api/#supervised.automl.AutoML.predict
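A sketch of what the numpydoc-style comment could look like, with parameter semantics as discussed in this thread (stub body, standalone function; the real method takes `self`):

```python
def predict(X, models=None):
    """Compute predictions with trained AutoML models.

    Parameters
    ----------
    X : array-like, pandas.DataFrame
        Input data to generate predictions for.
    models : list of str, optional
        Names of trained models to use (e.g. ["2_DecisionTree", "3_Linear"]).
        If empty or None, only the best model is used.

    Returns
    -------
    numpy.ndarray
        Predictions for the selected models.
    """
    raise NotImplementedError  # docstring sketch only
```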

if self._ml_task != REGRESSION
else predictions["prediction"].to_numpy()
)
def _predict(self, X, models=[]):
Contributor

It would be good to add tests for predict. We should have the following test cases:

  • no model selected, predict should use the best model,
  • models selected and predict should provide predictions for each model,
  • automl trained, and then loaded back from hard drive, and here two cases: (1) prediction computed on best model, (2) and predictions computed on selected models

If we are going to provide such functionality in predict, then we should add it to predict_proba and predict_all as well.
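The requested test cases might be structured like this; `FakeAutoML` is a stand-in so the sketch runs without training real models (the real tests would fit an actual AutoML object, and the save/load round-trip cases would reload it from disk first):

```python
class FakeAutoML:
    """Minimal stand-in mimicking the proposed predict(X, models=[]) API."""

    def __init__(self, model_names, best):
        self._model_names = model_names
        self._best = best

    def predict(self, X, models=None):
        selected = models if models else [self._best]
        # one prediction column per selected model
        return {name: [0.0] * len(X) for name in selected}


def test_predict_defaults_to_best_model():
    automl = FakeAutoML(["1_Baseline", "2_DecisionTree"], best="2_DecisionTree")
    preds = automl.predict([[1], [2]])
    assert list(preds) == ["2_DecisionTree"]


def test_predict_with_selected_models():
    automl = FakeAutoML(["1_Baseline", "2_DecisionTree"], best="2_DecisionTree")
    preds = automl.predict([[1], [2]], models=["1_Baseline", "2_DecisionTree"])
    assert sorted(preds) == ["1_Baseline", "2_DecisionTree"]
```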

Author

Sure. For the predict_all method, does it make sense for it to just call the new predict implementation, since both of them originally just called the base predict method?

Contributor

Yes, it makes sense.
