-
Notifications
You must be signed in to change notification settings - Fork 430
Predict from selected or from all models models (Issues #798 and #423) #805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Predict from selected or from all models models (Issues #798 and #423) #805
Conversation
pplonski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for PR,
supervised/base_automl.py
Outdated
| X : array-like, pandas.DataFrame | ||
| Input data to generate predictions for. | ||
| prediction_mode : str, default='best' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for PR! You assumed that user already has a list with loaded models. Do you have example code for such use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use for this case would be something like: predictions = automl.predict(X_test, prediction_mode = 'custom', custom_models=["2_DecisionTree", "3_Linear"]). I was not sure how to change the names for the models, since they are objects from the ModelFramework class, and I didn't want to change it. To minimize that I thought about adding the available models names to the documentation and printing an error that shows their names aswell, in case the user enters invalid names.
supervised/base_automl.py
Outdated
| prediction_mode : str, default='best' | ||
| Model selection strategy: | ||
| - 'best': selects the top `n_models` models ranked by performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is too much options, can options be set based on list of models?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I think I can leave the default mode as best, so that it doesnt affect previously working code, and leave the option to select all modes as a prediction mode, but change the custom mode to be used based on the list of models the user provides. What do you think?
supervised/base_automl.py
Outdated
| - 'custom': selects only the models explicitly listed in `custom_models`. | ||
| - 'all': uses all trained models. | ||
| n_models : int, default=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need n_models? can it be set from list of models?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea for n_models was that if a user is using the 'best' mode, but, for example, wants the predictions for the top 3 performing models, n_models could be set to 3, so that the user gets the 3 best predictions. Should I keep it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to keep only list of models ad additional argument and use all models provided in the list.
The first case would be:
# use the best model to compute predictions
preds = automl.predict(X_test)the second case:
# use the provideds models to compute predictions
preds = automl.predict(X_test, models=["model_1", "model_2"])
supervised/base_automl.py
Outdated
| selected_models = [] | ||
|
|
||
| # Model selection logic | ||
| match prediction_mode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by using match we need Python 3.10+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True! I'll change the logic to use if's.
supervised/base_automl.py
Outdated
| model_list_str = ", ".join(selected_model_names) | ||
|
|
||
| if n_selected == 1: | ||
| print( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it for debug only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the idea of this part of the code is to serve as a explanation for the user regarding what models are being used for prediction, and how the output matrix is structured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to remove prints.
pplonski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for update, let's create test cases to check how it is working.
|
|
||
|
|
||
| def predict(self, X: Union[List, numpy.ndarray, pandas.DataFrame]) -> numpy.ndarray: | ||
| def predict(self, X: Union[List, numpy.ndarray, pandas.DataFrame], models = []) -> numpy.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that comment with description how the functions is working should be here. Docs are generated based on this comment https://supervised.mljar.com/api/#supervised.automl.AutoML.predict
| if self._ml_task != REGRESSION | ||
| else predictions["prediction"].to_numpy() | ||
| ) | ||
| def _predict(self, X, models=[]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add tests for predict. It would be good if we have following test cases:
- no model selected, predict should use the best model,
- models selected and predict should provide predictions for each model,
- automl trained, and then loaded back from hard drive, and here two cases: (1) prediction computed on best model, (2) and predictions computed on selected models
If we are going to provide such functionality to predict then we should at it to predict_proba, predict_all as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. For the predict_all method, does it make sense for it to just call the new predict implementation? since both of them originally just called the base predict method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it makes sense.
This PR is an attempt at solving issues #798 and #423. We added new parameters in the _predict function of the base_automl class and adapted the parts of the code that we found necessary. We did not implement direct unit tests since we did not find unit tests that directly test the predict method. If possible, it would be nice to get a direction in that sense, so that this implementation can be merged. Thanks