Update for sklearn 1.6 #1371

Merged
merged 5 commits into from
Jul 3, 2025

@perib (Contributor) commented Apr 13, 2025

What does this PR do?

The scikit-learn API changed in 1.6 (estimator tags were reworked). Some relevant documentation:

  • PR with the change: scikit-learn/scikit-learn#29677
  • List of available tags: https://scikit-learn.org/stable/modules/generated/sklearn.utils.Tags.html (click on the links in the parameters section for each set of tags)
  • How to develop sklearn estimators: https://scikit-learn.org/stable/developers/develop.html
  • I modeled the GraphPipeline and TPOTEstimator tags on the sklearn Pipeline class: https://github.com/scikit-learn/scikit-learn/blob/98ed9dc73/sklearn/pipeline.py#L1218

See xgboost’s recent update to address these changes here: dmlc/xgboost#11021

Changes:

  1. When inheriting, mixins should go on the left and BaseEstimator on the right (my understanding is that the sklearn API now requires or expects this order).
  2. Updated the tags of the nn modules, MaskSelector, FeatureSetSelector, and TPOTEstimator to match the new format.
  3. skrebate and sklearnx are now only included in GROUPNAME if the packages are installed. This allows the tests to complete when those packages cannot be installed. (On Apple silicon I wasn't able to install sklearnx, so my tests always failed; now the tests pass, with a warning if these packages were not tested.)
  4. AdaBoost no longer supports the algorithm parameter, so it was removed.
  5. Increased the minimum required versions to sklearn 1.6 and xgboost 3.0.

Tags are somewhat awkward in our case because they depend on the final search space and the generated pipelines, which may not be known until we actually generate a pipeline. I addressed this by starting with a default set of tags and then pulling the tags from the final fitted pipeline. This seems to work.

Passes all tests on my Apple silicon MacBook.

Where should the reviewer start? / How should this PR be tested?

Run a few TPOT pipelines with high verbosity to check for any errors relating to tags (such as __sklearn_tags__() being missing). In particular, double-check both TPOTEstimator and GraphPipeline, since their tags depend on their internal/learned pipelines.

What are the relevant issues?

#1369 #1370

Questions:

  • Do the docs need to be updated?
  • Does this PR add new (Python) dependencies?
    Dependencies must be updated to support sklearn 1.6; xgboost must be at least 3.0 to be compatible with sklearn 1.6.
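
A sketch of the dependency handling described in items 3 and 5 of the change list, using only the standard library. is_installed, release_tuple, meets_minimum and OPTIONAL_GROUPS are hypothetical helpers, not TPOT's code, and "sklearnex" is assumed to be the import name of the Intel extension the PR calls sklearnx.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def is_installed(module_name):
    """True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def release_tuple(version_string, width=3):
    """'1.6.1' -> (1, 6, 1); '3.0.0rc1' -> (3, 0, 0). Ignores pre/post tags."""
    parts = []
    for piece in version_string.split(".")[:width]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    parts.extend([0] * (width - len(parts)))
    return tuple(parts)


def meets_minimum(distribution, minimum):
    """True if `distribution` is installed at a version >= `minimum`."""
    try:
        installed = version(distribution)
    except PackageNotFoundError:
        return False
    return release_tuple(installed) >= release_tuple(minimum)


# Hypothetical: only offer optional search-space groups whose packages import.
OPTIONAL_GROUPS = ["skrebate", "sklearnex"]
available_groups = [name for name in OPTIONAL_GROUPS if is_installed(name)]
print(available_groups)
print(meets_minimum("scikit-learn", "1.6"), meets_minimum("xgboost", "3.0"))
```

release_tuple ignores pre-release suffixes, so 3.0.0rc1 compares equal to 3.0.0; packaging.version.Version handles those cases properly when the packaging package is available.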

@theaksaini (Contributor)

Can you explain the utility of sklearn tags? Do we need to use tags for the current or future versions of sklearn? Other changes you made are fine, and we can accept those changes once we clarify the use of tags.

@perib (Contributor, Author) commented May 2, 2025

Tags are briefly described here: https://scikit-learn.org/stable/developers/develop.html#estimator-tags

Tags are used by various sklearn functions to determine what an estimator (classifier, transformer, etc.) is capable of. For example, is_classifier and is_regressor now use tags to determine whether an estimator is a classifier or a regressor. These checks are used in functions such as certain scorers and cross_val_predict, generally to make sure that the inputs to the function are compatible and, if not, to raise an error or warning (some scorers only work for classifiers, for example).
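
A small illustration, assuming a recent scikit-learn: cross-validation helpers such as cross_val_predict pick their default splitter with check_cv(cv, y, classifier=is_classifier(estimator)), so the declared estimator type changes behavior (stratified folds for classifiers, plain K-fold otherwise). splitter_for is a hypothetical helper that repeats that call.

```python
import numpy as np
from sklearn.base import is_classifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, check_cv


def splitter_for(estimator, y, cv=5):
    # Same resolution step cross_val_predict / cross_validate perform.
    return check_cv(cv, y, classifier=is_classifier(estimator))


y_labels = np.array([0, 1] * 10)       # binary target
y_values = np.linspace(0.0, 1.0, 20)   # continuous target

print(type(splitter_for(LogisticRegression(), y_labels)).__name__)  # StratifiedKFold
print(type(splitter_for(LinearRegression(), y_values)).__name__)    # KFold
```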

Previously, tags were more of an internal sklearn feature. For custom estimators, sklearn attempted to infer the tags from the methods present (such as predict_proba for classifiers) or from which mixin was inherited.

However, TPOTEstimator doesn't inherit from ClassifierMixin or RegressorMixin, so it previously relied on the estimator_type attribute. sklearn now prioritizes tags, and without them some of these functions throw errors. For example, TPOTEstimator may be assumed to always be a classifier because it has a predict_proba function; when it is used as a regressor, sklearn still treats it as a classifier, and some functions throw an error saying they are only supported for regressors. Conversely, some classifiers don't support predict_proba and would be assumed to be regressors. (I forget which direction caused the most issues, but the gist is the same: ambiguity about what the estimator is causes unexpected behavior.)

Now they have cleaned up the API to make tags an official part of the sklearn conventions for custom estimators. This is a helpful change: TPOT doesn't inherit from ClassifierMixin or RegressorMixin, but we can now still label the instance correctly as a classifier or regressor. So all BaseEstimators included in TPOT (including things like MaskSelector) should implement a tags function (__sklearn_tags__) to ensure that they are all accurately labeled and work as intended with the rest of the sklearn ecosystem.

Without these changes, I was getting several errors with TPOT on sklearn 1.6. In some cases TPOT would run, but the custom estimators would crash when being evaluated (as you can see by running with verbose=5). With these changes, I believe all compatibility issues are resolved.

Do we need to use tags for the current or future versions of sklearn?

This version of tags is required for TPOT to work correctly on scikit-learn 1.6 and later. It will not work on older versions. There is a way to make it backwards compatible with older versions as well, but that would be more complicated, and I wasn't sure it would be worth it.
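
One possible backward-compatible approach, sketched under assumptions (this is not what the PR implemented): declare the estimator type both through the pre-1.6 _estimator_type attribute, which older is_classifier/is_regressor read, and through __sklearn_tags__, which 1.6 and later use. CompatModel is a hypothetical name, and the tag classes are imported inside the method because they do not exist before 1.6.

```python
from sklearn.base import BaseEstimator, is_classifier, is_regressor


class CompatModel(BaseEstimator):
    """Hypothetical estimator that declares its type for old and new sklearn."""

    def __init__(self, classification=True):
        self.classification = classification

    @property
    def _estimator_type(self):
        # Read by is_classifier/is_regressor before scikit-learn 1.6.
        return "classifier" if self.classification else "regressor"

    def __sklearn_tags__(self):
        # Only called by scikit-learn >= 1.6, so the tag classes are
        # imported lazily; older versions never reach this import.
        from sklearn.utils import ClassifierTags, RegressorTags

        tags = super().__sklearn_tags__()
        tags.estimator_type = self._estimator_type
        tags.target_tags.required = True
        if self.classification:
            tags.classifier_tags = ClassifierTags()
        else:
            tags.regressor_tags = RegressorTags()
        return tags


model = CompatModel(classification=False)
print(is_classifier(model), is_regressor(model))  # False True
```

Other tags (for example allow_nan) would still need to be expressed separately for pre-1.6 versions, which is part of why this route is more complicated.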

@jay-m-dev merged commit 9f00658 into EpistasisLab:main on Jul 3, 2025
1 check passed