Make mini-batch TF-IDF raise an exception #1631

Merged: 3 commits, Nov 14, 2024

@e10e3 e10e3 commented Nov 14, 2024

feature_extraction.TFIDF has no explicit implementation of the mini-batch methods learn_many and transform_many, so Python falls back on the ones provided by its parent class, feature_extraction.BagOfWords. As a result, TF-IDF behaves differently in single-instance mode and in mini-batch mode.
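
The fallback described here is ordinary Python attribute lookup. As a minimal sketch, using stand-in classes (`ParentCounter` and `ChildWeighted` are hypothetical names for illustration, not River's actual code), a subclass that overrides only the single-instance method silently inherits the parent's batch method, and the two modes then disagree:

```python
class ParentCounter:
    """Hypothetical stand-in for a bag-of-words style parent class."""

    def transform_one(self, x):
        # Raw term counts for one document.
        counts = {}
        for token in x.split():
            counts[token] = counts.get(token, 0) + 1
        return counts

    def transform_many(self, xs):
        # Batch version, hard-wired to the parent's counting logic.
        return [ParentCounter.transform_one(self, x) for x in xs]


class ChildWeighted(ParentCounter):
    """Hypothetical stand-in for a subclass that reweights the counts."""

    def transform_one(self, x):
        counts = super().transform_one(x)
        total = sum(counts.values())
        # Normalise the counts, so the output differs from the parent's.
        return {token: n / total for token, n in counts.items()}


child = ChildWeighted()
single = child.transform_one("a a b")
batch = child.transform_many(["a a b"])[0]

# The batch method is the parent's function object, found through the MRO.
inherited = ChildWeighted.transform_many is ParentCounter.transform_many
# Single-instance and batch outputs differ for the same document.
mismatch = single != batch
```

Nothing fails at call time in this situation, which is why the discrepancy is easy to miss.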

This PR is a band-aid that adds the methods for TF-IDF to make it explicit they are not supported. Both will raise an exception when called.
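
A rough sketch of this kind of guard follows. The class names are hypothetical stand-ins, and the choice of `NotImplementedError` and the message wording are assumptions: the description only states that an exception is raised, not which one.

```python
class BagOfWordsLike:
    """Hypothetical stand-in for the parent class; not River's real code."""

    def learn_many(self, X):
        # The stand-in parent has nothing to learn from a batch.
        return None

    def transform_many(self, X):
        # Token counts per document.
        return [{t: doc.split().count(t) for t in set(doc.split())} for doc in X]


class TFIDFLike(BagOfWordsLike):
    """Hypothetical stand-in for the TF-IDF subclass."""

    def learn_many(self, X):
        # Explicit override: block the inherited batch method.
        # The exception type is an assumption made for this sketch.
        raise NotImplementedError("TF-IDF has no mini-batch learn_many yet")

    def transform_many(self, X):
        raise NotImplementedError("TF-IDF has no mini-batch transform_many yet")


def raises_not_implemented(method, *args):
    """Return True if calling method(*args) raises NotImplementedError."""
    try:
        method(*args)
    except NotImplementedError:
        return True
    return False


batch = ["a a b", "b c"]
parent_ok = BagOfWordsLike().transform_many(batch)
learn_blocked = raises_not_implemented(TFIDFLike().learn_many, batch)
transform_blocked = raises_not_implemented(TFIDFLike().transform_many, batch)
```

Failing loudly trades a silently wrong result for an explicit error, which leaves room to add a real mini-batch implementation later without changing the method names.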

A true mini-batch version could be added at a later point.

Fixes #1629

There is currently no mini-batch implementation of TF-IDF.
To prevent Python from using the methods from the parent class
BagOfWords (which would give incorrect results), we add the methods to
TF-IDF and raise an error.

The parameters were documented in the docstring but were not in the
constructor.
@MaxHalford MaxHalford merged commit de119ab into online-ml:main Nov 14, 2024
4 checks passed
@e10e3 e10e3 deleted the tfidf-no-many branch November 14, 2024 14:02
Successfully merging this pull request may close these issues.

TF-IDF uses the wrong transform_many()