
Add of Online Hierarchical Clustering #1218

Open
wants to merge 361 commits into
base: main

Conversation

kchardon

No description provided.

@kchardon kchardon marked this pull request as draft April 11, 2023 11:55
@kchardon kchardon marked this pull request as ready for review April 11, 2023 12:26
@MaxHalford
Member

Hey there! I hope it's ok that I'm only answering now.

Am I correct in assuming the algorithm stores all the data points it sees in memory (i.e. the X attribute)?

@kchardon
Author

kchardon commented May 22, 2023

> Hey there! I hope it's ok that I'm only answering now.
>
> Am I correct in assuming the algorithm stores all the data points it sees in memory (i.e. the X attribute)?

Hello, sorry for replying so late.
I use the window_size attribute: when more data points are stored than it allows, the oldest one is deleted.
If window_size < 1, all the data points are stored.
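
The windowing behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `SlidingWindow` class and its `add` method are hypothetical names, and only `window_size` and the `X` attribute come from this thread.

```python
from collections import deque


class SlidingWindow:
    """Keep only the `window_size` most recent data points.

    Hypothetical helper for illustration; not the PR's actual class.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        # window_size < 1 means "unbounded": maxlen=None never evicts.
        maxlen = window_size if window_size >= 1 else None
        self.X = deque(maxlen=maxlen)

    def add(self, x):
        # Once the deque is full, appending drops the oldest point automatically.
        self.X.append(x)
        return self


window = SlidingWindow(window_size=3)
for i in range(5):
    window.add({"x": i})

unbounded = SlidingWindow(window_size=0)
for i in range(5):
    unbounded.add({"x": i})
```

With `window_size=3`, only the last three points remain in `window.X`, while the `window_size=0` instance keeps all five, which is the unbounded-memory case being questioned in this review.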

@MaxHalford
Member

> If window_size < 1, then it stores all the data points

Ok I see, fair. But I don't think we'll ever want that behavior. Could you remove it?

@kchardon
Author

> > If window_size < 1, then it stores all the data points
>
> Ok I see, fair. But I don't think we'll ever want that behavior. Could you remove it?

Yes, I can. So should I raise an error if the value of window_size is not an integer > 0?

@MaxHalford
Member

Nope, no need to check for an error. An exception will be raised on its own at some point. In general, we don't do input validation. Instead, we document well.
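
As a rough sketch of that "document, don't validate" approach: the `WindowedStore` class and `try_build` helper below are hypothetical, and backing the window with `collections.deque` is an assumption rather than something the PR is known to do. The constructor performs no checks, the docstring states the contract, and the underlying container raises on unusable values.

```python
from collections import deque


class WindowedStore:
    """Hypothetical store illustrating "document, don't validate".

    Parameters
    ----------
    window_size
        Maximum number of data points kept in memory.
        Must be a positive integer.
    """

    def __init__(self, window_size):
        # No explicit check: deque itself rejects unusable values.
        self.X = deque(maxlen=window_size)


def try_build(window_size):
    """Return the name of the exception raised, or None if construction works."""
    try:
        WindowedStore(window_size)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None
```

Here `try_build(-1)` reports a `ValueError` and `try_build("3")` a `TypeError`, both raised by `deque` itself. Note that `window_size=0` is accepted silently and yields a store that keeps nothing, so not every bad value fails on its own; that is where the documentation has to carry the weight.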

@kchardon
Author

Okay! I will remove it.

@hoanganhngo610
Contributor

Hi @kchardon! I am Hoang-Anh, the maintainer of River's clustering module. I think it would be best if I took over the review of this PR so that we can get it merged into River as soon as possible, since at first glance the PR is of really high quality.
If you don't mind, I will start by refactoring the code and making changes so that it aligns with the rest of River. Once all the necessary changes are in place, I will ask @MaxHalford to take a final look and give the green light.


@kchardon
Author

Hello, sorry for coming back to this PR so late.
I've changed the code so that it no longer relies on numpy, and I've fixed the issues raised by the checks.
Let me know if anything else can be improved!
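
For illustration only, a numpy-free distance computation might look like the sketch below. It assumes data points are plain dicts mapping feature names to numbers and that the distance is Euclidean; neither assumption is confirmed by this thread, and `euclidean_distance` is a hypothetical helper, not a function from the PR.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two dict-based feature vectors.

    Features missing from one point are treated as 0 (an assumption).
    """
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


d = euclidean_distance({"x": 0, "y": 0}, {"x": 3, "y": 4})
```

Working directly on dicts avoids converting each incoming point to an array, at the cost of a Python-level loop over the features.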
