Add of Online Hierarchical Clustering #1218

kchardon · 2023-04-11T09:41:16Z

No description provided.

MaxHalford · 2023-04-30T16:31:29Z

Hey there! I hope it's OK that I'm only answering now.

Am I correct in assuming the algorithm stores all the data points it sees in memory (i.e. the X attribute)?

kchardon · 2023-05-22T15:11:54Z

> Hey there! I hope it's ok for me to answer only by now.
>
> Am I correct in assuming the algorithm stores all the data points it sees in memory (i.e. the X attribute)?

Hello, sorry for the late reply.
I use the window_size attribute: when there are more data points than allowed, the oldest one is deleted.
If window_size < 1, all the data points are stored.
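
To make the behaviour concrete, here is a minimal sketch of such a window. It is not the actual code from this PR: the class name `PointWindow` and its `add` method are hypothetical, and only `window_size` comes from the discussion above.

```python
from collections import deque


class PointWindow:
    """Hypothetical buffer illustrating the window_size behaviour described above."""

    def __init__(self, window_size):
        # Assumption: window_size < 1 means "keep everything", as stated in the comment.
        maxlen = window_size if window_size >= 1 else None
        self.points = deque(maxlen=maxlen)

    def add(self, x):
        # With a bounded deque, appending to a full window drops the oldest point.
        self.points.append(x)


window = PointWindow(window_size=3)
for i in range(5):
    window.add({"x": i})

print(list(window.points))  # only the three most recent points remain
```

With `window_size=0` the same class keeps every point it has seen, which is the unbounded behaviour in question.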

MaxHalford · 2023-05-22T15:55:13Z

> If window_size < 1, then it stores all the data points

Ok I see, fair. But I don't think we'll ever want that behavior. Could you remove it?

kchardon · 2023-05-22T19:53:58Z

> > If window_size < 1, then it stores all the data points
>
> Ok I see, fair. But I don't think we'll ever want that behavior. Could you remove it?

Yes, I can. So should I raise an error if the value of window_size is not an integer > 0?

MaxHalford · 2023-05-23T07:11:37Z

Nope, no need to check for an error. An exception will be raised at some point anyway. In general, we don't do input validation. Instead, we document well.
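
As a rough illustration of that point, here is a sketch that assumes the window is backed by `collections.deque`, which this thread does not confirm. The helper `make_window` is hypothetical. A negative or non-integer `window_size` already fails without any explicit check:

```python
from collections import deque


def make_window(window_size):
    """Hypothetical helper: no explicit validation, the parameter is only documented.

    window_size: positive integer, the maximum number of points kept in memory.
    """
    return deque(maxlen=window_size)


for bad_value in (-1, 2.5):
    try:
        make_window(bad_value)
    except (ValueError, TypeError) as exc:
        # deque itself rejects negative and non-integer maxlen values.
        print(type(exc).__name__, exc)
```

A value of 0 would not raise here (it gives a window that keeps nothing), which is where the documentation has to do the work.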

kchardon · 2023-05-23T19:59:04Z

Okay! I will delete it.

hoanganhngo610 · 2023-09-11T10:09:37Z

Hi @kchardon! I am Hoang-Anh, the maintainer of River's clustering module. I think it would be best if I take over the review of this PR so that we can get it merged into River as soon as possible; from a first glance, the PR is of really high quality.
If you don't mind, I will start by refactoring the code so that it aligns with the rest of River. Once all the necessary changes are in place, I will ask @MaxHalford to have a final look and give the green light.

review-notebook-app · 2024-11-10T18:45:52Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

kchardon · 2024-11-11T21:03:30Z

Hello, sorry to come back to this PR so late.
I've changed the code so that it no longer relies on numpy, and fixed the issues that came up during the checks.
Let me know if anything else can be improved!

kchardon requested review from MaxHalford and smastelini as code owners April 11, 2023 09:41

kchardon marked this pull request as draft April 11, 2023 11:55

kchardon marked this pull request as ready for review April 11, 2023 12:26

hoanganhngo610 requested review from hoanganhngo610 and Dennis1989 as code owners September 11, 2023 10:13

hoanganhngo610 and others added 17 commits September 11, 2023 23:47

Simplify comments.

bef5f29

Modify __str__ printed output by adding "Printed Hierarchical Clustering Tree." at the end of the tree.

fd3f9b9

Rename predict_otd.

05f9cf4

Split comments and rename printTree to print_tree.

d8da76b

Modify self.X to self.x_clusters.

98eee88

Lexical changes.

d3d9398

Remove unnecessary comments.

f5cebe7

Support pandas v2 (online-ml#1321)

0539046

Refactor Docstring

9ef224d

Refactor comment.

aa597da

Make find_path() a static method.

66927cd

Refactor Docstring.

87cf7fd

Make print_tree static method.

d12f243

Refactor code to account for failing tests.

43884f0

Refactor distance function used in Hierarchical Clustering class.

2a80d3c

Delete euclidean_distance function (due to being unnecessary).

74055db

Code refactoring to align with other algorithms available in River.

c356828

hoanganhngo610 and others added 18 commits November 10, 2024 19:44

Rename predict_otd.

b8ddad0

Split comments and rename printTree to print_tree.

8fd902b

Modify self.X to self.x_clusters.

9bb4427

Lexical changes.

121d94c

Remove unnecessary comments.

93419f8

Refactor Docstring

29011c9

Refactor comment.

793ef29

Make find_path() a static method.

da438f7

Refactor Docstring.

ea57417

Make print_tree static method.

aaa04c6

Refactor code to account for failing tests.

da92685

Refactor distance function used in Hierarchical Clustering class.

ec4fc6d

Delete euclidean_distance function (due to being unnecessary).

fae5ebe

Code refactoring to align with other algorithms available in River.

7a736b3

Modify Docstring description for dist_func.

3f2b671

Delete least common ancestor finding function (since this function is not used within the algorithm).

4c8079d

removing use of numpy

34aa9ad

Merge branch 'main' of https://github.com/kchardon/river_project

fcbcdb5

kchardon requested review from AdilZouitine and gbolmier as code owners November 10, 2024 18:45

kchardon added 8 commits November 10, 2024 19:58

ruff format

0c745a2

odac issue

ce5f9a4

remove hcluster

9662a7f

last main commit

a4ec3c9

hcluster

10cd5e5

ruff

e942ba2

new test

c1deecf

correct test

0923005
