Skip to content

Bug: min_cluster_samples should not be set to a non-integer #223

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
michaelkerber opened this issue Apr 9, 2021 · 4 comments · Fixed by #224
Closed

Bug: min_cluster_samples should not be set to a non-integer #223

michaelkerber opened this issue Apr 9, 2021 · 4 comments · Fixed by #224

Comments

@michaelkerber
Copy link

Is your feature request related to a problem? Please describe.
KeplerMapper does not work in combination with sklearn.clustering.AgglomerativeClustering (using distance_threshold). It seems that KeplerMapper relies on the parameter "min_clusters/n_clusters/min_samples" to be available (kmapper.py, line 506).

Describe the solution you'd like
The documentation should be more clear about what is expected from the clustering method to work in KeplerMapper.

@deargle
Copy link
Collaborator

deargle commented Apr 9, 2021

That code block has a final fallback of 2 if none of the previous are available:

min_cluster_samples = cluster_params.get(
"n_clusters",
cluster_params.get(
"min_cluster_size", cluster_params.get("min_samples", 2)
),
)

@deargle
Copy link
Collaborator

deargle commented Apr 9, 2021

Also, here's an example of someone using AgglomerativeClustering #185 (comment) , what error message are you getting? Full stack trace please?

@michaelkerber
Copy link
Author

I tried this clustering method:
cluster_method=sklearn.cluster.AgglomerativeClustering(n_clusters=None,distance_threshold=0.2,compute_full_tree=True,linkage='single')

Traceback (most recent call last):
File "mapper_cat.py", line 32, in
cover=my_cover,
File "/home/mkerber/.local/lib/python3.7/site-packages/kmapper/kmapper.py", line 529, in map
if hypercube.shape[0] >= min_cluster_samples:
TypeError: '>=' not supported between instances of 'int' and 'NoneType'

I think the error makes sense, since n_clusters is set, just not to an integer. The documentation of AgglomerativeClustering says that n_clusters has to be set to None if distance_threshold is set.

Perhaps you can check whether the parameter is an integer and if not, set it to 2 instead?

@deargle
Copy link
Collaborator

deargle commented Apr 9, 2021

Indeed! Thanks for the bug report.I remember now that @torlarse had hard-coded kmapper to use min_cluster_samples = 2 when they posted the code in that comment. Maybe you can try that until a patch is released?

@deargle deargle changed the title Clustering method Bug: min_cluster_samples should not be set to a non-integer Apr 9, 2021
deargle added a commit that referenced this issue Apr 10, 2021
deargle added a commit that referenced this issue May 10, 2021
agglomerative clustering was failing because the old way of checking the clusterer for a setting for `min_cluster_samples` didn't ensure that the value, if set, was an int (it is `None` for agglomerative clustering)

closes #223
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants