[ENH] K-means: clusters can be inferred for new data #7010

markotoplak · 2025-01-30T15:34:59Z

Issue

@borondics had a data set so large that k-means was too slow, so he tried running it on a sample and then modelling the clustering with a classifier. For k-means this should not be needed, because we can use the means/medoids to "predict" the cluster directly.
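
As a rough illustration of the idea (not Orange's actual implementation): once the centroids are known, assigning a new row to a cluster only requires finding the nearest centroid. The function name `assign_clusters` and the sample values below are hypothetical, and the sketch assumes Euclidean distance and plain Python lists.

```python
import math


def assign_clusters(rows, centroids):
    """Return, for each row, the index of the nearest centroid (Euclidean)."""
    labels = []
    for row in rows:
        distances = [math.dist(row, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical centroids, e.g. obtained by running k-means on a sample.
centroids = [(0.0, 0.0), (10.0, 10.0)]
# Rows that were not part of the sample.
new_rows = [(1.0, 2.0), (9.0, 8.5), (4.0, 4.0)]

print(assign_clusters(new_rows, centroids))  # prints [0, 1, 0]
```

No second model has to be trained here; the centroids themselves act as the classifier.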

With one additional line we could make the Cluster feature usable with Apply Domain.

Why don't we already do it? All the machinery is already there, in ClusteringModel, which does seem unused though. Does

Includes

Code changes
Tests
Documentation

codecov · 2025-01-30T15:54:01Z

Codecov Report

Attention: Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.

Please upload report for BASE (master@66928eb). Learn more about missing BASE report.
Report is 23 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##             master    #7010   +/-   ##
=========================================
  Coverage          ?   88.72%           
=========================================
  Files             ?      332           
  Lines             ?    73444           
  Branches          ?        0           
=========================================
  Hits              ?    65164           
  Misses            ?     8280           
  Partials          ?        0


markotoplak · 2025-05-30T12:52:36Z

@janezd, I finally finished __eq__ and __hash__. To make it work, I needed to change some attributes into read-only properties (which also made me rewrite a test that relied on changing them).
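
As a general Python point on why `__eq__` and `__hash__` tend to go together with read-only attributes: a hash computed from fields that can later be reassigned becomes stale, and the object can then get lost in sets and dicts. The class below is a hypothetical stand-in, not the actual `KMeansModel`; the name `CentroidModel` and its single `centroids` field are assumptions made for illustration.

```python
class CentroidModel:
    """Toy model whose identity is defined by its centroids."""

    def __init__(self, centroids):
        # Store an immutable copy so the hash stays valid for the object's life.
        self._centroids = tuple(tuple(float(x) for x in c) for c in centroids)

    @property
    def centroids(self):
        # Read-only: no setter is defined, so assignment raises AttributeError.
        return self._centroids

    def __eq__(self, other):
        if not isinstance(other, CentroidModel):
            return NotImplemented
        return self._centroids == other._centroids

    def __hash__(self):
        # Hash the same data that __eq__ compares.
        return hash((type(self), self._centroids))


a = CentroidModel([(0, 0), (10, 10)])
b = CentroidModel([(0, 0), (10, 10)])
print(a == b, hash(a) == hash(b), len({a, b}))  # prints True True 1
```

A test that used to overwrite such an attribute on an existing object would have to build a new object instead, which matches the kind of test rewrite mentioned above.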

markotoplak added the "needs discussion" label (Core developers need to discuss the issue) Jan 30, 2025

markotoplak marked this pull request as draft January 30, 2025 15:36

janezd self-assigned this Feb 7, 2025

markotoplak assigned markotoplak and unassigned janezd Feb 12, 2025

markotoplak removed the "needs discussion" label (Core developers need to discuss the issue) Mar 18, 2025

markotoplak force-pushed the same_clustering_for_new_data branch from 1dc93bf to 87bf4cc Compare March 28, 2025 11:02

janezd added this to the 3.39 milestone May 30, 2025

markotoplak added 3 commits May 30, 2025 13:28

K-Means: output feature clusters has compute_value

598ecd4

test_owkmeans: test cluster as compute value

ab0b5de

owkmeans: fix original_domain of output KMeansModel

a5acdef

markotoplak force-pushed the same_clustering_for_new_data branch 2 times, most recently from f213cbe to 0fb6067 Compare May 30, 2025 11:53

KMeansModel: __eq__ and __hash__

79d9a62

markotoplak force-pushed the same_clustering_for_new_data branch from 0fb6067 to 79d9a62 Compare May 30, 2025 12:13

test_centroids_on_output: do not modify cluster model

7be66ba

markotoplak marked this pull request as ready for review May 30, 2025 12:51

markotoplak removed their assignment May 30, 2025

markotoplak requested a review from janezd May 30, 2025 12:52

Trubar

6ff8cff

markotoplak force-pushed the same_clustering_for_new_data branch from e52507e to 6ff8cff Compare June 3, 2025 07:21

VesnaT approved these changes Jun 4, 2025


markotoplak removed the request for review from janezd June 11, 2025 12:20

markotoplak merged commit 49fdb23 into biolab:master Jun 11, 2025
21 of 30 checks passed
