KMeans Optimization: Incorporating Weights into Re-Clustering Process #1001

Hello,

As discussed in [this topic on Dask's forum](https://dask.discourse.group/t/dask-ml-kmeans-optimization/3501), my colleague and I compared the KMeans class from `dask-ml` with our own implementation in a distributed environment. During the comparison, we noticed that the `dask-ml` initialization does not appear to use weights during the centroid re-clustering phase.
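For context, these weights come from the scalable k-means|| initialization, which (as far as we know) is the default `init` of `dask-ml`'s KMeans. After an oversampled set of candidate centres is drawn, each candidate is weighted by the number of data points closest to it, and the weighted candidates are then re-clustered down to k centres. Below is a minimal NumPy sketch of that weighting step on in-memory arrays rather than Dask arrays. `candidate_weights` is a hypothetical helper and is not part of `dask-ml`:

```python
import numpy as np

def candidate_weights(X, candidates):
    """Weight each candidate centre by the number of points in X closest to it."""
    # Squared Euclidean distance from every point to every candidate, shape (n, m)
    d2 = ((X[:, None, :] - candidates[None, :, :]) ** 2).sum(axis=2)
    # Index of the nearest candidate for each point
    nearest = d2.argmin(axis=1)
    # Count points per candidate; minlength keeps candidates that attract no points
    return np.bincount(nearest, minlength=len(candidates)).astype(float)

# Usage: five 1-D points and two candidate centres
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
C = np.array([[0.0], [5.0]])
print(candidate_weights(X, C))  # [3. 2.]
```

If the candidates are re-clustered without these counts, a candidate that attracts a single point carries the same influence as one that attracts thousands.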

In the current `dask-ml` KMeans implementation, the standard (unweighted) KMeans algorithm is used for centroid re-clustering. In contrast, we incorporated weights in two places:
- KMeans++ initialization.
- Weighted average during centroid re-clustering.
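The two weighted steps listed above can be sketched as follows. This sketch assumes the re-clustering runs on a small in-memory candidate set with NumPy. The function names `weighted_kmeans_pp` and `weighted_lloyd` are hypothetical, and the code is not necessarily how our linked repository implements it. In the seeding step, each new centre is sampled with probability proportional to `w_i * D(x_i)**2`. In each Lloyd update, the plain mean is replaced by a weighted average.

```python
import numpy as np

def weighted_kmeans_pp(points, weights, k, rng):
    """Weighted k-means++ seeding: P(pick x_i) is proportional to w_i * D(x_i)**2.

    Assumes at least k distinct points with positive weight.
    """
    n = len(points)
    # First centre: sampled proportionally to weight alone
    first = rng.choice(n, p=weights / weights.sum())
    centers = [points[first]]
    # Squared distance from each point to its nearest chosen centre
    d2 = ((points - points[first]) ** 2).sum(axis=1)
    for _ in range(1, k):
        p = weights * d2
        idx = rng.choice(n, p=p / p.sum())
        centers.append(points[idx])
        d2 = np.minimum(d2, ((points - points[idx]) ** 2).sum(axis=1))
    return np.array(centers)

def weighted_lloyd(points, weights, centers, n_iter=100, tol=1e-10):
    """Lloyd iterations in which each centre becomes the weighted mean of its points."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign every point to its nearest centre
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            mask = labels == j
            if weights[mask].sum() > 0:  # leave empty (or zero-weight) clusters in place
                # Weighted average instead of the plain mean
                new_centers[j] = np.average(points[mask], axis=0, weights=weights[mask])
        shift = ((new_centers - centers) ** 2).sum()
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels

# Usage: four candidate centres, two of them carrying most of the weight
rng = np.random.default_rng(0)
cand = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
w = np.array([10.0, 1.0, 1.0, 10.0])
seeds = weighted_kmeans_pp(cand, w, k=2, rng=rng)
centers, labels = weighted_lloyd(cand, w, seeds)
```

On this input, the weighted update moves the centre of the cluster around the origin to about `(0.018, 0.009)`, close to the heavy candidate `[0, 0]`. A plain mean would place it at `(0.1, 0.05)`. The seeding step also never picks a candidate whose weight is zero.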

Although our implementation is slower than `dask-ml` in execution time, it achieved better results when clustering a blob dataset. We believe this is likely due to a reduction in the number of clustering iterations rather than to any direct code optimization.

If you're interested, feel free to review our repository for further details on our approach:  
[GitHub Repository](https://github.com/ChiaTrama/Management_and_Analysis_of_Physics_Dataset_B).

Thank you for considering this issue.

Best regards,  
Chiara
