Does GMM have the option to get the cluster probability? #372

rickbeeloo · 2025-01-31T17:19:27Z

Hey!

After doing the fit:

// We fit the model from the dataset setting some options
let gmm = GaussianMixtureModel::params(n_clusters)
    .n_runs(10)
    .tolerance(1e-4)
    .with_rng(rng)
    .fit(&dataset)
    .expect("GMM fitting");

// Then we can get dataset membership information, targets contain **cluster indexes**
// corresponding to the cluster infos in the list of GMM means and covariances
let blobs_dataset = gmm.predict(dataset);

Can I get the probability of a point belonging to each of the clusters? I assume predict assigns each point to the cluster with the highest probability; however, I only want to assign a point if that probability is higher than a specific threshold.

I'm not familiar with linfa at all, so perhaps there is a standard way of doing this.

Thanks for building all this!


relf · 2025-02-01T18:55:47Z

Hi. Unfortunately, at the moment, this information is not made available.

The good news is that the implementation is rather straightforward. If I understand correctly, you need predict_proba() as implemented in scikit-learn. Since the linfa GMM implementation is a direct port of the scikit-learn one, the code is very similar and the method should be easy to port. Let me know if you want to open a PR on this.
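For anyone picking this up, here is a rough standalone sketch of what a predict_proba-style computation could look like. It is not linfa's API: `DiagGmm`, `weighted_log_prob`, and `predict_with_threshold` are hypothetical names, and the sketch assumes diagonal covariances to stay dependency-free, whereas a real port would presumably reuse the full-covariance log-probabilities the fitted model already computes. The idea is the same as in scikit-learn: take the weighted log-density of each component, normalize with log-sum-exp, and exponentiate.

```rust
/// Hypothetical stand-in for a fitted mixture (NOT linfa's type).
/// Assumption: diagonal covariances, to keep the sketch std-only.
struct DiagGmm {
    weights: Vec<f64>,        // mixing weights, expected to sum to 1
    means: Vec<Vec<f64>>,     // one mean vector per component
    variances: Vec<Vec<f64>>, // per-dimension variances per component
}

impl DiagGmm {
    /// ln(weight_k) + ln N(x | mean_k, diag(var_k)) for every component k.
    fn weighted_log_prob(&self, x: &[f64]) -> Vec<f64> {
        let ln_2pi = (2.0 * std::f64::consts::PI).ln();
        self.weights
            .iter()
            .zip(self.means.iter().zip(&self.variances))
            .map(|(w, (mean, var))| {
                let log_density: f64 = x
                    .iter()
                    .zip(mean.iter().zip(var))
                    .map(|(xi, (mi, vi))| -0.5 * (ln_2pi + vi.ln() + (xi - mi).powi(2) / vi))
                    .sum();
                w.ln() + log_density
            })
            .collect()
    }

    /// Posterior probability of each component given x (sums to 1).
    fn predict_proba(&self, x: &[f64]) -> Vec<f64> {
        let log_p = self.weighted_log_prob(x);
        // Log-sum-exp with the max subtracted for numerical stability.
        let max = log_p.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        let log_norm = max + log_p.iter().map(|lp| (lp - max).exp()).sum::<f64>().ln();
        log_p.iter().map(|lp| (lp - log_norm).exp()).collect()
    }

    /// Index of the most probable component, or None if its
    /// probability is below `threshold`.
    fn predict_with_threshold(&self, x: &[f64], threshold: f64) -> Option<usize> {
        let proba = self.predict_proba(x);
        let (best, p) = proba
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))?;
        if *p >= threshold { Some(best) } else { None }
    }
}
```

With two well-separated components, a point sitting on one mean gets `Some(index)` for a 0.9 threshold, while a point halfway between two equally weighted components gets roughly 0.5 for each and is left unassigned (`None`).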

relf added the "good first issue" label on Feb 3, 2025
