Open
Description
I'm looking to use linfa for k-means clustering, and the current k-means example is pretty incomprehensible to a newbie. It may be that this makes perfect sense to someone steeped in this API or even in ndarray, but to me, the issues are:
- The current version of rand (0.9.0 as of Feb 2025) appears to be incompatible with the version used in the example.
- Generating random data from a PRNG doesn't help when my goal is to load data from somewhere else. How can I create a mutable data structure that I can push new vectors onto?
- DatasetBase indicates that it contains records and maybe targets, weights, and feature names. I have no clue what the targets/weights are when I'm trying to create input.
- Not having expected centroids, I'd like to lean on the API to generate something random, generate something evenly distributed, or otherwise use some quick heuristic.
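To make the second and fourth points concrete, here is a minimal std-only sketch of a growable row container plus one possible "quick heuristic" for starting centroids. `RowBuffer` and `evenly_spaced_centroids` are hypothetical names invented for illustration; neither is part of linfa or ndarray. The rows are stored flat in row-major order, which I believe is the layout that ndarray's `Array2::from_shape_vec((n_rows, n_features), data)` accepts, so the result of `into_parts` could presumably be handed over that way.

```rust
/// Growable row-major table of f64 features.
/// Hypothetical helper for illustration; not a linfa or ndarray type.
#[derive(Debug)]
pub struct RowBuffer {
    n_features: usize,
    data: Vec<f64>,
}

impl RowBuffer {
    /// Reserves space for `n_rows` rows of `n_features` values each.
    pub fn with_capacity(n_rows: usize, n_features: usize) -> Self {
        assert!(n_features > 0, "rows need at least one feature");
        RowBuffer {
            n_features,
            data: Vec::with_capacity(n_rows * n_features),
        }
    }

    /// Appends one row, rejecting rows of the wrong width.
    pub fn push(&mut self, row: &[f64]) -> Result<(), String> {
        if row.len() != self.n_features {
            return Err(format!(
                "expected {} features, got {}",
                self.n_features,
                row.len()
            ));
        }
        self.data.extend_from_slice(row);
        Ok(())
    }

    /// Number of complete rows stored so far.
    pub fn n_rows(&self) -> usize {
        self.data.len() / self.n_features
    }

    /// Borrows row `i` as a slice of length `n_features`.
    pub fn row(&self, i: usize) -> &[f64] {
        &self.data[i * self.n_features..(i + 1) * self.n_features]
    }

    /// Consumes the buffer, returning (n_rows, n_features, flat row-major data).
    pub fn into_parts(self) -> (usize, usize, Vec<f64>) {
        (self.n_rows(), self.n_features, self.data)
    }
}

/// One simple heuristic (an assumption, not linfa's initializer):
/// copy the rows found at `k` evenly spaced indices.
pub fn evenly_spaced_centroids(rows: &RowBuffer, k: usize) -> Vec<Vec<f64>> {
    let n = rows.n_rows();
    assert!(k > 0 && k <= n, "need 1 <= k <= number of rows");
    (0..k).map(|j| rows.row(j * n / k).to_vec()).collect()
}
```

Rejecting mismatched row widths at `push` time keeps the flat buffer rectangular, which a 2-D array conversion would require anyway.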
Ultimately, my ideal is to do something like:
let mut records = Dataset::with_capacity(100_000); // expected number of input rows
for row in load_my_data("file.tsv") {
    // where 'row' is, say, a [f64; 5] or a Vec<f32>?
    records.push(row);
}
let initial_state = kmeans::generate_random_centroids(10 /* # clusters */, &records);
let clusters = kmeans::params_with(...).fit(&records);
for (id, cluster) in clusters.iter().enumerate() {
    // presumably cluster is [f64; 5] or &[f32]
    println!("Cluster {id} located @ {cluster:?}");
}
I realize this may diverge drastically from what currently exists, but I'd like to determine how to bridge this gap. Thanks!
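For reference, the "fit, then iterate over centroids" shape of that wish-list can be sketched without any crates using plain Lloyd iterations. This is a textbook k-means loop written for illustration, not linfa's implementation; `lloyd_kmeans`, `nearest`, and `sq_dist` are hypothetical names, and the caller is assumed to supply the starting centroids.

```rust
/// Squared Euclidean distance between two equally long points.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `point` (ties go to the lowest index).
fn nearest(point: &[f64], centroids: &[Vec<f64>]) -> usize {
    let mut best = 0;
    for (i, c) in centroids.iter().enumerate() {
        if sq_dist(point, c) < sq_dist(point, &centroids[best]) {
            best = i;
        }
    }
    best
}

/// Plain Lloyd iterations: assign each row to its nearest centroid,
/// then move each centroid to the mean of its rows.
/// Stops early once an iteration leaves every centroid unchanged.
pub fn lloyd_kmeans(
    rows: &[Vec<f64>],
    mut centroids: Vec<Vec<f64>>,
    max_iters: usize,
) -> Vec<Vec<f64>> {
    assert!(!centroids.is_empty(), "need at least one starting centroid");
    let dim = centroids[0].len();
    for _ in 0..max_iters {
        // Per-cluster running sums and member counts for this iteration.
        let mut sums = vec![vec![0.0; dim]; centroids.len()];
        let mut counts = vec![0usize; centroids.len()];
        for row in rows {
            let c = nearest(row, &centroids);
            counts[c] += 1;
            for (s, x) in sums[c].iter_mut().zip(row) {
                *s += x;
            }
        }
        let mut moved = false;
        for (c, (sum, &count)) in sums.iter().zip(&counts).enumerate() {
            if count == 0 {
                continue; // empty cluster: keep its previous centroid
            }
            let mean: Vec<f64> = sum.iter().map(|s| s / count as f64).collect();
            if mean != centroids[c] {
                moved = true;
            }
            centroids[c] = mean;
        }
        if !moved {
            break;
        }
    }
    centroids
}
```

The returned `Vec<Vec<f64>>` can be walked with `iter().enumerate()` exactly as in the final loop of the snippet above, printing each cluster id next to its centroid.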