(sec:clustbench-usage)=
# Using *clustbench*

The Python version of the *clustering-benchmarks* package
can be installed from [PyPI](https://pypi.org/project/clustering-benchmarks/),
e.g., via a call to:

```
pip3 install clustering-benchmarks
```

from the command line. Alternatively, please use your favourite Python
package manager.

Once installed, we can import it by calling:

``` python
import clustbench
```

Below, we discuss its basic features and usage.
|
14 | 29 |
|
| 30 | +::::{note} |
| 31 | +*To learn more about Python, check out Marek's open-access (free!) textbook* |
| 32 | +[Minimalist Data Wrangling in Python](https://datawranglingpy.gagolewski.com/) |
| 33 | +{cite}`datawranglingpy`. |
| 34 | +:::: |
15 | 35 |


## Fetching Benchmark Data

The datasets from the {ref}`sec:suite-v1` can be accessed easily.
It is best to [download](https://github.com/gagolews/clustering-data-v1/releases/tag/v1.1.0)
the whole repository onto our disk first.
Let us assume they are available in the following directory:

``` python
# load from a local directory (download the suite there first)
import os.path
data_path = os.path.join("~", "Projects", "clustering-data-v1")
```

Here is the list of the currently available benchmark batteries
(dataset collections):

``` python
print(clustbench.get_battery_names(path=data_path))
## ['fcps', 'g2mg', 'graves', 'h2mg', 'mnist', 'other', 'sipu', 'uci', 'wut']
```

We can list the datasets in an example battery by calling:

``` python
battery = "wut"
print(clustbench.get_dataset_names(battery, path=data_path))
## ['circles', 'cross', 'graph', 'isolation', 'labirynth', 'mk1', 'mk2', 'mk3', 'mk4', 'olympic', 'smile', 'stripes', 'trajectories', 'trapped_lovers', 'twosplashes', 'windows', 'x1', 'x2', 'x3', 'z1', 'z2', 'z3']
```
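
The two listing functions compose naturally if we wish to traverse
the whole suite. For instance, here is a minimal sketch (assuming
`data_path` as defined above) that enumerates every locally available dataset:

``` python
# walk through all locally available batteries and their datasets
for b_name in clustbench.get_battery_names(path=data_path):
    for d_name in clustbench.get_dataset_names(b_name, path=data_path):
        print(f"{b_name}/{d_name}")
```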

For instance, let us load the `wut/x2` dataset:

``` python
dataset = "x2"
b = clustbench.load_dataset(battery, dataset, path=data_path)
```

The above call returned a named tuple. For instance, the corresponding README
file can be inspected by accessing the `description` field:

``` python
print(b.description)
## Author: Eliza Kaczorek (Warsaw University of Technology)
##
## `labels0` come from the Author herself.
## `labels1` were generated by Marek Gagolewski.
## `0` denotes the noise class (if present).
```

Moreover, the `data` field gives the data matrix, `labels` is the list
of all ground truth partitions (encoded as label vectors),
and `n_clusters` gives the corresponding numbers of subsets.
In case of any doubt, we can always consult the official documentation
of the {any}`clustbench.load_dataset` function.

::::{note}
Particular datasets can be retrieved from an online repository directly
(no need to download the whole battery first) by calling:

``` python
data_url = "https://github.com/gagolews/clustering-data-v1/raw/v1.1.0"
b = clustbench.load_dataset("wut", "x2", url=data_url)
```
::::

For instance, here is the shape (*n* and *d*) of the data matrix,
the number of reference partitions, and their cardinalities *k*,
respectively:

``` python
print(b.data.shape, len(b.labels), b.n_clusters)
## (120, 2) 2 [3 4]
```

The following figure (generated via a call to
[`genieclust`](https://genieclust.gagolewski.com/)`.plots.plot_scatter`)
illustrates the benchmark dataset at hand.

``` python
import genieclust
import matplotlib.pyplot as plt
for i in range(len(b.labels)):
    plt.subplot(1, len(b.labels), i+1)
    genieclust.plots.plot_scatter(
        b.data, labels=b.labels[i]-1, axis="equal", title=f"labels{i}"
    )
plt.show()
```

(fig:using-clustbench-example1)=
```{figure} clustbench-usage-figures/using-clustbench-example1-1.*
An example benchmark dataset and the corresponding ground truth labels.
```

## Fetching Precomputed Results

Let us study one of the sets of
[precomputed clustering results](https://github.com/gagolews/clustering-results-v1)
stored in the following directory:

``` python
results_path = os.path.join("~", "Projects", "clustering-results-v1", "original")
```

They can be fetched by calling:

``` python
method_group = "Genie"  # or "*" for everything
res = clustbench.load_results(
    method_group, b.battery, b.dataset, b.n_clusters, path=results_path
)
print(list(res.keys()))
## ['Genie_G0.1', 'Genie_G0.3', 'Genie_G0.5', 'Genie_G0.7', 'Genie_G1.0']
```

We have thus gained access to precomputed partitions
generated by the [*Genie*](https://genieclust.gagolewski.com)
algorithm with different `gini_threshold` parameter settings.
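
Each entry of `res` is itself a dictionary that maps the number of clusters
to the corresponding predicted label vector, as a quick inspection reveals
(a minimal sketch, assuming the structure implied by the indexing used below):

``` python
# peek at the structure of one result set: n_clusters -> label vector
for k, labels in res["Genie_G0.3"].items():
    print(k, labels.shape)
```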


## Computing External Cluster Validity Measures

Different
{ref}`external cluster validity measures <sec:external-validity-measures>`
can be computed by calling {any}`clustbench.get_score`:

``` python
import pandas as pd
pd.Series({  # a pandas Series, for aesthetics
    method: clustbench.get_score(b.labels, res[method])
    for method in res.keys()
})
## Genie_G0.1    0.870000
## Genie_G0.3    0.870000
## Genie_G0.5    0.590909
## Genie_G0.7    0.666667
## Genie_G1.0    0.010000
## dtype: float64
```

By default, normalised clustering accuracy is applied.
As explained in the tutorial, we compare the predicted clusterings against
{ref}`all <sec:many-partitions>` the reference partitions
({ref}`ignoring noise points <sec:noise-points>`)
and report the maximal score.
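
To make this procedure concrete, here is a rough sketch of such a
maximal-score computation for a single set of predictions; it assumes
a recent version of *genieclust* that provides
`compare_partitions.normalized_clustering_accuracy`, and it is not
necessarily identical to what {any}`clustbench.get_score` does internally:

``` python
import genieclust
import numpy as np

def max_score(labels_list, results):
    # For each reference partition, pick the prediction with the matching
    # number of clusters, compare the two while skipping the reference's
    # noise points (label 0), and report the maximal score.
    scores = []
    for y_true in labels_list:
        k = int(y_true.max())  # labels are 1..k; 0 marks noise
        ok = y_true > 0        # ignore noise points
        scores.append(
            genieclust.compare_partitions.normalized_clustering_accuracy(
                y_true[ok], results[k][ok]
            )
        )
    return max(scores)

max_score(b.labels, res["Genie_G0.3"])  # should roughly match get_score
```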

Let us depict the results for the `"Genie_G0.3"` method:

``` python
method = "Genie_G0.3"
for i, k in enumerate(res[method].keys()):
    plt.subplot(1, len(res[method]), i+1)
    genieclust.plots.plot_scatter(
        b.data, labels=res[method][k]-1, axis="equal", title=f"{method}; k={k}"
    )
plt.show()
```

(fig:using-clustbench-example2)=
```{figure} clustbench-usage-figures/using-clustbench-example2-3.*
Results generated by Genie.
```

## Applying Clustering Methods Manually

Naturally, the aim of this benchmark framework is also to test new methods.
We can use {any}`clustbench.fit_predict_many` to generate
all the partitions required for a comparison against the reference labels.

For instance, let us investigate the behaviour of the k-means algorithm:

``` python
import sklearn.cluster
m = sklearn.cluster.KMeans(n_init=10)
res["KMeans"] = clustbench.fit_predict_many(m, b.data, b.n_clusters)
clustbench.get_score(b.labels, res["KMeans"])
## np.float64(0.9848484848484849)
```

We see that k-means (which specialises in detecting symmetric Gaussian-like blobs)
performs better than *Genie* on this particular dataset.

``` python
method = "KMeans"
for i, k in enumerate(res[method].keys()):
    plt.subplot(1, len(res[method]), i+1)
    genieclust.plots.plot_scatter(
        b.data, labels=res[method][k]-1, axis="equal", title=f"{method}; k={k}"
    )
plt.show()
```

(fig:using-clustbench-example3)=
```{figure} clustbench-usage-figures/using-clustbench-example3-5.*
Results generated by k-means.
```

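Any estimator that follows the scikit-learn API and accepts an `n_clusters`
parameter can be plugged in the same way. For example, here is a sketch
benchmarking average-linkage agglomerative clustering (the choice of method
is merely illustrative):

``` python
# benchmark another scikit-learn-compatible algorithm the same way
import sklearn.cluster
m = sklearn.cluster.AgglomerativeClustering(linkage="average")
res["Agglomerative"] = clustbench.fit_predict_many(m, b.data, b.n_clusters)
print(clustbench.get_score(b.labels, res["Agglomerative"]))
```
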

For more functions, please refer to the package's documentation (see the next section).
Moreover, {ref}`sec:colouriser` describes a standalone application
that can be used to prepare our own two-dimensional datasets.

Note that you do not have to use the *clustering-benchmarks* package
to access the benchmark datasets from our repository.
As the {ref}`sec:how-to-access` section mentions, most tasks boil down
to simple operations on files and directories which you can
implement manually. The package was developed merely for the users'
convenience.