Here is the minimal code for the algorithms detailed in the ICLR 2025 paper Fast unsupervised ground metric learning with tree-Wasserstein distance. We embed data matrices on trees and leverage the tree-Wasserstein distance to efficiently learn ground metrics for both samples and features!
- We have modified parts of the treeOT repository from Approximating 1-Wasserstein Distance with Trees, in particular the ClusterTree algorithm (to allow initialisation based on a custom input distance metric).
- Interested users are also encouraged to review the wsingular repository from Unsupervised Ground Metric Learning Using Wasserstein Singular Vectors. We include and attribute parts of this code in tree-wsv. In particular, we compare our results to the standard Wasserstein Singular Vector and Sinkhorn Singular Vector algorithms implemented by Huizing et al. on a genomics PBMC dataset that was shared by the authors.
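For readers new to the tree-Wasserstein distance mentioned above, the sketch below illustrates why it is cheap to compute: on a tree, the distance between two distributions is the sum over edges of the edge length times the absolute difference in mass lying below that edge. This is a minimal illustration, not the code in this repository. The function name `tree_wasserstein`, the parent-array tree encoding and the toy tree are all assumptions made for the example.

```python
import numpy as np


def tree_wasserstein(parent, weight, mu, nu):
    """Tree-Wasserstein distance between two distributions on tree nodes.

    Hypothetical helper for illustration only (not this repository's API).
    parent[i] is the parent of node i; the root is node 0 with parent -1,
    and nodes are assumed to be ordered so that parent[i] < i.
    weight[i] is the length of the edge from node i to its parent
    (the root's entry is ignored).
    """
    # Net mass at each node; a fresh array, so the inputs are not modified.
    diff = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    total = 0.0
    # Visit children before parents so diff[i] holds the net mass of the
    # whole subtree rooted at i by the time edge (i, parent[i]) is counted.
    for i in range(len(parent) - 1, 0, -1):
        total += weight[i] * abs(diff[i])
        diff[parent[i]] += diff[i]
    return total


# Toy tree: root 0 has children 1 and 4; node 1 has leaf children 2 and 3.
parent = [-1, 0, 1, 1, 0]
weight = [0.0, 1.0, 0.5, 0.5, 2.0]
mu = [0, 0, 1, 0, 0]  # all mass on leaf 2
nu = [0, 0, 0, 0, 1]  # all mass on leaf 4
d = tree_wasserstein(parent, weight, mu, nu)
```

With all mass moving from leaf 2 to leaf 4, the result equals the length of the path between them (0.5 + 1.0 + 2.0). The single pass over the nodes is what makes the distance linear in tree size, in contrast to solving a general optimal transport problem.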
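Once you have cloned the repository, create and activate a virtual environment, then install the listed requirements inside it (without sudo, which would install into the system Python instead of the environment):
pip install -r requirements.txt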
Illustrative vignettes will be added to this repository as it is updated and commented.
@inproceedings{
dusterwald2025,
title     = {Fast unsupervised ground metric learning with tree-Wasserstein distance},
author    = {Kira M. D{\"u}sterwald and Samo Hromadka and Makoto Yamada},
booktitle = {The Thirteenth International Conference on Learning Representations, {ICLR} 2025, Singapore},
publisher = {OpenReview.net},
year      = {2025},
url       = {https://openreview.net/forum?id=FBhKUXK7od}
}
The PBMC-3k preprocessed dataset was kindly shared by Huizing et al., 2022, and can be found on figshare. Other preprocessed datasets are available on request.
E-mail: kira (dot) dusterwald (dot) 21 (at) ucl.ac.uk
