DataGossip (see the EDBT 2022 paper cited below) is an extension for asynchronous, distributed, data-parallel machine learning that improves training on imbalanced data partitions.
Installation requires conda:
$ conda env create -f environment.yml
$ conda activate datagossip
$ python setup.py install
Download and transform the datasets on your main machine:
$ python prepare_datasets.py
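Then, run the following script on each cluster node to start the training. Make sure to set the correct rank and size on every node: each node needs a distinct rank, and size must be the total number of participating nodes.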
$ python experiments/train.py --rank=<rank> --size=<size> --main_address=<main_address>
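To avoid typos when assigning ranks by hand, the per-node commands can be generated in one place. The following is a minimal sketch that assumes ranks are numbered 0 to size-1 (the usual convention for distributed training) and that every node receives the same `--size` and `--main_address`; `build_commands` is a hypothetical helper, not part of the repository.

```python
"""Hypothetical helper: print the train.py command line for every node.

Assumptions (not taken from the repository): ranks run from 0 to size-1,
and all nodes share the same size and main_address values.
"""
import shlex
from typing import List


def build_commands(size: int, main_address: str) -> List[str]:
    """Return one command per rank; run the i-th command on the i-th node."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [
        " ".join([
            "python", "experiments/train.py",
            f"--rank={rank}",   # distinct for every node
            f"--size={size}",   # identical on every node
            f"--main_address={shlex.quote(main_address)}",
        ])
        for rank in range(size)
    ]


if __name__ == "__main__":
    # Example values only: three nodes, main node reachable at 10.0.0.1.
    for cmd in build_commands(size=3, main_address="10.0.0.1"):
        print(cmd)
```

Copy each printed line to the matching node; only the `--rank` value differs between them.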
Afterwards, the results of the experiment are stored on the machine with rank=0 in the files experiments.pkl and evaluations.pkl, which hold pandas DataFrames.
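The result files can be inspected with a few lines of Python. This is a sketch assuming both files sit in the current directory of the rank-0 machine; `load_results` is a hypothetical name. It uses the standard `pickle` module, but pandas must still be installed to unpickle the DataFrames, and `pandas.read_pickle` would work equally well.

```python
"""Sketch: load experiments.pkl and evaluations.pkl on the rank-0 machine.

load_results is a hypothetical helper; only unpickle files you trust,
since pickle can execute arbitrary code while loading.
"""
import pickle
from pathlib import Path
from typing import Any, Dict

RESULT_FILES = ("experiments.pkl", "evaluations.pkl")


def load_results(directory: str = ".") -> Dict[str, Any]:
    """Map each file stem ("experiments", "evaluations") to its unpickled object."""
    results: Dict[str, Any] = {}
    for name in RESULT_FILES:
        path = Path(directory) / name
        with path.open("rb") as f:
            results[path.stem] = pickle.load(f)
    return results


if __name__ == "__main__":
    for key, frame in load_results().items():
        print(key)
        print(frame.head())  # frames are pandas DataFrames, per the README
```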
Please consider citing:
@inproceedings{wenig2022datagossip,
title={DataGossip: A Data Exchange Extension for Distributed Machine Learning Algorithms},
author={Wenig, Phillip and Papenbrock, Thorsten},
booktitle={Proceedings of the International Conference on Extending Database Technology (EDBT)},
year={2022},
pages={373--377},
doi={10.48786/edbt.2022.24},
url={http://dx.doi.org/10.48786/edbt.2022.24},
}