Implementation of the adaptive compression scheme and related work from GraVAC, presented at the IEEE International Conference on Cloud Computing (CLOUD) 2023, Chicago, Illinois, USA.
Distributed data-parallel (DDP) training improves overall application throughput as multiple devices train on a subset of data and aggregate updates to produce a globally shared model. The periodic synchronization at each iteration incurs considerable overhead, exacerbated by the increasing size and complexity of state-of-the-art neural networks. Although many gradient compression techniques have been proposed to reduce communication cost, the ideal compression factor that leads to maximum speedup or minimum data exchange remains an open problem, since it varies with the quality of compression, model size and structure, hardware, network topology and bandwidth. We propose GraVAC, a framework to dynamically adjust the compression factor throughout training by evaluating model progress and assessing the gradient information loss associated with compression. GraVAC works in an online, black-box manner without any prior assumptions about a model or its hyperparameters, while achieving the same or better accuracy than dense SGD (i.e., no compression) in the same number of iterations/epochs. As opposed to using a static compression factor, GraVAC reduces end-to-end training time for ResNet101, VGG16 and LSTM by 4.32×, 1.95× and 6.67× respectively. Compared to other adaptive schemes, our framework provides 1.94× to 5.63× overall speedup.
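As a rough illustration of the idea described above (not GraVAC's actual algorithm or API), the sketch below pairs a Top-K sparsifier with a norm-ratio proxy for gradient information loss and doubles or halves the compression factor around a threshold. The function names, the 0.9 threshold and the doubling/halving policy are illustrative assumptions, not values from the paper.

```python
import torch

def topk_compress(grad: torch.Tensor, cf: float):
    """Top-K sparsification: keep the numel/cf largest-magnitude entries."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() / cf))
    _, idx = torch.topk(flat.abs(), k)
    return flat[idx], idx, flat.numel()

def compression_gain(grad: torch.Tensor, vals: torch.Tensor) -> float:
    """Proxy for retained gradient information: energy of kept entries vs. full gradient."""
    total = grad.norm().item() ** 2
    return (vals.norm().item() ** 2) / total if total > 0 else 1.0

def adjust_cf(gain: float, cf: float, threshold: float = 0.9,
              cf_min: float = 2.0, cf_max: float = 1000.0) -> float:
    """Compress harder while retained information stays above the threshold,
    otherwise back off to a milder compression factor."""
    if gain >= threshold:
        return min(cf * 2.0, cf_max)
    return max(cf / 2.0, cf_min)

# Per-iteration use (single process shown; a DDP job would exchange the
# sparse (vals, idx) pairs instead of averaging dense gradients):
grad = torch.randn(1_000_000)
cf = 10.0
vals, idx, numel = topk_compress(grad, cf)
cf = adjust_cf(compression_gain(grad, vals), cf)
```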
ACCESS LINKS
RUNNING
- Multi-level compression in GraVAC is not yet implemented.
- Go to the scripts directory to execute training scripts: run_baseline.sh runs baseline uncompressed training or training with a static compression factor, run_accordion.sh runs Accordion, and run_gravac.sh launches GraVAC.
- Contains implementations of the TopK, DGC, RedSync and RandomK compressors (see the sketch after this list).
- Models trained: ResNet101, VGG16 and LSTM on the CIFAR10, CIFAR100 and PTB datasets.
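For context on the listed compressors, here is a minimal, hypothetical sketch of Random-K sparsification and its decompression step in PyTorch; it is not this repo's implementation and only illustrates the compress/decompress interface such methods share.

```python
import torch

def randomk_compress(grad: torch.Tensor, cf: float):
    """Random-K: keep a uniformly random 1/cf fraction of the gradient entries."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() / cf))
    idx = torch.randperm(flat.numel(), device=flat.device)[:k]
    return flat[idx], idx, flat.numel()

def decompress(vals: torch.Tensor, idx: torch.Tensor, numel: int) -> torch.Tensor:
    """Scatter the kept entries back into a dense, zero-filled gradient."""
    out = torch.zeros(numel, device=vals.device, dtype=vals.dtype)
    out[idx] = vals
    return out
```

TopK, DGC and RedSync select entries by magnitude (with different selection and thresholding heuristics), whereas RandomK samples indices uniformly at random.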
CITATION
- Bibtex:
  @article{Tyagi2023GraVACAC,
    title={GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training},
    author={Sahil Tyagi and Martin Swany},
    journal={2023 IEEE 16th International Conference on Cloud Computing (CLOUD)},
    year={2023},
    pages={319-329}
  }