DeepSpeed v0.3.10
v0.3.10 Release notes
Combined release notes since the November 12th v0.3.1 release
- Various updates to torch.distributed initialization
- Transformer kernel updates
- Elastic training support #602
- NOTE: More details to come; this feature is still in its initial pilot phase.
- Module replacement support #586
- NOTE: This will see wider use and be documented in the near term to help automatically inject/replace DeepSpeed ops into client models.
- Remove psutil and cpufeature dependencies #528
- Various ZeRO 1 and 2 bug fixes and updates: #531, #532, #545, #548
- Backwards-compatible checkpoints with older DeepSpeed v0.2 versions #543
- Add static_loss_scale support to the unfused optimizer #546 (see the config sketch after this list)
- Bug fix for norm calculation in the absence of a model parallel group #551
- Switch CI from Azure Pipelines to GitHub Actions
- Deprecate client ability to disable gradient reduction #552
- Bug fix for tracking the optimizer step in cpu-adam when loading a checkpoint #564
- Improve support for the Ampere architecture #572, #570, #577, #578, #591, #642
- Fix potential random layout inconsistency issues in sparse attention modules #534
- Support customizing kwargs for lr_scheduler #584 (see the scheduler config sketch after this list)
- Support deepspeed.initialize with a dict configuration instead of an args-based config #632 (see the sketch after this list)
- Allow DeepSpeed models to be initialized with optimizer=None #469
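For the static_loss_scale item (#546), below is a minimal sketch of an fp16 config block, assuming the convention that a non-zero `loss_scale` selects a fixed (static) scale while `0` requests dynamic loss scaling; the unfused fp16 optimizer path is typically taken when a non-fused client optimizer such as `torch.optim.Adam` is used.

```python
# Minimal sketch: fp16 config with a static loss scale.
# Assumption: a non-zero "loss_scale" is treated as static; 0 means dynamic.
ds_config = {
    "train_batch_size": 8,
    "fp16": {
        "enabled": True,
        "loss_scale": 128  # fixed (static) loss scale
    }
}
```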
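For the lr_scheduler kwargs item (#584), the sketch below shows scheduler kwargs passed through the `params` block of the DeepSpeed config; the `WarmupLR` type and its parameter names are used here for illustration and may differ from the scheduler you use.

```python
# Minimal sketch: customizing scheduler kwargs via the "params" block.
ds_config = {
    "train_batch_size": 16,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "scheduler": {
        "type": "WarmupLR",
        "params": {                 # kwargs forwarded to the scheduler
            "warmup_min_lr": 0.0,
            "warmup_max_lr": 1e-3,
            "warmup_num_steps": 1000
        }
    }
}
```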
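For the deepspeed.initialize items (#632 and #469), here is a minimal sketch assuming the dict is passed through the `config_params` argument and that DeepSpeed constructs the optimizer from the config when `optimizer=None`; it is intended to be run with the `deepspeed` launcher so distributed initialization succeeds.

```python
import torch
import deepspeed

model = torch.nn.Linear(10, 10)

# Config supplied as a Python dict rather than a JSON file path.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# optimizer=None: DeepSpeed builds the optimizer from the config above.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    optimizer=None,
    config_params=ds_config,
)
```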
Special thanks to our contributors in this release
@stas00, @gcooper-isi, @g-karthik, @sxjscience, @brettkoonce, @carefree0910, @Justin1904, @harrydrippin