
DeepSpeed v0.3.10

@jeffra released this 12 Jan 18:17

v0.3.10 Release notes

Combined release notes since the November 12th v0.3.1 release

  • Various updates to torch.distributed initialization
    • New deepspeed.init_distributed API, #608, #645, #644 (see the first sketch after this list)
    • Improved AzureML support for patching torch.distributed backend, #542
    • Simplified dist init so it only initializes when needed, #553
  • Transformer kernel updates
    • Support for different hidden dimensions #559
    • Support arbitrary sequence-length #587
  • Elastic training support (#602)
    • NOTE: More details to come on this feature; it is currently in initial piloting.
  • Module replacement support #586
    • NOTE: Will be used more and documented in the short term to help automatically inject/replace DeepSpeed ops into client models.
  • Removed the psutil and cpufeature dependencies, #528
  • Various ZeRO 1 and 2 bug fixes and updates: #531, #532, #545, #548
  • Backwards-compatible checkpoint loading for older DeepSpeed v0.2 checkpoints, #543
  • Add static_loss_scale support to the unfused optimizer, #546
  • Bug fix for norm calculation in absence of model parallel group #551
  • Switch CI from Azure Pipelines to GitHub Actions
  • Deprecate client ability to disable gradient reduction #552
  • Bug fix for tracking the optimizer step in cpu-adam when loading a checkpoint, #564
  • Improved support for Ampere architecture #572, #570, #577, #578, #591, #642
  • Fix potential random layout inconsistency issues in sparse attention modules #534
  • Support customizing kwargs for the lr_scheduler, #584
  • Support deepspeed.initialize with a dict configuration instead of an args argument, #632 (see the config sketch after this list)
  • Allow DeepSpeed models to be initialized with optimizer=None #469
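
Below is a minimal sketch of the new deepspeed.init_distributed path referenced above. It assumes the process was launched with the usual RANK/LOCAL_RANK/WORLD_SIZE/MASTER_ADDR environment variables and that the NCCL backend is available; those specifics are assumptions, not details from these notes.

```python
import torch
import deepspeed

# Replaces a manual torch.distributed.init_process_group(...) call.
# DeepSpeed reads rank/world-size from the environment and only
# initializes the backend if it is not already initialized.
deepspeed.init_distributed(dist_backend="nccl")

print(f"distributed ready, rank {torch.distributed.get_rank()}")
```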
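
The following config sketch illustrates several of the items above together: passing a dict configuration to deepspeed.initialize, initializing with optimizer=None, requesting a static loss scale, and customizing lr_scheduler kwargs. The toy model and config values are placeholders, and the keyword used for the dict config (config_params) is an assumption that may differ between versions.

```python
import torch
import deepspeed

model = torch.nn.Linear(10, 10)

# Illustrative config dict (values are placeholders).
ds_config = {
    "train_batch_size": 8,
    "fp16": {
        "enabled": True,
        # A non-zero loss_scale requests static loss scaling (0 selects dynamic).
        "loss_scale": 128,
    },
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "scheduler": {
        "type": "WarmupLR",
        # lr_scheduler kwargs are customized via "params".
        "params": {"warmup_min_lr": 0.0, "warmup_max_lr": 1e-3, "warmup_num_steps": 100},
    },
}

# optimizer=None is allowed; DeepSpeed constructs the optimizer from the config.
model_engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    optimizer=None,
    config_params=ds_config,
)
```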

Special thanks to our contributors in this release

@stas00, @gcooper-isi, @g-karthik, @sxjscience, @brettkoonce, @carefree0910, @Justin1904, @harrydrippin