Question about DeepSpeed bucket #1184

huangyanjuner · 2021-06-24T12:54:22Z

Hello, I am reading the zero2.py code. I learned from the DDP paper that its bucketing strategy pays close attention to the order in which gradients are reduced. As you know, the reduction order must be the same across all processes; otherwise the AllReduce contents might mismatch, resulting in an incorrect reduction result or a core dump. However, the Autograd engine is a parallel graph execution engine, so the order in which gradients become ready can differ among processes. DDP uses the parameter-to-bucket mapping, the index inside each bucket, and the index between buckets together to enforce a strict order for AllReduce. As for DeepSpeed, I am wondering how it handles bucketing in this respect. Thanks a lot.
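
To make the ordering concern concrete, the DDP-style rule described above can be sketched in plain Python. This is only an illustration of the idea from the DDP paper, not DeepSpeed or PyTorch code: `build_buckets`, `reduction_order`, the parameter names, and the sizes are all hypothetical, and no real communication happens.

```python
from typing import Dict, List


def build_buckets(param_sizes: Dict[str, int], bucket_cap: int) -> List[List[str]]:
    """Hypothetical helper: map parameters to buckets in reverse registration order.

    Every process runs this on the same model, so the mapping is identical everywhere.
    """
    buckets: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name in reversed(list(param_sizes)):
        size = param_sizes[name]
        # Close the current bucket if adding this parameter would exceed the cap.
        if current and used + size > bucket_cap:
            buckets.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        buckets.append(current)
    return buckets


def reduction_order(buckets: List[List[str]], ready_order: List[str]) -> List[int]:
    """Return the bucket indices in the order their (simulated) AllReduce is launched.

    Bucket i is launched only when all of its gradients are ready AND every
    bucket with a smaller index has already been launched.
    """
    pending = [set(bucket) for bucket in buckets]
    next_bucket = 0
    launched: List[int] = []
    for name in ready_order:
        for waiting in pending:
            waiting.discard(name)
        while next_bucket < len(buckets) and not pending[next_bucket]:
            launched.append(next_bucket)
            next_bucket += 1
    return launched


# Made-up parameter sizes; dict order stands in for registration order.
param_sizes = {"w1": 4, "b1": 1, "w2": 4, "b2": 1}
buckets = build_buckets(param_sizes, bucket_cap=5)

# Two ranks whose autograd engines finish gradients in different orders.
rank0_ready = ["b2", "w2", "b1", "w1"]
rank1_ready = ["w1", "b1", "w2", "b2"]

print(reduction_order(buckets, rank0_ready))
print(reduction_order(buckets, rank1_ready))
```

Both simulated ranks launch the buckets in the same index order even though their gradients become ready in different orders. The question is whether zero2.py relies on a comparable guarantee.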

Replies: 0 comments