Question about DeepSpeed bucket #1184
Unanswered
huangyanjuner asked this question in Q&A
Replies: 0 comments
Hello, I am reading the zero2.py code. From the DDP paper I learned that its bucketing strategy pays close attention to the order in which gradients are reduced. As you know, the reduction order must be the same across all processes; otherwise the AllReduce contents might mismatch, producing an incorrect reduction result or a core dump. However, the Autograd engine is a parallel graph execution engine, so gradients may become ready in a different order on different processes. DDP uses the parameter-to-bucket mapping, the index inside each bucket, and the index between buckets together to make sure the AllReduce order is strictly the same on every process. As for DeepSpeed, I am wondering how it handles its buckets to guarantee this. Thanks a lot.
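To make the ordering concern concrete, here is a toy sketch of the DDP-style scheme described above. It is not taken from DDP or from DeepSpeed's zero2.py; the names `build_buckets`, `BucketReducer`, and `simulate` are hypothetical, and the "AllReduce" is only recorded as a bucket index rather than actually executed. It assumes buckets are filled in reverse parameter order and are reduced strictly in bucket-index order, whatever order the gradients become ready in.

```python
from typing import Dict, List


def build_buckets(param_sizes: List[int], cap: int) -> List[List[int]]:
    """Assign parameter indices to buckets, walking parameters in reverse order.

    The mapping depends only on sizes and cap, so every process computes
    the same parameter-to-bucket mapping.
    """
    buckets: List[List[int]] = [[]]
    used = 0
    for idx in reversed(range(len(param_sizes))):
        # Start a new bucket when the current one would exceed the cap.
        if buckets[-1] and used + param_sizes[idx] > cap:
            buckets.append([])
            used = 0
        buckets[-1].append(idx)
        used += param_sizes[idx]
    return buckets


class BucketReducer:
    """Toy reducer: launches buckets only in bucket-index order."""

    def __init__(self, buckets: List[List[int]]) -> None:
        self.buckets = buckets
        self.param_to_bucket: Dict[int, int] = {
            p: b for b, params in enumerate(buckets) for p in params
        }
        self.pending = [len(params) for params in buckets]
        self.next_bucket = 0
        self.launched: List[int] = []  # stand-in for the AllReduce call sequence

    def mark_ready(self, param_idx: int) -> None:
        """Called when a gradient becomes ready (any order)."""
        self.pending[self.param_to_bucket[param_idx]] -= 1
        # A full bucket still waits until all earlier buckets have launched.
        while (self.next_bucket < len(self.buckets)
               and self.pending[self.next_bucket] == 0):
            self.launched.append(self.next_bucket)
            self.next_bucket += 1


def simulate(ready_order: List[int], param_sizes: List[int], cap: int) -> List[int]:
    """Return the bucket launch order for one process's gradient-ready order."""
    reducer = BucketReducer(build_buckets(param_sizes, cap))
    for p in ready_order:
        reducer.mark_ready(p)
    return reducer.launched
```

In this sketch, two processes whose gradients become ready in different orders still issue the same sequence of bucket reductions, which is the property I am asking about for DeepSpeed.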