Cannot See Effective Memory Reduction with ZeRO 3 #1645
hpourmodheji
started this conversation in
General
I tried training the large Bing BERT model (~300M parameters) on 8 GPUs (12GB GeForce GTX 1080 Ti) to see the memory reduction from ZeRO 3. The model-state memory shows the expected linear (8x) reduction; however, the peak memory consumption during training does not drop by the same factor. What might I be missing that keeps me from seeing the full benefit of ZeRO 3? Any help is very much appreciated.
Baseline - GPU Memory Consumption during Training with single GPU
ZeRO 3 - GPU Memory Consumption during Training on 8 GPUs with ZeRO 3 enabled.
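For context on the two measurements above, here is a rough back-of-the-envelope sketch of why the peak may shrink far less than the model state does. It assumes mixed-precision Adam at 16 bytes per parameter (the accounting used in the ZeRO paper); the post does not state the optimizer or precision, so that is an assumption. The `other_gb` figure (activations, temporary buffers, fragmentation) is a purely hypothetical placeholder, not a measured value, and `model_state_gb` is an illustrative helper, not a DeepSpeed API.

```python
# Rough per-GPU model-state estimate under mixed-precision Adam.
# Assumption: 2 bytes fp16 weights + 2 bytes fp16 gradients + 12 bytes
# optimizer state (fp32 weights, momentum, variance) per parameter.
BYTES_FP16_PARAMS = 2
BYTES_FP16_GRADS = 2
BYTES_OPTIMIZER_STATES = 12


def model_state_gb(num_params: int, num_gpus: int = 1, zero_stage: int = 0) -> float:
    """Estimate model-state memory per GPU in GB (1 GB = 1e9 bytes)."""
    params = BYTES_FP16_PARAMS * num_params
    grads = BYTES_FP16_GRADS * num_params
    optim = BYTES_OPTIMIZER_STATES * num_params
    if zero_stage >= 1:  # stage 1 partitions optimizer states
        optim /= num_gpus
    if zero_stage >= 2:  # stage 2 also partitions gradients
        grads /= num_gpus
    if zero_stage >= 3:  # stage 3 also partitions the parameters
        params /= num_gpus
    return (params + grads + optim) / 1e9


NUM_PARAMS = 300_000_000  # ~300M parameters, as stated in the post
baseline_gb = model_state_gb(NUM_PARAMS)
zero3_gb = model_state_gb(NUM_PARAMS, num_gpus=8, zero_stage=3)

# Hypothetical per-GPU memory that ZeRO 3 does not partition
# (activations, temporary buffers); NOT a measured number.
other_gb = 5.0

print(f"model state: {baseline_gb:.2f} GB -> {zero3_gb:.2f} GB per GPU")
print(f"peak (with hypothetical {other_gb} GB of other memory): "
      f"{baseline_gb + other_gb:.2f} GB -> {zero3_gb + other_gb:.2f} GB per GPU")
```

Under these assumptions the model state drops from about 4.8 GB to about 0.6 GB per GPU (8x), but the peak only shrinks by however much the model state contributed to it. If the per-GPU batch size is the same in both runs, activation memory stays roughly constant, which could explain a peak that falls well short of an 8x reduction.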