Why doesn't the bing squad bert example or cifar example work with the zero optimizer? #992
SantoshGuptaML started this conversation in General
Replies: 2 comments 1 reply
-
It was mostly due to this issue; after I updated it, I've had no problems.
-
@SantoshGuptaML, so can this issue be closed?
-
I can get the BingBertSquad example to run as-is, but when I try to experiment with the ZeRO optimizer and add it
to https://github.com/microsoft/DeepSpeedExamples/blob/master/BingBertSquad/deepspeed_bsz24_config.json
the program seems to freeze during or after loading the weights. These are the last couple of lines before it freezes.
For the CIFAR example, I first tried adding ZeRO optimizer stage 1 to the config, but I got an error message saying that fp16 needs to be enabled. After I add
I then get an error message saying
So it seems something in the code for these examples is preventing either the ZeRO optimizer or fp16 from working. I'm very curious what exactly it is, so I can avoid the issue in my own models.
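For reference, the two settings in question can be combined in a DeepSpeed JSON config along the lines of the sketch below. This is a minimal illustration using the standard DeepSpeed schema keys (`zero_optimization` and `fp16`); the `train_batch_size` value is a placeholder, not taken from the example's actual config file.

```json
{
  "train_batch_size": 24,
  "zero_optimization": {
    "stage": 1
  },
  "fp16": {
    "enabled": true
  }
}
```

At the time, ZeRO in DeepSpeed required fp16 training to be enabled, which is consistent with the fp16 error message reported for the CIFAR example.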