Replies: 1 comment 1 reply
-
@jisngprk I am also new to DeepSpeed, so I may be wrong, but this is what works for many people:
As far as checkpoints go, I think one file is the bare checkpoint, and the other is the full package with the model state dict + optimizer state + other stuff. At least this is how it works with plain torch. For example:
Save:
Load:
And if you want to use just the CPU, you usually specify it explicitly, e.g. torch.load(path, map_location="cpu"). (Note that model.cuda() moves the model to the GPU, not the CPU; model.cpu() is the CPU counterpart.) Hope this is helpful.
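The Save:/Load: examples above did not survive in this copy of the thread; a minimal sketch of the usual torch.save / torch.load pattern being described might look like this (the checkpoint filename and key names are my own, not from the thread):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Save: bundle the model weights and optimizer state into one checkpoint file.
torch.save(
    {
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    },
    "checkpoint.pt",
)

# Load: map_location="cpu" places all tensors on the CPU, so no GPU or
# distributed environment is needed to restore the checkpoint.
ckpt = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
optimizer.load_state_dict(ckpt["optimizer_state_dict"])
```

This only covers the plain-torch case; a DeepSpeed engine checkpoint has its own layout and loading path.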
-
I have some questions.
I am using two GPUs in one node.
I want to load the model on the CPU.
How can I load the model checkpoints with a plain torch.load, or with the DeepSpeed engine, without setting up a distributed GPU environment?
If there is an example in the DeepSpeedExamples repo, please let me know.
When I save the checkpoint, it is saved in two separate directories whose names include the loss. Is it normal that the checkpoint is saved in separate pieces? I am guessing so, because of the loss in the directory name.
Thank you!