Why "No code changes are needed" with zero-offload? What is the most basic principle here? #4342
Unanswered
chansonzhang
asked this question in
Q&A
Replies: 1 comment
-
In its most basic form, ZeRO-Offload offloads the entire optimizer state to host (CPU) memory. That state is just a collection of tensors, whatever the optimizer (e.g., Adam), and it is independent of the model structure, so the offloading logic never needs to inspect the model itself. For more details, please see the paper: https://arxiv.org/abs/2101.06840.
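To make the "no code changes" part concrete, here is a minimal sketch of how optimizer offload is typically switched on through the DeepSpeed configuration rather than in the training script. The `zero_optimization` and `offload_optimizer` keys are DeepSpeed config options; the batch size, learning rate, and ZeRO stage below are illustrative assumptions, not recommended values.

```python
import json

# Illustrative DeepSpeed config as a Python dict (values are assumptions).
# Offload is requested declaratively; the model definition is not touched.
ds_config = {
    "train_batch_size": 8,  # assumed value, for illustration only
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-4},  # assumed value, for illustration only
    },
    "zero_optimization": {
        "stage": 2,
        # Keep the optimizer state (e.g., Adam moments) in CPU memory.
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": True,
        },
    },
}

# Serialize to the JSON form that a DeepSpeed config file would contain.
ds_config_json = json.dumps(ds_config, indent=2)
print(ds_config_json)
```

Since the setting lives in the config, the same training script should be able to run with or without offload, which seems to be what the "no code changes are needed" claim refers to.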
-
For example, how does ZeRO-Offload know the model structure, and how does it decide which parameters or which parts of memory to offload?