This line currently forces the entire model to be put onto the GPU, even when using parameter-offloading on a single GPU. Is this on purpose? If so, how are we supposed to train 200B models, as announced in the blog post? Or am I missing something?
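For reference, this is roughly the setup I have in mind, as a minimal sketch assuming a DeepSpeed-style ZeRO-3 configuration with parameter offloading (the config keys follow DeepSpeed's documented schema; `build_model()` is just a placeholder for the actual model constructor in this repo):

```python
import deepspeed

# Illustrative ZeRO-3 config that offloads parameters (and optimizer state)
# to CPU, so the full model never has to reside in GPU memory at once.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": True},
}

model = build_model()  # placeholder for the model constructor used here

# The pattern I'm asking about: moving the entire model onto the GPU up front
# defeats offloading, since every parameter gets materialized in GPU memory.
# model = model.cuda()

# What I would expect instead: let the engine partition and place parameters,
# keeping them on CPU and fetching shards to the GPU only as layers execute.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```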