This line currently forces the entire model to be put onto the GPU, even when using parameter-offloading on a single GPU. Is this on purpose? If so, how are we supposed to train 200B models, as announced in the blog post? Or am I missing something?
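For reference, this is roughly the setup I have in mind, as a minimal sketch assuming a DeepSpeed-style ZeRO-3 configuration with parameter offloading (the config keys follow DeepSpeed's documented schema; `build_model()` is just a placeholder for the actual model constructor in this repo):

```python
import deepspeed

# Illustrative ZeRO-3 config that offloads parameters (and optimizer state)
# to CPU, so the full model never has to reside in GPU memory at once.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": True},
}

model = build_model()  # placeholder for the model constructor used here

# The pattern I'm asking about: moving the entire model onto the GPU up front
# defeats offloading, since every parameter gets materialized in GPU memory.
# model = model.cuda()

# What I would expect instead: let the engine partition and place parameters,
# keeping them on CPU and fetching shards to the GPU only as layers execute.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```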