When using MindNLP, if the model parameters cannot fully fit into NPU memory, there currently seems to be no mechanism to offload parameters to CPU or disk. This leads to out-of-memory errors when loading large models.
I would like MindNLP to support parameter offloading and on-demand loading — similar to the “device_map” and “offload_folder” features in Hugging Face Transformers — so that parts of the model can stay on CPU or disk and be dynamically moved to NPU during inference or training.
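
For reference, Hugging Face Transformers already exposes this behaviour through `device_map` and `offload_folder`; something like the following (the checkpoint name is only an example) keeps the layers that do not fit on the accelerator in CPU RAM or on disk and pages them in on demand:

```python
from transformers import AutoModelForCausalLM

# Layers that fit are placed on the accelerator; the remainder stays in
# CPU RAM or is spilled to the offload folder and loaded back on demand.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",   # example checkpoint, any large model works
    device_map="auto",             # automatic per-layer placement
    offload_folder="offload",      # disk location for weights that fit nowhere else
    offload_state_dict=True,       # bound CPU RAM usage while loading
)
```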
As a workaround, I have been manually splitting the model and transferring parameters between CPU and NPU layer by layer, but this approach is inefficient and difficult to manage for large-scale models.
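
For context, the manual workaround looks roughly like the sketch below, assuming the checkpoint has been pre-split into one file per transformer block (the file layout and the `forward_with_manual_offload` helper are my own, not MindNLP API):

```python
import mindspore as ms

def forward_with_manual_offload(blocks, ckpt_paths, hidden_states):
    """Run transformer blocks sequentially, loading each block's weights
    from disk only right before that block executes."""
    for block, path in zip(blocks, ckpt_paths):
        param_dict = ms.load_checkpoint(path)       # weights land in host memory first
        ms.load_param_into_net(block, param_dict)   # synced to the NPU when the block runs
        hidden_states = block(hidden_states)
        # Releasing the block's NPU memory afterwards needs ad-hoc tricks,
        # which is exactly what makes this approach hard to manage.
    return hidden_states
```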
Additional context
It would be helpful if MindNLP could provide an automatic or semi-automatic offload mechanism for large models that cannot fully fit into NPU memory; a rough sketch of the desired usage is included below.
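
A minimal sketch of what such an API could look like in MindNLP, assuming hypothetical `device_map`, `offload_folder`, and `max_memory` keyword arguments on `from_pretrained` (these are borrowed from the Transformers interface purely to illustrate the request and do not exist in MindNLP today):

```python
import mindspore as ms
from mindnlp.transformers import AutoModelForCausalLM

ms.set_context(device_target="Ascend")

# Hypothetical arguments -- none of these are implemented in MindNLP yet.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-72B-Instruct",                      # example checkpoint larger than NPU memory
    device_map="auto",                              # place what fits on the NPU, rest on CPU/disk
    offload_folder="./offload",                     # spill remaining weights to disk
    max_memory={"npu:0": "28GB", "cpu": "128GB"},   # per-device memory budget
)

# During inference or training, offloaded layers would be copied to the NPU
# on demand and released afterwards, keeping peak NPU memory bounded.
```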
