
[Feature] offload inference for Big Model parameters out of npu memory #2238

@Mr-Xiao2021

Description

When using MindNLP, if the model parameters cannot fully fit into the NPU memory, it seems there is currently no mechanism to offload parameters to the CPU or disk. This causes memory overflow issues when loading large models.

I would like MindNLP to support parameter offloading and on-demand loading, similar to the “device_map” and “offload_folder” features in Hugging Face Transformers, so that parts of the model can stay on CPU or disk and be moved to the NPU dynamically during inference or training.
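For reference, the equivalent feature in Hugging Face Transformers looks roughly like this (the model id is only a placeholder, and device_map/offload_folder require the accelerate package):

```python
# Reference: big-model inference with offloading in Hugging Face Transformers.
# Requires the `accelerate` package; the model id below is only a placeholder.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "org/some-large-model",
    device_map="auto",         # split layers automatically across accelerator, CPU, and disk
    offload_folder="offload",  # directory used for weights that are offloaded to disk
)
```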

I have tried manually splitting the model and transferring parameters between CPU and NPU layer by layer, but this approach is inefficient and difficult to manage for large-scale models.
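For illustration, the manual workaround is roughly the sketch below; the load_to_npu/free_from_npu helpers are hypothetical placeholders for the actual device-transfer code, not MindNLP functions:

```python
# Sketch of the manual layer-by-layer workaround described above.
# `load_to_npu` / `free_from_npu` are hypothetical helpers standing in for
# the real CPU<->NPU parameter-transfer code; they are not MindNLP APIs.
def run_layer_by_layer(layers, hidden_states, load_to_npu, free_from_npu):
    for layer in layers:
        load_to_npu(layer)                    # copy this layer's parameters CPU -> NPU
        hidden_states = layer(hidden_states)  # run the layer on the NPU
        free_from_npu(layer)                  # release NPU memory before the next layer
    return hidden_states
```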

Additional context
It would be helpful if MindNLP could provide an automatic or semi-automatic offload mechanism for large models that cannot fully fit into NPU memory, as shown in the highlighted code snippet below.

[Image: highlighted code snippet]
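For concreteness, a minimal sketch of what such an API could look like, assuming it mirrors the Transformers interface; the device_map and offload_folder arguments are the proposed additions, not an existing MindNLP API:

```python
# Hypothetical sketch of the requested MindNLP API (proposed, not implemented):
# device_map / offload_folder would mirror the Hugging Face Transformers behavior.
from mindnlp.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "model-id-or-local-path",    # placeholder for any model larger than NPU memory
    device_map="auto",           # proposed: split layers across NPU, CPU, and disk
    offload_folder="./offload",  # proposed: directory for weights spilled to disk
)
```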
