
[BUG/Help] Dual-GPU inference fails to start; only CPU inference runs, log shows return torch.load(checkpoint_file, map_location="cpu") #1494

Open
@MentalBaka

Description


Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

On Linux, I have installed CUDA, and torch.cuda.device_count() returns 2.

When I launch api.py, the log shows return torch.load(checkpoint_file, map_location="cpu"); as far as I can tell, GPU inference is not enabled.
The api.py code is as follows:
from transformers import AutoModel, AutoTokenizer  # import added so the snippet runs stand-alone

tokenizer = AutoTokenizer.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()
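For reference, transformers loads checkpoint files to CPU first (that is exactly the torch.load(checkpoint_file, map_location="cpu") call in the log) and .half().cuda() moves the weights afterwards, so that log line alone does not prove the model stayed on CPU. A minimal diagnostic sketch (an addition for illustration, not part of the original api.py) to check where the parameters actually ended up:

import torch

print(torch.cuda.is_available())        # should be True on this machine
print(torch.cuda.device_count())        # reported as 2 above
print(next(model.parameters()).device)  # prints cuda:0 if .cuda() took effect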


However, when I switched to loading the model with load_model_on_gpus, following an online tutorial, an error was raised.
My web_demo.py code is as follows:

from transformers import AutoModel, AutoTokenizer
import gradio as gr
import mdtex2html
import os
from utils import load_model_on_gpus

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
tokenizer = AutoTokenizer.from_pretrained("ChatGLM-6B", trust_remote_code=True)
model = load_model_on_gpus("ChatGLM-6B", num_gpus=2)
model = model.eval()
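A hypothetical diagnostic (not in the original web_demo.py, and assuming the local "ChatGLM-6B" checkpoint path is valid) is to print the model's parameter names and compare them with the keys of the device_map that utils.load_model_on_gpus builds; every parameter's prefix must be covered by some key, or dispatch_model rejects the map:

from transformers import AutoModel

model = AutoModel.from_pretrained("ChatGLM-6B", trust_remote_code=True)
for name, _ in model.named_parameters():
    print(name)  # e.g. transformer.embedding.word_embeddings.weight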


The error raised is:
Traceback (most recent call last):
  File "openai_api.py", line 173, in <module>
    model = load_model_on_gpus("ChatGLM-6B", num_gpus=2)
  File "/usr/local/glm/ChatGLM-6B/utils.py", line 50, in load_model_on_gpus
    model = dispatch_model(model, device_map=device_map)
  File "/home/xxx/miniconda3/envs/glm/lib/python3.8/site-packages/accelerate/big_modeling.py", line 352, in dispatch_model
    check_device_map(model, device_map)
  File "/home/xxx/miniconda3/envs/glm/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 1420, in check_device_map
    raise ValueError(
ValueError: The device_map provided does not give any device for the following parameters: transformer.embedding.word_embeddings.weight, …
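The ValueError comes from accelerate's check_device_map: the device_map built inside utils.py assigns no device to transformer.embedding.word_embeddings.weight (and the parameters elided above), which suggests the map's keys were written for a different module layout than the model actually being loaded. One workaround sketch, assuming accelerate is installed, is to let from_pretrained build a complete map itself instead of hand-building one:

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "ChatGLM-6B",
    trust_remote_code=True,
    torch_dtype=torch.float16,  # half precision at load time
    device_map="auto",          # accelerate assigns every parameter a device
).eval()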

Expected Behavior

No response

Steps To Reproduce

OS: Ubuntu 20.04
cd /usr/local/glm
conda activate glm3
python api.py

Contents of api.py:
from transformers import AutoModel, AutoTokenizer  # import added so the snippet runs stand-alone

tokenizer = AutoTokenizer.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

Problem encountered:
return torch.load(checkpoint_file, map_location="cpu")

Contents of web_demo.py:
from transformers import AutoModel, AutoTokenizer
import gradio as gr
import mdtex2html
import os
from utils import load_model_on_gpus

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
tokenizer = AutoTokenizer.from_pretrained("ChatGLM-6B", trust_remote_code=True)
model = load_model_on_gpus("ChatGLM-6B", num_gpus=2)
model = model.eval()

Problem encountered:
Traceback (most recent call last):
  File "openai_api.py", line 173, in <module>
    model = load_model_on_gpus("ChatGLM-6B", num_gpus=2)
  File "/usr/local/glm/ChatGLM-6B/utils.py", line 50, in load_model_on_gpus
    model = dispatch_model(model, device_map=device_map)
  File "/home/xxx/miniconda3/envs/glm/lib/python3.8/site-packages/accelerate/big_modeling.py", line 352, in dispatch_model
    check_device_map(model, device_map)
  File "/home/xxx/miniconda3/envs/glm/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 1420, in check_device_map
    raise ValueError(
ValueError: The device_map provided does not give any device for the following parameters: transformer.embedding.word_embeddings.weight, …

Environment

- OS: Ubuntu 20.04
- Python: 3.8
- Transformers:
- PyTorch:
- CUDA Support: True

Anything else?

No response
