[bugs] LLaVA-Evaluation : RuntimeError: Expected all tensors to be on the same device #21

JJJYmmm · 2024-01-31T18:12:45Z

Problems

When testing LLaVA-v1.5 with eval.py, the following error occurs.

*** RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

This happens because the model is loaded through Hugging Face with the default parameter device_map="auto", so it is spread across multiple GPUs (pipeline parallelism).

def load_pretrained_model(model_path, model_base, model_name,
                          load_8bit=False, load_4bit=False,
                          device_map="auto", device="cuda", **kwargs):
    ...

In eval.py, however, .cuda() is then called on the wrapped model (MLLM_Tester), which moves all model parameters onto the default GPU again.

SEED-Bench/eval.py

Line 171 in fbc5f2c

model = build_model(args.model).cuda()

This conflicts with AlignDevicesHook: in some layers the hook still moves the input data to another GPU, while all the parameters now sit on the default GPU, which triggers the error.
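One way to check this diagnosis is to count how many parameter tensors live on each device before and after the .cuda() call. The helper below is a minimal sketch and not part of SEED-Bench: count_params_per_device is a hypothetical name, and it only assumes that the model exposes named_parameters() the way a torch.nn.Module does.

```python
from collections import Counter


def count_params_per_device(model):
    """Return a Counter mapping a device string to the number of parameter tensors on it.

    Hypothetical debugging helper (not in SEED-Bench or LLaVA). It assumes
    `model.named_parameters()` yields (name, tensor) pairs and that each
    tensor has a `.device` attribute, as torch.nn.Module parameters do.
    """
    return Counter(str(param.device) for _, param in model.named_parameters())
```

Calling it on the object returned by build_model(args.model), once before and once after .cuda(), should show whether the parameters were spread over several GPUs and then collapsed onto one, as described above.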

Solution

I think removing .cuda() here is fine, though I have only checked the LLaVA interface.

SEED-Bench/eval.py

Line 171 in fbc5f2c

model = build_model(args.model).cuda()
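If other model interfaces still rely on .cuda(), a more conservative variant is to skip the call only when the model has already been placed on several devices. The sketch below is an assumption-laden illustration, not the change made in the linked pull request: move_to_gpu_unless_dispatched and the inner_attr="model" default are hypothetical, since the attribute under which MLLM_Tester stores the Hugging Face model is not shown here. It relies on the hf_device_map attribute that Hugging Face models carry when loaded with a device_map.

```python
def move_to_gpu_unless_dispatched(model, inner_attr="model"):
    """Call .cuda() only if the model is not already spread over several devices.

    Hypothetical helper. `inner_attr` is an assumed attribute name under which
    a wrapper keeps the Hugging Face model; if the attribute is missing, the
    object itself is inspected instead.
    """
    inner = getattr(model, inner_attr, model)
    # Models loaded with a device_map expose it as `hf_device_map`
    # (a dict of module name -> device).
    device_map = getattr(inner, "hf_device_map", None)
    if device_map and len(set(device_map.values())) > 1:
        # Already dispatched across multiple devices: leave placement alone.
        return model
    return model.cuda()
```

In eval.py this would be used as model = move_to_gpu_unless_dispatched(build_model(args.model)), with inner_attr adjusted to whatever the wrapper actually uses.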


JJJYmmm added a commit to JJJYmmm/SEED-Bench that referenced this issue Jan 31, 2024

Update eval.py

41deae3

fix AILab-CVC#21 (comment)

JJJYmmm linked a pull request Jan 31, 2024 that will close this issue

Update eval.py #22

Open

JJJYmmm mentioned this issue Apr 16, 2024

Fix multiple GPU inference. haotian-liu/LLaVA#1057

Merged
