Description
When I load a base model with AutoModelForCausalLM and then apply a LoRA adapter using PeftModel.from_pretrained, the adapter works correctly on GPU0, but on GPU1 the model behaves like the original base model (no adapter effect).
No error is raised — inference runs normally, but outputs match the base model instead of the adapted model.
Here is my code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# model_name, adapter_model_path, hf_token, dtype, and device are defined
# elsewhere in my script; device is "cuda:0" (works) or "cuda:1" (fails).
print("Using PEFT model for inference.")
tokenizer = AutoTokenizer.from_pretrained(adapter_model_path, token=hf_token, trust_remote_code=True, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_name, token=hf_token, dtype=dtype, device_map=device)
model.resize_token_embeddings(len(tokenizer))
adapter_path = adapter_model_path + "/adapter_model"
print(f"Loading adapter model from {adapter_path}")
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()
```
Environment
```
Thu Sep 18 09:02:23 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.247.01             Driver Version: 535.247.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          Off | 00000000:27:00.0 Off |                    0 |
| N/A   62C    P0             292W / 300W |  67975MiB / 81920MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          Off | 00000000:38:00.0 Off |                    0 |
| N/A   65C    P0             291W / 300W |  18560MiB / 81920MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
```
Expected behavior
When specifying device="cuda:1", the adapter should be applied and inference outputs should match the adapted model, just as they do on GPU0.