RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #47


## ❗ Bug Report: RuntimeError — Tensors on Different Devices (cpu vs cuda:0) in `wan2.1` Image-to-Video Pipeline

### 🧩 Description

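I'm encountering `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu` when running the `wan2.1` image-to-video pipeline from `diffsynth_engine`. According to the traceback, the error is raised while encoding the prompt, inside the text encoder's position embedding (`wan_text_encoder.py`, line 154).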

### 🧪 Code

```python
import torch.multiprocessing as mp
from PIL import Image

from diffsynth_engine.pipelines import WanVideoPipeline, WanModelConfig
from diffsynth_engine.utils.download import fetch_model
from diffsynth_engine.utils.video import save_video

if __name__ == "__main__":
    mp.set_start_method("spawn")

    config = WanModelConfig(
        model_path="/home/fahmie/diffSynth_Engine/muse/wan2___1-i2v-14b-480p-bf16/dit.safetensors",
        t5_path="/home/fahmie/diffSynth_Engine/muse/wan2___1-umt5/umt5.safetensors",
        vae_path="/home/fahmie/diffSynth_Engine/muse/wan2___1-vae/vae.safetensors",
        image_encoder_path="/home/fahmie/diffSynth_Engine/muse/open-clip-xlm-roberta-large-vit-huge-14/open-clip-xlm-roberta-large-vit-huge-14.safetensors",
        dit_fsdp=True,
    )

    pipe = WanVideoPipeline.from_pretrained(
        config,
        parallelism=1,
        use_cfg_parallel=True,
        offload_mode="sequential_cpu_offload",
    )

    image = Image.open("/home/fahmie/diffSynth_Engine/unnamed (3).jpg").convert("RGB")
    video = pipe(
        prompt="",
        negative_prompt="",
        input_image=image,
        num_frames=33,
        seed=42,
    )
    save_video(video, "wan_i2v.mp4", fps=15)

    del pipe
```

### 🧨 Error Details

```
/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
2025-05-05 06:26:31 - INFO - Flash attention 3 is not available
2025-05-05 06:26:31 - INFO - Flash attention 2 is not available
2025-05-05 06:26:31 - INFO - xFormers is available
2025-05-05 06:26:31 - INFO - Torch SDPA is available
2025-05-05 06:26:31 - INFO - Sage attention is not available
2025-05-05 06:26:31 - INFO - Sparge attention is not available
2025-05-05 06:26:33 - INFO - loading state dict from /home/fahmie/diffSynth_Engine/muse/wan2___1-i2v-14b-480p-bf16/dit.safetensors ...
2025-05-05 06:26:33 - INFO - loading state dict from /home/fahmie/diffSynth_Engine/muse/wan2___1-umt5/umt5.safetensors ...
2025-05-05 06:26:33 - INFO - loading state dict from /home/fahmie/diffSynth_Engine/muse/wan2___1-vae/vae.safetensors ...
2025-05-05 06:26:34 - INFO - loading state dict from /home/fahmie/diffSynth_Engine/muse/open-clip-xlm-roberta-large-vit-huge-14/open-clip-xlm-roberta-large-vit-huge-14.safetensors ...
Traceback (most recent call last):
  File "/home/fahmie/diffSynth_Engine/test.py", line 28, in <module>
    video = pipe(
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/pipelines/wan_video.py", line 339, in __call__
    prompt_emb_posi = self.encode_prompt(prompt)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/pipelines/wan_video.py", line 144, in encode_prompt
    prompt_emb = self.text_encoder(ids, mask)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/models/wan/wan_text_encoder.py", line 280, in forward
    x = block(x, mask, pos_bias=e)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/models/wan/wan_text_encoder.py", line 131, in forward
    e = pos_bias if self.shared_pos else self.pos_embedding(x.size(1), x.size(1))
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/models/wan/wan_text_encoder.py", line 154, in forward
    rel_pos_embeds = self.embedding(rel_pos)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1857, in _call_impl
    return inner()
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1805, in inner
    result = forward_call(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 190, in forward
    return F.embedding(
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/utils/gguf.py", line 75, in gguf_embedding
    return origin_embedding(input, weight, *args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
```
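Most frames in the traceback above belong to `torch` itself. As a reading aid, the following is a minimal sketch that keeps only the frames inside `diffsynth_engine`. The helper name `library_frames` and the shortened `/venv/...` paths in `sample` are illustrative assumptions, not part of the library or of the original log.

```python
import re

# Matches the 'File "...", line N, in name' lines of a CPython traceback.
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def library_frames(traceback_text, package="diffsynth_engine"):
    """Return (relative_path, line, function) for frames located inside `package`."""
    marker = f"site-packages/{package}/"
    frames = []
    for match in FRAME_RE.finditer(traceback_text):
        path = match.group("path")
        if marker in path:
            # Keep only the part of the path below the package directory.
            relative = path.split(marker, 1)[1]
            frames.append((relative, int(match.group("line")), match.group("func")))
    return frames


# Abbreviated excerpt of the traceback (paths shortened for illustration).
sample = '''\
  File "/venv/lib/python3.10/site-packages/diffsynth_engine/pipelines/wan_video.py", line 144, in encode_prompt
    prompt_emb = self.text_encoder(ids, mask)
  File "/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/venv/lib/python3.10/site-packages/diffsynth_engine/models/wan/wan_text_encoder.py", line 154, in forward
    rel_pos_embeds = self.embedding(rel_pos)
'''

for frame in library_frames(sample):
    print(frame)
```

Applied to the full traceback, this leaves the chain `__call__` → `encode_prompt` → the text encoder's `forward` calls → `gguf_embedding`, which is where the device mismatch surfaces.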

### 💻 System Info

```
OS: Ubuntu 20.04.6 LTS (x86_64)
Python: 3.10.16
PyTorch: 2.7.0+cu126
CUDA Build Version: 12.6
CUDA Runtime Version: 12.4.131
cuDNN: 9.5.1.17
Torchvision: 0.22.0
Triton: 3.3.0

CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (16 cores)
RAM: 62 GB
GPU: NVIDIA L4 (24 GB VRAM)
GPU Driver: 560.35.05
```

### 📝 Additional Notes

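- I'm using `offload_mode="sequential_cpu_offload"` (together with `dit_fsdp=True` and `use_cfg_parallel=True`); could the offload mode be related?
- The error message names the `index` argument of `index_select`, which suggests the `rel_pos` indices passed to `self.embedding` are on the CPU while the embedding weight is on `cuda:0`.
- All `.safetensors` files were downloaded locally and appear to load without errors.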
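
To help narrow this down, a small diagnostic along these lines could show where the text encoder's tensors live. This is only a sketch: `tensors_by_device` is a hypothetical helper, not part of `diffsynth_engine`, and it relies only on the standard `torch.nn.Module` methods `named_parameters()` and `named_buffers()`.

```python
from collections import defaultdict


def tensors_by_device(module):
    """Group parameter and buffer names of a module by device string.

    `module` is expected to behave like torch.nn.Module, i.e. to provide
    named_parameters() and named_buffers() yielding (name, tensor) pairs.
    """
    groups = defaultdict(list)
    for name, param in module.named_parameters():
        groups[str(param.device)].append(name)
    for name, buf in module.named_buffers():
        groups[str(buf.device)].append(name)
    return dict(groups)
```

Assuming the pipeline exposes the encoder as `pipe.text_encoder` (the traceback calls `self.text_encoder` inside the pipeline), `print(tensors_by_device(pipe.text_encoder))` right before the `pipe(...)` call would list the devices in use. With CPU offload enabled, weights sitting on `cpu` before the forward pass may be expected, so the output is a hint rather than proof of a bug.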

