
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #47

Open
@Fahmie23

❗ Bug Report: RuntimeError — Tensors on Different Devices (cpu vs cuda:0) in wan2.1 Image-to-Video Pipeline

🧩 Description

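Running the wan2.1 image-to-video pipeline from diffsynth_engine fails with RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu. According to the traceback below, the error is raised during prompt encoding, in the embedding lookup of the text encoder's position embedding (self.embedding(rel_pos) in wan_text_encoder.py).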

🧪 Code

import torch.multiprocessing as mp
from PIL import Image

from diffsynth_engine.pipelines import WanVideoPipeline, WanModelConfig
from diffsynth_engine.utils.download import fetch_model
from diffsynth_engine.utils.video import save_video

if __name__ == "__main__":
    mp.set_start_method("spawn")

    config = WanModelConfig(
        model_path="/home/fahmie/diffSynth_Engine/muse/wan2___1-i2v-14b-480p-bf16/dit.safetensors",
        t5_path="/home/fahmie/diffSynth_Engine/muse/wan2___1-umt5/umt5.safetensors",
        vae_path="/home/fahmie/diffSynth_Engine/muse/wan2___1-vae/vae.safetensors",
        image_encoder_path="/home/fahmie/diffSynth_Engine/muse/open-clip-xlm-roberta-large-vit-huge-14/open-clip-xlm-roberta-large-vit-huge-14.safetensors",
        dit_fsdp=True,
    )

    pipe = WanVideoPipeline.from_pretrained(
        config,
        parallelism=1,
        use_cfg_parallel=True,
        offload_mode="sequential_cpu_offload",
    )

    image = Image.open("/home/fahmie/diffSynth_Engine/unnamed (3).jpg").convert("RGB")
    video = pipe(
        prompt="",
        negative_prompt="",
        input_image=image,
        num_frames=33,
        seed=42,
    )
    save_video(video, "wan_i2v.mp4", fps=15)

    del pipe

🧨 Error Details

/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
2025-05-05 06:26:31 - INFO - Flash attention 3 is not available
2025-05-05 06:26:31 - INFO - Flash attention 2 is not available
2025-05-05 06:26:31 - INFO - xFormers is available
2025-05-05 06:26:31 - INFO - Torch SDPA is available
2025-05-05 06:26:31 - INFO - Sage attention is not available
2025-05-05 06:26:31 - INFO - Sparge attention is not available
2025-05-05 06:26:33 - INFO - loading state dict from /home/fahmie/diffSynth_Engine/muse/wan2___1-i2v-14b-480p-bf16/dit.safetensors ...
2025-05-05 06:26:33 - INFO - loading state dict from /home/fahmie/diffSynth_Engine/muse/wan2___1-umt5/umt5.safetensors ...
2025-05-05 06:26:33 - INFO - loading state dict from /home/fahmie/diffSynth_Engine/muse/wan2___1-vae/vae.safetensors ...
2025-05-05 06:26:34 - INFO - loading state dict from /home/fahmie/diffSynth_Engine/muse/open-clip-xlm-roberta-large-vit-huge-14/open-clip-xlm-roberta-large-vit-huge-14.safetensors ...
Traceback (most recent call last):
  File "/home/fahmie/diffSynth_Engine/test.py", line 28, in <module>
    video = pipe(
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/pipelines/wan_video.py", line 339, in __call__
    prompt_emb_posi = self.encode_prompt(prompt)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/pipelines/wan_video.py", line 144, in encode_prompt
    prompt_emb = self.text_encoder(ids, mask)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/models/wan/wan_text_encoder.py", line 280, in forward
    x = block(x, mask, pos_bias=e)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/models/wan/wan_text_encoder.py", line 131, in forward
    e = pos_bias if self.shared_pos else self.pos_embedding(x.size(1), x.size(1))
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/models/wan/wan_text_encoder.py", line 154, in forward
    rel_pos_embeds = self.embedding(rel_pos)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1857, in _call_impl
    return inner()
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1805, in inner
    result = forward_call(*args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 190, in forward
    return F.embedding(
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/utils/gguf.py", line 75, in gguf_embedding
    return origin_embedding(input, weight, *args, **kwargs)
  File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
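
One way to narrow this down is to record, for each leaf module, which devices its parameters and tensor inputs are on just before its forward runs. The sketch below is a generic PyTorch diagnostic and not part of diffsynth_engine; `trace_devices` and `find_mismatches` are hypothetical helper names. It assumes that any offload hooks were registered earlier than these hooks, so they fire first and the recorded devices reflect what forward actually sees.

```python
import torch
import torch.nn as nn


def trace_devices(model: nn.Module):
    """Attach forward pre-hooks that record parameter and input devices per leaf module."""
    records = []
    handles = []

    def make_hook(name):
        def hook(module, args):
            # Only the module's own parameters and positional tensor inputs are inspected.
            param_devices = {str(p.device) for p in module.parameters(recurse=False)}
            input_devices = {str(a.device) for a in args if isinstance(a, torch.Tensor)}
            records.append((name, param_devices, input_devices))
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            handles.append(module.register_forward_pre_hook(make_hook(name)))
    return records, handles


def find_mismatches(records):
    """Return the records where parameters and inputs live on different devices."""
    return [r for r in records if r[1] and r[2] and r[1] != r[2]]


# CPU-only demo on a toy model (stand-in for the real text encoder).
model = nn.Sequential(nn.Embedding(10, 4), nn.Linear(4, 2))
records, handles = trace_devices(model)
model(torch.tensor([1, 2, 3]))
print(find_mismatches(records))  # empty list when everything is on one device
for h in handles:
    h.remove()
```

In the reproduction script this could be attached to `pipe.text_encoder` (the attribute that appears in the traceback) before calling `pipe(...)`. Wrapping that call in try/except and printing `find_mismatches(records)` afterwards should then name the module where the devices diverge.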

💻 System Info

OS: Ubuntu 20.04.6 LTS (x86_64)
Python: 3.10.16
PyTorch: 2.7.0+cu126
CUDA Build Version: 12.6
CUDA Runtime Version: 12.4.131
cuDNN: 9.5.1.17
Torchvision: 0.22.0
Triton: 3.3.0

CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (16 cores)
RAM: 62 GB
GPU: NVIDIA L4 (24 GB VRAM)
GPU Driver: 560.35.05

📝 Additional Notes

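  • I'm using offload_mode="sequential_cpu_offload". Could this be related? The error message points at the index argument of the embedding lookup, so the index tensor (rel_pos) and the embedding weight appear to be on different devices.
  • All .safetensors files were downloaded locally and appear to load without errors (see the log above).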
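
If the mismatch does come from the index tensor being created on the CPU while the embedding weight has already been moved to the GPU (an unconfirmed guess based only on the traceback and the error message), one possible stopgap is a forward pre-hook that moves tensor inputs to the weight's device. This is a sketch in plain PyTorch, not a diffsynth_engine API and not a proper fix; `align_inputs_to_weight` and `patch_embeddings` are hypothetical names, and it only helps if the weight is already on its final device when the hook fires.

```python
import torch
import torch.nn as nn


def align_inputs_to_weight(module: nn.Embedding, args: tuple):
    """Forward pre-hook: move positional tensor inputs to the embedding weight's device."""
    device = module.weight.device
    return tuple(a.to(device) if isinstance(a, torch.Tensor) else a for a in args)


def patch_embeddings(model: nn.Module):
    """Register the hook on every nn.Embedding inside `model`; returns the hook handles."""
    return [
        m.register_forward_pre_hook(align_inputs_to_weight)
        for m in model.modules()
        if isinstance(m, nn.Embedding)
    ]


# CPU-only demo: the hook is a no-op when devices already match.
emb = nn.Embedding(10, 4)
handles = patch_embeddings(emb)
out = emb(torch.tensor([1, 2, 3]))
print(out.shape)  # torch.Size([3, 4])
```

In the script above, this would amount to calling `patch_embeddings(pipe.text_encoder)` right after `from_pretrained` and before `pipe(...)`. Whether it actually sidesteps the error depends on how the offload mode moves weights, which I have not verified.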
