❗ Bug Report: RuntimeError — Tensors on Different Devices (cpu vs cuda:0) in wan2.1 Image-to-Video Pipeline
🧩 Description
I'm encountering `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu` when trying to run the wan2.1 image-to-video pipeline from `diffsynth_engine`.
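For context, this class of error usually means an `nn.Embedding` weight and its index tensor ended up on different devices. A minimal, standalone PyTorch sketch (unrelated to `diffsynth_engine` internals) that raises the same message:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(32, 8).cuda()  # embedding weight lives on cuda:0
ids = torch.arange(5)             # index tensor left on the CPU

out = emb(ids)  # RuntimeError: Expected all tensors to be on the same device,
                # but found at least two devices, cuda:0 and cpu!
                # (... wrapper_CUDA__index_select)
```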
🧪 Code

```python
import torch.multiprocessing as mp
from PIL import Image

from diffsynth_engine.pipelines import WanVideoPipeline, WanModelConfig
from diffsynth_engine.utils.download import fetch_model
from diffsynth_engine.utils.video import save_video

if __name__ == "__main__":
    mp.set_start_method("spawn")

    config = WanModelConfig(
        model_path="/home/fahmie/diffSynth_Engine/muse/wan2___1-i2v-14b-480p-bf16/dit.safetensors",
        t5_path="/home/fahmie/diffSynth_Engine/muse/wan2___1-umt5/umt5.safetensors",
        vae_path="/home/fahmie/diffSynth_Engine/muse/wan2___1-vae/vae.safetensors",
        image_encoder_path="/home/fahmie/diffSynth_Engine/muse/open-clip-xlm-roberta-large-vit-huge-14/open-clip-xlm-roberta-large-vit-huge-14.safetensors",
        dit_fsdp=True,
    )

    pipe = WanVideoPipeline.from_pretrained(
        config,
        parallelism=1,
        use_cfg_parallel=True,
        offload_mode="sequential_cpu_offload",
    )

    image = Image.open("/home/fahmie/diffSynth_Engine/unnamed (3).jpg").convert("RGB")

    video = pipe(
        prompt="",
        negative_prompt="",
        input_image=image,
        num_frames=33,
        seed=42,
    )
    save_video(video, "wan_i2v.mp4", fps=15)

    del pipe
```
🧨 Error Details

```
/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
2025-05-05 06:26:31 - INFO - Flash attention 3 is not available
2025-05-05 06:26:31 - INFO - Flash attention 2 is not available
2025-05-05 06:26:31 - INFO - xFormers is available
2025-05-05 06:26:31 - INFO - Torch SDPA is available
2025-05-05 06:26:31 - INFO - Sage attention is not available
2025-05-05 06:26:31 - INFO - Sparge attention is not available
2025-05-05 06:26:33 - INFO - loading state dict from /home/fahmie/diffSynth_Engine/muse/wan2___1-i2v-14b-480p-bf16/dit.safetensors ...
2025-05-05 06:26:33 - INFO - loading state dict from /home/fahmie/diffSynth_Engine/muse/wan2___1-umt5/umt5.safetensors ...
2025-05-05 06:26:33 - INFO - loading state dict from /home/fahmie/diffSynth_Engine/muse/wan2___1-vae/vae.safetensors ...
2025-05-05 06:26:34 - INFO - loading state dict from /home/fahmie/diffSynth_Engine/muse/open-clip-xlm-roberta-large-vit-huge-14/open-clip-xlm-roberta-large-vit-huge-14.safetensors ...
Traceback (most recent call last):
File "/home/fahmie/diffSynth_Engine/test.py", line 28, in <module>
video = pipe(
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/pipelines/wan_video.py", line 339, in __call__
prompt_emb_posi = self.encode_prompt(prompt)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/pipelines/wan_video.py", line 144, in encode_prompt
prompt_emb = self.text_encoder(ids, mask)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/models/wan/wan_text_encoder.py", line 280, in forward
x = block(x, mask, pos_bias=e)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/models/wan/wan_text_encoder.py", line 131, in forward
e = pos_bias if self.shared_pos else self.pos_embedding(x.size(1), x.size(1))
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/models/wan/wan_text_encoder.py", line 154, in forward
rel_pos_embeds = self.embedding(rel_pos)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1857, in _call_impl
return inner()
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1805, in inner
result = forward_call(*args, **kwargs)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 190, in forward
return F.embedding(
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/diffsynth_engine/utils/gguf.py", line 75, in gguf_embedding
return origin_embedding(input, weight, *args, **kwargs)
File "/home/fahmie/diffSynth_Engine/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2551, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
```
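The failing call is `self.embedding(rel_pos)` in the text encoder's relative position embedding (`wan_text_encoder.py:154`), which suggests the `rel_pos` index tensor and the embedding weight are on different devices after offloading. A speculative local patch, purely to illustrate the idea (based only on the traceback; not a verified fix for `diffsynth_engine`), would move the indices to the weight's device before the lookup:

```python
# Hypothetical edit around wan_text_encoder.py:154 -- illustration only.
# Move the relative-position indices onto whichever device the embedding
# weight currently lives on (it may have been moved by CPU offloading).
rel_pos = rel_pos.to(self.embedding.weight.device)
rel_pos_embeds = self.embedding(rel_pos)
```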
💻 System Info
- OS: Ubuntu 20.04.6 LTS (x86_64)
- Python: 3.10.16
- PyTorch: 2.7.0+cu126
- CUDA Build Version: 12.6
- CUDA Runtime Version: 12.4.131
- cuDNN: 9.5.1.17
- Torchvision: 0.22.0
- Triton: 3.3.0
- CPU: Intel(R) Xeon(R) CPU @ 2.20GHz (16 cores)
- RAM: 62 GB
- GPU: NVIDIA L4 (24 GB VRAM)
- GPU Driver: 560.35.05
📝 Additional Notes
- I'm using `offload_mode="sequential_cpu_offload"`; could this be related? (A quick way to test this is sketched below.)
- All `.safetensors` files were locally downloaded and seem to load fine.
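One way to test the offload hypothesis (a sketch; I'm assuming `offload_mode=None` disables offloading, which I have not verified against the `diffsynth_engine` API):

```python
# Hypothetical check: same setup as above, but with offloading disabled.
# offload_mode=None is an assumption; the accepted values may differ.
pipe = WanVideoPipeline.from_pretrained(
    config,
    parallelism=1,
    use_cfg_parallel=True,
    offload_mode=None,
)
```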