WanImageToVideoPipeline broken math when preparing latents #11163

Closed
vladmandic opened this issue Mar 27, 2025 · 1 comment · Fixed by #11167
Labels
bug Something isn't working

Comments

@vladmandic
Contributor

vladmandic commented Mar 27, 2025

Describe the bug

The WAN 2.1 I2V pipeline's prepare_latents method fails with a shape error when num_frames is not at its default of 81.

Reproduction

Set width=832, height=480, num_frames=15.

Logs

/home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/pipelines/wan/pipeline_wan_i2v.py:611 in __call__

  610         image = self.video_processor.preprocess(image, height=height, width=width).to(device, dtype=torch.float32)
❱ 611         latents, condition = self.prepare_latents(
  612             image,

/home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/pipelines/wan/pipeline_wan_i2v.py:424 in prepare_latents

❱ 424         mask_lat_size = mask_lat_size.view(batch_size, -1, self.vae_scale_factor_temporal, latent_height, latent_width)
  425         mask_lat_size = mask_lat_size.transpose(1, 2)

RuntimeError: shape '[1, -1, 4, 60, 104]' is invalid for input of size 112320
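The numbers in the error line up with a simple divisibility problem. A quick arithmetic sketch (assumptions inferred from the traceback, not from the actual pipeline source: a VAE spatial scale of 8, a temporal scale of 4, and the mask gaining 3 extra copies of the first frame before the view):

```python
# Why the view fails for width=832, height=480, num_frames=15.
width, height, num_frames = 832, 480, 15
latent_width, latent_height = width // 8, height // 8   # 104, 60
mask_frames = num_frames + 3                            # 18

numel = mask_frames * latent_height * latent_width
print(numel)  # 112320, matching "input of size 112320" in the error

# view(batch, -1, 4, 60, 104) needs numel divisible by 4 * 60 * 104
chunk = 4 * latent_height * latent_width
print(numel % chunk)  # 12480, not 0 -> RuntimeError

# with num_frames=17 (a 4*k + 1 value), mask_frames = 20 and it divides evenly
print(((17 + 3) * latent_height * latent_width) % chunk)  # 0
```

So 18 mask frames cannot be grouped into chunks of 4, while 20 can, which is why bumping num_frames from 15 to 17 avoids the crash.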

System Info

diffusers==main

Who can help?

@DN6 @a-r-r-o-w @hlky

@a-r-r-o-w
Member

a-r-r-o-w commented Mar 28, 2025

Hi @vladmandic. num_frames must be of the form 4 * k + 1. So, if you try something like 17, it should work. This was mentioned in the docs here. I've opened a PR to raise an error if that's not the case.
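A hypothetical sketch of the kind of input check such a PR could add (assumed for illustration, not the actual diff of #11167):

```python
def validate_num_frames(num_frames: int) -> None:
    """Reject frame counts the Wan causal VAE cannot compress evenly.

    Assumption: the VAE encodes the first frame alone and every subsequent
    group of 4 frames into one latent frame, so num_frames must be of the
    form 4 * k + 1 (5, 9, ..., 81, ...).
    """
    if num_frames % 4 != 1:
        # nearest valid values below and above the given count
        lower = num_frames - (num_frames - 1) % 4
        raise ValueError(
            f"num_frames must be of the form 4 * k + 1, got {num_frames}; "
            f"try {lower} or {lower + 4} instead."
        )

validate_num_frames(17)  # 4 * 4 + 1 -> passes silently
try:
    validate_num_frames(15)
except ValueError as e:
    print(e)  # suggests 13 or 17
```

Failing fast with a clear message here is friendlier than the opaque `view` RuntimeError deep inside prepare_latents.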
