WAN-video: support training with batch size > 1 #583

hnyu · 2025-05-22T04:15:18Z

Currently, setting batch size > 1 when finetuning WAN causes a dim 0 size mismatch at

x = torch.cat([x, y], dim=1)  # (b, c_x + c_y, f, h, w)

in wan_video_dit.py. The cause is that training_step() in train_wan_t2v.py is hardcoded to assume batch_size=1 when processing "context", "clip_feature" and "y". These tensors should be processed with squeeze(1) instead, which keeps the batch dimension intact.
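A minimal sketch of the failure and the fix, not the actual training code. It assumes each cached sample carries a leading singleton dimension, so the collated "y" has shape (b, 1, c_y, f, h, w), and that the batch_size=1 assumption takes the form of indexing with [0]. The tensor sizes and variable names below are made up for illustration.

```python
import torch

# Hypothetical sizes chosen only for illustration.
B, c_x, c_y, f, h, w = 2, 16, 20, 3, 4, 4

x = torch.randn(B, c_x, f, h, w)             # latents: (b, c_x, f, h, w)
y_batched = torch.randn(B, 1, c_y, f, h, w)  # assumed collated shape of "y"

# Assumed batch_size=1 handling: [0] keeps only the first sample,
# leaving shape (1, c_y, f, h, w).
y_first = y_batched[0]
try:
    torch.cat([x, y_first], dim=1)
    cat_failed = False
except RuntimeError:
    # dim 0 is B for x but 1 for y_first, so the concat is rejected.
    cat_failed = True

# squeeze(1) drops only the singleton dimension and keeps all B samples.
y_fixed = y_batched.squeeze(1)               # (b, c_y, f, h, w)
out = torch.cat([x, y_fixed], dim=1)         # (b, c_x + c_y, f, h, w)
```

With batch size 1 both forms yield the same shape, which would explain why the mismatch only appears once the batch size is raised.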

support training with batch size > 1

d6a879b
