* Update the Wan Animate docs to reflect the most recent code
* Further explain input preprocessing and link to original Wan Animate preprocessing scripts
docs/source/en/api/pipelines/wan.md: 18 additions & 30 deletions
@@ -281,7 +281,7 @@ For replacement mode, you additionally need:
 - **Mask video**: A mask indicating where to generate content (white) vs. preserve original (black)

 > [!NOTE]
-> The preprocessing tools are available in the original Wan-Animate repository. Integration of these preprocessing steps into Diffusers is planned for a future release.
+> Raw videos should not be used for inputs such as `pose_video`, which the pipeline expects to be preprocessed to extract the proper information. Preprocessing scripts to prepare these inputs are available in the [original Wan-Animate repository](https://github.com/Wan-Video/Wan2.2?tab=readme-ov-file#1-preprocessing). Integration of these preprocessing steps into Diffusers is planned for a future release.

 The example below demonstrates how to use the Wan-Animate pipeline:
@@ -293,13 +293,10 @@ import numpy as np
 import torch
 from diffusers import AutoencoderKLWan, WanAnimatePipeline
 from diffusers.utils import export_to_video, load_image, load_video
-- **mode**: Choose between `"animation"` (default) or `"replacement"`
-- **num_frames_for_temporal_guidance**: Number of frames for temporal guidance (1 or 5 recommended). Using 5 provides better temporal consistency but requires more memory
-- **guidance_scale**: Controls how closely the output follows the text prompt. Higher values (5-7) produce results more aligned with the prompt
-- **num_frames**: Total number of frames to generate. Should be divisible by `vae_scale_factor_temporal` (default: 4)
+- **mode**: Choose between `"animate"` (default) or `"replace"`
+- **prev_segment_conditioning_frames**: Number of frames for temporal guidance (1 or 5 recommended). Using 5 provides better temporal consistency but requires more memory
+- **guidance_scale**: Controls how closely the output follows the text prompt. Higher values (5-7) produce results more aligned with the prompt. For Wan-Animate, CFG is disabled by default (`guidance_scale=1.0`) but can be enabled to support negative prompts and finer control over facial expressions. (Note that CFG will only target the text prompt and face conditioning.)
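
The `guidance_scale` note in the added lines can be illustrated with the standard classifier-free guidance (CFG) combination. The sketch below is not taken from the Diffusers source: `apply_cfg` is a hypothetical helper, and plain Python lists stand in for the conditional and unconditional noise predictions. It only shows why `guidance_scale=1.0` amounts to disabling CFG: at that value the combined prediction equals the conditional one, so the unconditional (negative-prompt) pass contributes nothing.

```python
from typing import List, Sequence


def apply_cfg(
    cond: Sequence[float], uncond: Sequence[float], guidance_scale: float
) -> List[float]:
    """Standard CFG combination: uncond + scale * (cond - uncond).

    Hypothetical helper for illustration; not part of the Diffusers API.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy stand-ins for the two noise predictions (assumed values).
cond = [0.5, -1.0, 2.0]
uncond = [0.25, 0.0, 1.0]

# guidance_scale=1.0: the result is just the conditional prediction.
no_cfg = apply_cfg(cond, uncond, guidance_scale=1.0)

# guidance_scale > 1.0: the result is pushed away from the unconditional prediction.
with_cfg = apply_cfg(cond, uncond, guidance_scale=5.0)
```

Under this reading, a negative prompt can only take effect when `guidance_scale` is above 1.0, since that is the only case where the unconditional prediction enters the result.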