misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards
"""
The general rule of thumb when preparing inputs for the VACE pipeline is that any input image, or any frame of a video used for conditioning, should have a corresponding black mask. A black mask means the model will not generate new content for that region and will only use it to condition the generation. Regions or frames that the model should generate must have a white mask, as in the sketch below.
</hfoption>
</hfoptions>
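For illustration, here is a minimal sketch of building such a mask with PIL. The resolution, frame count, and choice of conditioning frames are placeholders; a list like this would then be passed as the pipeline's mask input.

```python
from PIL import Image

height, width, num_frames = 480, 832, 81  # placeholder dimensions
conditioning_frames = {0, num_frames - 1}  # e.g. condition on the first and last frame

mask = []
for i in range(num_frames):
    if i in conditioning_frames:
        # black mask: keep this frame as conditioning, do not generate new content here
        mask.append(Image.new("L", (width, height), 0))
    else:
        # white mask: the model should generate this frame
        mask.append(Image.new("L", (width, height), 255))
```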
### Wan-Animate: Unified Character Animation and Replacement with Holistic Replication
[Wan-Animate](https://huggingface.co/papers/2509.14055) by the Wan Team.
*We introduce Wan-Animate, a unified framework for character animation and replacement. Given a character image and a reference video, Wan-Animate can animate the character by precisely replicating the expressions and movements of the character in the video to generate high-fidelity character videos. Alternatively, it can integrate the animated character into the reference video to replace the original character, replicating the scene's lighting and color tone to achieve seamless environmental integration. Wan-Animate is built upon the Wan model. To adapt it for character animation tasks, we employ a modified input paradigm to differentiate between reference conditions and regions for generation. This design unifies multiple tasks into a common symbolic representation. We use spatially-aligned skeleton signals to replicate body motion and implicit facial features extracted from source images to reenact expressions, enabling the generation of character videos with high controllability and expressiveness. Furthermore, to enhance environmental integration during character replacement, we develop an auxiliary Relighting LoRA. This module preserves the character's appearance consistency while applying the appropriate environmental lighting and color tone. Experimental results demonstrate that Wan-Animate achieves state-of-the-art performance. We are committed to open-sourcing the model weights and its source code.*
Project page: https://humanaigc.github.io/wan-animate
This model was mostly contributed by [M. Tolga Cangöz](https://github.com/tolgacangoz).
#### Usage
The Wan-Animate pipeline supports two modes of operation:
1. **Animation Mode** (default): Animates a character image based on motion and expression from reference videos
2. **Replacement Mode**: Replaces a character in a background video with a new character while preserving the scene
##### Prerequisites
Before using the pipeline, you need to preprocess your reference video to extract:
- **Pose video**: Contains skeletal keypoints representing body motion
- **Face video**: Contains facial feature representations for expression control
For replacement mode, you additionally need:
- **Background video**: The original video containing the scene
- **Mask video**: A mask indicating where to generate content (white) vs. preserve the original (black)
> [!NOTE]
> The preprocessing tools are available in the original Wan-Animate repository. Integration of these preprocessing steps into Diffusers is planned for a future release.
The example below demonstrates how to use the Wan-Animate pipeline:
<hfoptions id="Animate usage">
<hfoption id="Animation mode">
```python
import numpy as np
import torch
from diffusers import AutoencoderKLWan, WanAnimatePipeline
from diffusers.utils import export_to_video, load_image, load_video
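
# The remainder of this example is a minimal sketch: the checkpoint id, the file paths,
# and some argument names (image, pose_video, face_video, mode) are assumptions;
# check the `WanAnimatePipeline` API reference for the exact signature.
model_id = "Wan-AI/Wan2.2-Animate-14B-Diffusers"  # assumed Diffusers-format checkpoint id
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanAnimatePipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Character image to animate plus the preprocessed conditioning videos
# (see the Prerequisites section above).
image = load_image("path/to/character.png")
pose_video = load_video("path/to/pose_video.mp4")  # skeletal keypoints for body motion
face_video = load_video("path/to/face_video.mp4")  # facial features for expression control

prompt = "A person dances energetically in a sunlit studio, smooth motion, high quality"
negative_prompt = "misshapen limbs, fused fingers, still picture, messy background"

output = pipe(
    image=image,
    pose_video=pose_video,
    face_video=face_video,
    prompt=prompt,
    negative_prompt=negative_prompt,
    mode="animation",                    # default mode
    num_frames=80,                       # keep divisible by vae_scale_factor_temporal (4)
    num_frames_for_temporal_guidance=5,  # 1 or 5 recommended
    guidance_scale=5.0,
    num_inference_steps=50,
).frames[0]
export_to_video(output, "wan_animate_animation.mp4", fps=16)
```

</hfoption>
</hfoptions>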
Some useful parameters to keep in mind:

- **mode**: Choose between `"animation"` (default) or `"replacement"` (see the replacement-mode sketch after this list)
- **num_frames_for_temporal_guidance**: Number of frames for temporal guidance (1 or 5 recommended). Using 5 provides better temporal consistency but requires more memory
- **guidance_scale**: Controls how closely the output follows the text prompt. Higher values (5-7) produce results more aligned with the prompt
- **num_frames**: Total number of frames to generate. Should be divisible by `vae_scale_factor_temporal` (default: 4)
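For replacement mode, the call is similar but additionally takes the background and mask videos described in the Prerequisites section. The sketch below is illustrative only: the checkpoint id and the `background_video`/`mask_video` argument names are assumptions, so check the pipeline's API reference.

```python
import torch
from diffusers import AutoencoderKLWan, WanAnimatePipeline
from diffusers.utils import export_to_video, load_image, load_video

model_id = "Wan-AI/Wan2.2-Animate-14B-Diffusers"  # assumed checkpoint id
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanAnimatePipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16).to("cuda")

image = load_image("path/to/new_character.png")
pose_video = load_video("path/to/pose_video.mp4")
face_video = load_video("path/to/face_video.mp4")
background_video = load_video("path/to/background_video.mp4")  # original scene to keep
mask_video = load_video("path/to/mask_video.mp4")              # white = generate, black = preserve

output = pipe(
    image=image,
    pose_video=pose_video,
    face_video=face_video,
    background_video=background_video,
    mask_video=mask_video,
    prompt="A person walking through a busy street, natural lighting",
    mode="replacement",
    num_frames=80,
    num_frames_for_temporal_guidance=5,
    guidance_scale=5.0,
).frames[0]
export_to_video(output, "wan_animate_replacement.mp4", fps=16)
```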
## Notes
- Wan2.1 supports LoRAs with [`~loaders.WanLoraLoaderMixin.load_lora_weights`].
# use "steamboat willie style" to trigger the LoRA
prompt = """
steamboat willie style, golden era animation, The camera rushes from far to near in a low-angle shot,
revealing a white ferret on a log. It plays, leaps into the water, and emerges, as the camera zooms in
for a close-up. Water splashes berry bushes nearby, while moss, snow, and leaves blanket the ground.
Birch trees and a light blue sky frame the scene, with ferns in the foreground. Side lighting casts dynamic
shadows and warm highlights. Medium composition, front view, low angle, with depth of field.
"""
`examples/community/README.md`
This implementation of Flux Kontext allows users to pass multiple reference images. Each image is encoded separately, and the resulting latent vectors are concatenated.
As explained in Section 3 of [the paper](https://huggingface.co/papers/2506.15742), the model's sequence concatenation mechanism can extend its capabilities to handle multiple reference images. However, note that the current version of Flux Kontext was not trained for this use case. In practice, stacking along the first axis does not yield correct results, while stacking along the other two axes appears to work.
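
A rough sketch of passing multiple reference images is shown below. The `custom_pipeline` module name and the list-valued `image` argument are assumptions here, so treat this as illustrative rather than the exact API.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# "flux_kontext_multiple_images" is a placeholder for the community pipeline module name
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev",
    custom_pipeline="flux_kontext_multiple_images",
    torch_dtype=torch.bfloat16,
).to("cuda")

reference_images = [
    load_image("path/to/reference_1.png"),
    load_image("path/to/reference_2.png"),
]

image = pipe(
    prompt="Place both characters together in a single scene",
    image=reference_images,  # each image is encoded separately; latents are concatenated
    guidance_scale=2.5,
).images[0]
image.save("flux_kontext_multi_reference.png")
```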