Releases: modelscope/DiffSynth-Engine

v0.6.0: supports Wan2.2-S2V

09 Sep 07:20
ca8a9a5


Supports Wan2.2-S2V

Wan2.2-S2V is based on Wan2.1, with several additional modules that inject audio, reference-image, and pose-video conditions. Check the usage example here.
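
As a rough sketch of how the new conditions could be wired up, the snippet below follows the WanVideoPipeline pattern from the v0.4.0 release notes. The model ID and the input_audio / pose_video argument names are assumptions for illustration only; refer to the linked usage example for the actual API.

from PIL import Image

from diffsynth_engine import fetch_model, WanVideoPipeline, WanPipelineConfig
from diffsynth_engine.utils.video import save_video

# NOTE: the model ID and the audio/pose argument names below are assumptions
# for illustration; check the linked usage example for the real API.
config = WanPipelineConfig.basic_config(
    model_path=fetch_model("Wan-AI/Wan2.2-S2V-14B"),  # assumed model ID
    offload_mode=None,
)
pipe = WanVideoPipeline.from_pretrained(config)

video = pipe(
    prompt="A person talks to the camera on a sunny street.",
    input_image=Image.open("input/reference.jpg").convert("RGB"),  # reference image condition
    input_audio="input/speech.wav",  # hypothetical argument: audio condition
    pose_video="input/pose.mp4",     # hypothetical argument: pose video condition
    seed=42,
)
save_video(video, "wan_s2v.mp4", fps=pipe.config.fps)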

What's Changed

Full Changelog: v0.5.0...v0.6.0

v0.5.0: supports Qwen-Image-Edit

27 Aug 03:06
73e9179


Supports Qwen-Image-Edit

Qwen-Image-Edit is the image editing version of Qwen-Image, enabling both semantic and appearance-level visual editing as well as precise text editing. Check the usage example here.
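
For orientation, here is a minimal sketch of what an edit call could look like. The QwenImagePipeline / QwenImagePipelineConfig class names, the model ID, and the input_image argument are assumptions inferred from the other pipelines in this repository; check the linked usage example for the actual API.

from PIL import Image

# NOTE: class names, model ID, and arguments below are assumptions for
# illustration; see the linked usage example for the actual API.
from diffsynth_engine import fetch_model, QwenImagePipeline, QwenImagePipelineConfig

config = QwenImagePipelineConfig.basic_config(
    model_path=fetch_model("Qwen/Qwen-Image-Edit"),  # assumed model ID
)
pipe = QwenImagePipeline.from_pretrained(config)

image = pipe(
    prompt="Replace the sign text with 'OPEN'",                # precise text editing
    input_image=Image.open("input/sign.jpg").convert("RGB"),   # image to edit
    seed=42,
)
image.save("qwen_image_edit.png")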

What's Changed

New Contributors

Full Changelog: v0.4.1.post1...v0.5.0

v0.4.1: supports Qwen-Image

04 Aug 12:22
10bff22


Supports Qwen-Image

Qwen-Image is an image generation model that excels at complex text rendering and at creating images in a wide range of artistic styles. Check the usage example here.

System Requirements

Resource utilization for generating a 1024x1024 image with the Qwen-Image model on an H20 GPU under different offload_mode settings:

Offload Mode                Peak VRAM Usage (GB)    Peak Memory Usage (GB)    Inference Time (s)
None                        62                      64                        57
"cpu_offload"               39                      64                        86
"sequential_cpu_offload"    8                       64                        134

What's Changed

Full Changelog: v0.4.0...v0.4.1.post1

v0.4.0: Supports Wan2.2

01 Aug 15:56
f1b70c2


Supports the Wan2.2 video generation model

The WanVideoPipeline now also supports the Wan2.2 series of models. Take the Wan2.2-TI2V-5B model as an example:

from PIL import Image

from diffsynth_engine import fetch_model, WanVideoPipeline, WanPipelineConfig
from diffsynth_engine.utils.video import save_video

config = WanPipelineConfig.basic_config(
    model_path=fetch_model(
        "Wan-AI/Wan2.2-TI2V-5B",
        revision="bf16",
        path=[
            "diffusion_pytorch_model-00001-of-00003-bf16.safetensors",
            "diffusion_pytorch_model-00002-of-00003-bf16.safetensors",
            "diffusion_pytorch_model-00003-of-00003-bf16.safetensors",
        ],
    ),
    parallelism=1,
    offload_mode=None,
)
pipe = WanVideoPipeline.from_pretrained(config)

image = Image.open("input/wan_i2v_input.jpg").convert("RGB")
video = pipe(
    prompt="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside.",
    negative_prompt="",
    input_image=image,
    num_frames=121,
    width=704,
    height=1280,
    seed=42,
)
save_video(video, "wan_ti2v.mp4", fps=pipe.config.fps)
  • Set parallelism to 2, 4, or 8 to speed up video generation with multiple GPUs.
  • By default, CPU offload is disabled. For lower VRAM usage, set offload_mode to "cpu_offload" (model-level offload) or "sequential_cpu_offload" (parameter-level offload, with the lowest VRAM usage but the longest generation time).
  • The Wan2.2-TI2V-5B model supports generation with or without an input image.
  • The Wan2.2-TI2V-5B model generates video at 24 fps by default. To create a video of X seconds, set num_frames to 24X + 1 (see the short calculation below).
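
For example, the num_frames=121 used in the snippet above corresponds to a 5-second clip:

duration_seconds = 5
num_frames = 24 * duration_seconds + 1  # 24 fps * 5 s + 1 = 121 frames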

Find more examples here for Wan2.2-T2V-A14B and Wan2.2-I2V-A14B.

⚠️ [Breaking Change] Improved pipeline initialization with from_pretrained

In previous versions, the from_pretrained method initialized a pipeline with a ModelConfig plus additional keyword arguments, for example:

from diffsynth_engine import fetch_model, FluxImagePipeline, FluxModelConfig

model_path = fetch_model("muse/flux-with-vae", path="flux1-dev-with-vae.safetensors")
config = FluxModelConfig(dit_path=model_path, use_fp8_linear=True, use_fsdp=True)
pipe = FluxImagePipeline.from_pretrained(config, parallelism=8, use_cfg_parallel=True)

In the example above, the division of responsibilities between the ModelConfig and the other from_pretrained arguments is unclear, which makes the API confusing.

Since v0.4.0, we introduce a new PipelineConfig that contains all pipeline initialization arguments. With it, the code above can be rewritten as:

from diffsynth_engine import fetch_model, FluxImagePipeline, FluxPipelineConfig

model_path = fetch_model("muse/flux-with-vae", path="flux1-dev-with-vae.safetensors")
config = FluxPipelineConfig(
    model_path=model_path,
    use_fp8_linear=True,
    parallelism=8,
    use_cfg_parallel=True,
    use_fsdp=True,
)
pipe = FluxImagePipeline.from_pretrained(config)

For beginners, we also provide a basic_config method with fewer arguments to make pipeline initialization easier:

from diffsynth_engine import fetch_model, FluxImagePipeline, FluxPipelineConfig

model_path = fetch_model("muse/flux-with-vae", path="flux1-dev-with-vae.safetensors")
config = FluxPipelineConfig.basic_config(model_path=model_path, parallelism=8)
pipe = FluxImagePipeline.from_pretrained(config)

Check here for more available configs.

What's Changed

Full Changelog: v0.3.5...v0.4.0

v0.3.5

03 Jul 09:00
5c149ec


Fix SDXL LoRA loading

v0.3.4

03 Jul 03:08


Fix ControlNet offload

v0.3.3

27 Jun 03:17
68bd57c


Support Flux Diffusers LoRA

v0.3.2

19 Jun 09:08


Bug fix for the MPS device

v0.3.1

19 Jun 02:39


bug fix

v0.3.0

18 Jun 07:26
c833b6e


fix offload mode (#82)