
Releases: kohya-ss/musubi-tuner

Version 0.2.13 (Pre-release)

05 Oct 07:20 · fee0558

Highlights

  • Support for Qwen-Image-Edit-2509: This release adds support for the Qwen-Image-Edit-2509 model, enabling training and inference with multiple control images for more complex image-editing tasks. (in #590)
  • Reduced VRAM Usage for Block Swap: Shared VRAM usage for the block swap feature has been significantly reduced on Windows; a sketch of how block swapping works in general appears after this list. (in #585)
  • Dataset Handling Fix: Fixed a bug where the first data item of each epoch was handled incorrectly; all data is now processed properly throughout the epoch. (in #601)
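
For readers unfamiliar with block swap: the idea is to keep most transformer blocks in CPU RAM and stage only a small window of them on the GPU at a time, trading transfer time for a much smaller VRAM footprint. The sketch below is a minimal PyTorch illustration of that idea; the function name, the `resident` window parameter, and the eviction policy are assumptions for illustration, not musubi-tuner's actual implementation.

```python
import torch
import torch.nn as nn

def forward_with_block_swap(blocks: nn.ModuleList, x: torch.Tensor,
                            device: str = "cuda", resident: int = 2) -> torch.Tensor:
    """Run a stack of blocks while keeping at most `resident` of them on the GPU.
    Illustrative sketch only; names and policy are assumed, not musubi-tuner's code."""
    for b in blocks:
        b.to("cpu")  # all blocks start in system RAM
    for i, block in enumerate(blocks):
        block.to(device, non_blocking=True)  # stage the current block on the GPU
        x = block(x)
        if i >= resident - 1:                # evict the oldest resident block
            blocks[i - resident + 1].to("cpu")
    return x
```

A production implementation would typically add asynchronous prefetching and pinned transfer buffers on top of this; the sketch only shows the core VRAM/transfer trade-off.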

What's Changed

  • feat: add option to force 2.1 style time embedding in WanModel by @kohya-ss in #586
  • doc: Wan update offloading instructions for DiT model on Windows by @kohya-ss in #597
  • feat: Add flag to disable cuDNN PyTorch backend when caching by @xzuyn in #592
  • Free VRAM for lazy loading in batch prompt mode by @JCBrouwer in #593
  • doc: update dataset configuration for control images with mask by @kohya-ss in #600
  • feat: Add support for Qwen-Image-Edit-2509 by @kohya-ss in #590
  • fix: Qwen-Image-Edit (not 2509) incorrect prompt for VLM. by @kohya-ss in #606
  • fix: Qwen-Image-Edit-2509 cannot handle arbitrary number of control images by @kohya-ss in #613
  • Fix Prodigy optimizer logs by @wenyifancc in #623
  • feat: Reducing shared VRAM usage for block swap by @kohya-ss in #585
  • Organizing document structure by @kohya-ss in #630
  • fix: first data of the epoch from dataset is inappropriate by @kohya-ss in #601
  • chore: bump version to 0.2.13 by @kohya-ss in #631

Full Changelog: v0.2.12...v0.2.13

Version 0.2.12 (Pre-release)

23 Sep 08:53 · d3a9d85

Highlights

📝 Code Quality: Ruff Formatting: PR #538

We introduced Ruff for unified linting and formatting across the codebase. This standardization improves the contributor experience, ensures consistent style, and streamlines the review process. Thank you @arledesma for this great contribution!

⚡ CPU Offloading for Gradient Checkpointing: PR #537

Added support for offloading activations to CPU during gradient checkpointing. This feature reduces VRAM usage by up to 20–30% in large-scale video training, enabling larger batch sizes, at the cost of slower training (from a few percent up to ~20%).
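
As background, PyTorch exposes this kind of activation offloading through the torch.autograd.graph.save_on_cpu saved-tensor hook: tensors saved for backward are parked in (pinned) host memory and copied back to the GPU only when backward needs them. The snippet below is a minimal sketch of the mechanism with placeholder model and shapes; it is one way to implement the technique, not necessarily the exact code path used in #537.

```python
import torch

# Placeholder model and input; requires a CUDA device.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).cuda()
x = torch.randn(8, 1024, device="cuda", requires_grad=True)

# pin_memory=True keeps the host<->device copies asynchronous.
with torch.autograd.graph.save_on_cpu(pin_memory=True):
    y = model(x)  # activations saved for backward now live in CPU RAM

loss = y.square().mean()
loss.backward()   # saved tensors are copied back to the GPU as backward needs them
```

The PR combines this idea with gradient checkpointing, which additionally recomputes intermediate activations instead of storing them; the sketch shows only the offloading half.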

🚀 Faster Model Loading with MemoryEfficientSafeOpen: PR #556

Improved .safetensors loading with np.memmap and non-blocking GPU transfers, making model loading up to 1.5× faster. This significantly reduces waiting time when initializing large models.
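
For intuition, a memmap-based loader looks roughly like the sketch below. The 8-byte little-endian header length and JSON header follow the safetensors file format; the function name, the dtype subset, and the per-tensor copy are simplifications of mine, not the actual MemoryEfficientSafeOpen code.

```python
import json
import struct

import numpy as np
import torch

# dtype tags from the safetensors header; BF16 has no numpy dtype, so it is
# read as raw uint16 and reinterpreted on the torch side.
_NP_DTYPES = {"F32": np.float32, "F16": np.float16, "BF16": np.uint16}

def load_safetensors_memmap(path, device="cpu"):
    """Sketch: memory-map a .safetensors file and copy tensors out one at a time."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # 8-byte LE header size
        header = json.loads(f.read(header_len))
    data_start = 8 + header_len
    mm = np.memmap(path, dtype=np.uint8, mode="r")  # no up-front read into RAM
    tensors = {}
    for name, meta in header.items():
        if name == "__metadata__":
            continue
        begin, end = meta["data_offsets"]  # relative to the start of the data section
        raw = mm[data_start + begin : data_start + end]
        arr = raw.view(_NP_DTYPES[meta["dtype"]]).reshape(meta["shape"]).copy()
        t = torch.from_numpy(arr)
        if meta["dtype"] == "BF16":
            t = t.view(torch.bfloat16)
        # non_blocking overlaps host-to-device copies with Python-side work
        # (fully asynchronous only when the source buffer is pinned)
        tensors[name] = t.to(device, non_blocking=True)
    return tensors
```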

🔬 FP8 Quantization with Block-wise Scaling: PR #575

Changed the --fp8_scaled option from per-tensor quantization to block-wise scaling, improving accuracy and stability. For Qwen-Image LoRA training, this reduces VRAM usage by about 5 GB. Training and inference speed may be slightly slower.
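
Conceptually, block-wise scaling computes one scale per small block of weights instead of a single scale for the whole tensor, so a few outliers no longer crush the precision of everything else. Below is a minimal sketch under assumed details (e4m3 format, block size 64, flattened blocks); it is not the repository's actual code.

```python
import torch

def quantize_fp8_blockwise(w: torch.Tensor, block_size: int = 64):
    """Per-block fp8 quantization sketch; assumes w.numel() % block_size == 0."""
    orig_shape = w.shape
    blocks = w.reshape(-1, block_size)
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448 for e4m3
    # one scale per block, so an outlier only degrades its own block
    scale = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / fp8_max
    q = (blocks / scale).to(torch.float8_e4m3fn)
    return q.reshape(orig_shape), scale  # scales must be stored for dequantization

def dequantize_fp8_blockwise(q: torch.Tensor, scale: torch.Tensor,
                             block_size: int = 64) -> torch.Tensor:
    blocks = q.to(torch.float32).reshape(-1, block_size)
    return (blocks * scale).reshape(q.shape)

w = torch.randn(1024, 1024)
q, s = quantize_fp8_blockwise(w)
w_hat = dequantize_fp8_blockwise(q, s)
print((w - w_hat).abs().max())  # quantization error is bounded per block
```

Per-tensor scaling is the special case block_size == w.numel(); shrinking the block localizes the effect of outliers at the cost of storing more scales.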

What's Changed

  • Configure ruff format for code quality standardization by @arledesma in #538
  • doc: update contributing guidelines for clarity and consistency by @kohya-ss in #541
  • feat: add CPU offloading support for gradient checkpointing by @kohya-ss in #537
  • docs: update README to include ruff code analysis and activation CPU offloading by @kohya-ss in #545
  • bugfix: fix mask initialization for wan flf2v inference by @LittleNyima in #548
  • feat: faster MemoryEfficientSafeOpen by @kohya-ss in #556
  • from_file with qwen_image_generate_image is broken by @nmfisher in #553
  • fix: Qwen-Image training not working with fp8_base by @kohya-ss in #559
  • chore: add .vscode/settings.json to .gitignore by @kohya-ss in #562
  • feat: faster LoRA merging and stable fp8 max calculation by @kohya-ss in #563
  • fix(wan): FlashAttention-3 call error by @jimlee2048 in #570
  • Fix qwen image from file generation by @kohya-ss in #557
  • fix: add dtype handling for fp8 model weights in weight_hook_func by @kohya-ss in #582
  • fix: VAE tiling is always enabled for FramePack by @kohya-ss in #583
  • feat: FP8 quantization with block-wise scaling by @kohya-ss in #575
  • chore: bump version to 0.2.12 by @kohya-ss in #584

Full Changelog: v0.2.11...v0.2.12

Version 0.2.11 (Pre-release)

07 Sep 05:33 · 2c15f94

What's Changed

  • fix: correct metadata key for session ID in NetworkTrainer by @kohya-ss in #516
  • Code Quality - F821 by @arledesma in #483
  • Code Quality - Configure ruff linting with Exclusions by @arledesma in #488
  • doc: README Add ruff code analysis introduction by @kohya-ss in #522
  • feat: add QwenImageTrainer for fine-tuning with Adafactor optimizer by @kohya-ss in #492
  • doc: split HunyuanVideo documentation from README by @kohya-ss in #525
  • feat: Add REX learning rate scheduler by @xzuyn in #513
  • Feat small refactoring for rex scheduler by @kohya-ss in #535
  • Fix bugs when qwen-image use lycoris by @sdbds in #530
  • bump version to v0.2.11 by @kohya-ss in #540

Full Changelog: v0.2.10...v0.2.11

Version 0.2.10 (Pre-release)

28 Aug 12:08

What's Changed

  • Minor usability improvements by @kohya-ss in #486
  • Add sponsor logo by @kohya-ss in #487
  • feat: Wan use original dtype in AttentionBlock to reduce memory usage by @kohya-ss in #493
  • Feat add codex prompt by @kohya-ss in #507
  • feat: support another LoRA format in convert_lora.py by @kohya-ss in #508
  • feat: remove bitsandbytes version and update sentencepiece by @kohya-ss in #509
  • feat: Fix support for schedulefree optimizer by @am7coffee in #505
  • fix: streamline optimizer training mode by @kohya-ss in #510
  • doc: add Schedule Free Optimizer support and update documentation by @kohya-ss in #511

Full Changelog: v0.2.9...v0.2.10

v0.2.9 (Pre-release)

21 Aug 22:37 · fe99c71

What's Changed

  • doc: update documentation to clarify LoRA weight compatibility with ComfyUI for Wan by @kohya-ss in #465
  • fix: update documentation to include network_module for Qwen-Image training by @kohya-ss in #467
  • fix: load appropriate lora (high/low) for lazy loading by @kohya-ss in #479
  • Update pyproject.toml by @SamTyurenkov in #472
  • feat: Qwen-Image-Edit inference and training by @kohya-ss in #473

Full Changelog: v0.2.8...v0.2.9

v0.2.8 (Pre-release)

16 Aug 06:24 · e7adb86

What's Changed

  • fix: handle wandb initialization when wandb is not installed by @kohya-ss in #425
  • Feat wan generate lazy loading by @kohya-ss in #427
  • Add qwen_shift and qinglong_qwen by @sdbds in #428
  • doc: update documents for qwen-image sampling method by @kohya-ss in #429
  • fix: convert embed dtype to bfloat16 for QwenVL-2.5 fp8 caching by @kohya-ss in #440
  • fix: fp8_base not working for Qwen-Image by @kohya-ss in #441
  • Supports conversion of Qwen-Image LoRA to diffuser format by @kohya-ss in #444
  • Fix error occurs in sampling image during the Qwen-Image training by @zhao-kun in #451
  • fix: improve error handling for missing LoRA modules by @kohya-ss in #452
  • feat: add support for timestep bucketing in training by @kohya-ss in #418
  • fix: hv_generate_video requires a lot of VRAM by @kohya-ss in #459
  • feat: Captioning with Qwen2.5-VL by @kohya-ss in #460

Full Changelog: v0.2.7...v0.2.8

v0.2.7 (Pre-release)

10 Aug 06:55 · 9309bb9

What's Changed

  • doc: update wan.md to include --discrete_flow_shift in Wan2.2 models by @kohya-ss in #422
  • feature: Qwen-Image support by @kohya-ss in #408

Full Changelog: v0.2.6...v0.2.7

v0.2.6 (Pre-release)

09 Aug 07:17 · c57e47c

What's Changed

  • feat: Logging sample images and videos with wandb by @xhiroga in #420
  • doc: update README to include wandb logging feature and recent changes by @kohya-ss in #421

Full Changelog: v0.2.5...v0.2.6

v0.2.5 (Pre-release)

08 Aug 12:23 · 022f67f

What's Changed

  • Dynamic fp8 scaling and lora merging by @kohya-ss in #406
  • fix: update attention mask handling for --split_attn by @kohya-ss in #413
  • Fixed Wan2.1 model dtype casting with non-fp8 dtypes by @kohya-ss in #414
  • Add new SNR function for better training by @sdbds in #407
  • Add documentation for new sampling methods logsnr and qinglong by @kohya-ss in #415
  • feature: Wan2.2 14B support by @kohya-ss in #399

Full Changelog: v0.2.4...v0.2.5

v0.2.4 (Pre-release)

02 Aug 11:38 · 891464c

What's Changed

  • Add prompt files for AI coding agents by @kohya-ss in #398
  • Fix bugs about flux kontext part by @sdbds in #402
  • fix: fix function name for offloading in FLUX1 Kontext by @kohya-ss in #403
  • fix: add fp_latent_window_size to dataset schemas by @kohya-ss in #404

Full Changelog: v0.2.3...v0.2.4