Releases: kohya-ss/musubi-tuner
Version 0.2.13
Highlights
- Support for Qwen-Image-Edit-2509: Training and inference with the Qwen-Image-Edit-2509 model are now supported, including multiple control images for more complex image-editing tasks. (in #590)
- Reduced VRAM Usage for Block Swap: The shared VRAM usage for the block swap feature has been significantly reduced on Windows. (in #585)
- Dataset Handling Fix: A bug has been fixed where the first data item of each epoch was being handled incorrectly. This ensures that all data is processed properly throughout the epoch. (in #601)
What's Changed
- feat: add option to force 2.1 style time embedding in WanModel by @kohya-ss in #586
- doc: Wan update offloading instructions for DiT model on Windows by @kohya-ss in #597
- feat: Add flag to disable cuDNN PyTorch backend when caching by @xzuyn in #592
- Free VRAM for lazy loading in batch prompt mode by @JCBrouwer in #593
- doc: update dataset configuration for control images with mask by @kohya-ss in #600
- feat: Add support for Qwen-Image-Edit-2509 by @kohya-ss in #590
- fix: Qwen-Image-Edit (not 2509) incorrect prompt for VLM. by @kohya-ss in #606
- fix: Qwen-Image-Edit-2509 cannot handle arbitrary number of control images by @kohya-ss in #613
- Fix Prodigy optimizer logs by @wenyifancc in #623
- feat: Reducing shared VRAM usage for block swap by @kohya-ss in #585
- Organizing document structure by @kohya-ss in #630
- fix: first data of the epoch from dataset is inappropriate by @kohya-ss in #601
- chore: bump version to 0.2.13 by @kohya-ss in #631
New Contributors
- @JCBrouwer made their first contribution in #593
- @wenyifancc made their first contribution in #623
Full Changelog: v0.2.12...v0.2.13
Version 0.2.12
Highlights
📝 Code Quality: Ruff Formatting: PR #538
We introduced Ruff for unified linting and formatting across the codebase. This standardization improves contributor experience, ensures consistent style, and streamlines the review process. Thank you @arledesma for this great contribution!
⚡ CPU Offloading for Gradient Checkpointing: PR #537
Added support for activation CPU offloading during gradient checkpointing. This reduces VRAM usage by up to 20–30% in large-scale video training, enabling larger batch sizes. The trade-off is slower training, ranging from a few percent up to roughly 20%.
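The sketch below illustrates the general idea only, not the implementation in this repository: activations saved for backward are moved to pinned CPU memory via PyTorch's built-in `torch.autograd.graph.save_on_cpu` hook, while gradient checkpointing recomputes most intermediates anyway.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

# Illustrative only: offload activations saved for backward to pinned CPU
# memory, while gradient checkpointing recomputes most intermediates.
model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)]).cuda()
x = torch.randn(16, 4096, device="cuda", requires_grad=True)

with torch.autograd.graph.save_on_cpu(pin_memory=True):
    y = x
    for block in model:
        # checkpoint() drops intermediate activations and recomputes them in
        # backward; whatever is still saved is moved to CPU by save_on_cpu.
        y = checkpoint(block, y, use_reentrant=False)

loss = y.float().pow(2).mean()
loss.backward()
```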
🚀 Faster Model Loading with MemoryEfficientSafeOpen: PR #556
Improved `.safetensors` loading with `np.memmap` and non-blocking GPU transfer, making model load times up to 1.5× faster. This significantly reduces waiting time for large model initialization.
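As a rough illustration of the approach (assuming the standard safetensors layout: an 8-byte little-endian header length, a JSON header, then raw tensor bytes), a simplified memmap-based loader might look like the following. The function name and dtype table are hypothetical; the actual `MemoryEfficientSafeOpen` handles additional dtypes and edge cases.

```python
import json
import struct

import numpy as np
import torch

# Hypothetical helper, assuming the standard safetensors layout.
_NUMPY_DTYPES = {"F32": np.float32, "F16": np.float16, "I64": np.int64}

def load_safetensors_memmap(path: str, device: str = "cuda") -> dict[str, torch.Tensor]:
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_len))
    data_start = 8 + header_len
    mm = np.memmap(path, dtype=np.uint8, mode="r")  # pages are read lazily

    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        begin, end = info["data_offsets"]
        arr = mm[data_start + begin : data_start + end]
        arr = arr.view(_NUMPY_DTYPES[info["dtype"]]).reshape(info["shape"])
        # non_blocking=True can overlap host-to-device copies with further
        # file work; it is only truly asynchronous from pinned memory.
        tensors[name] = torch.from_numpy(arr.copy()).to(device, non_blocking=True)
    return tensors
```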
🔬 FP8 Quantization with Block-wise Scaling: PR #575
Changed the `--fp8_scaled` option from per-tensor quantization to block-wise scaling, resulting in improved accuracy and stability. For Qwen-Image LoRA training, this reduces VRAM usage by about 5 GB. Training and inference speed may be slightly slower.
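The difference from per-tensor scaling can be sketched as below (illustrative only; the block shape, scale dtype, and storage layout of the real `--fp8_scaled` path may differ). Each block of a weight row gets its own scale, so a single outlier no longer forces a coarse quantization step on the entire tensor.

```python
import torch

# Illustrative block-wise FP8 quantization; assumes in_features is divisible
# by block_size and uses float8_e4m3fn (max representable value ~448).
def quantize_fp8_blockwise(weight: torch.Tensor, block_size: int = 64):
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    out_f, in_f = weight.shape
    w = weight.float().reshape(out_f, in_f // block_size, block_size)
    # One scale per block instead of one per tensor: an outlier only affects
    # the quantization step of its own block.
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / fp8_max
    q = (w / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return q.reshape(out_f, in_f), scale.squeeze(-1)

def dequantize_fp8_blockwise(q: torch.Tensor, scale: torch.Tensor, block_size: int = 64):
    out_f, in_f = q.shape
    w = q.float().reshape(out_f, in_f // block_size, block_size)
    return (w * scale.unsqueeze(-1)).reshape(out_f, in_f)
```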
What's Changed
- Configure ruff format for code quality standardization by @arledesma in #538
- doc: update contributing guidelines for clarity and consistency by @kohya-ss in #541
- feat: add CPU offloading support for gradient checkpointing by @kohya-ss in #537
- docs: update README to include ruff code analysis and activation CPU offloading by @kohya-ss in #545
- bugfix: fix mask initialization for wan flf2v inference by @LittleNyima in #548
- feat: faster MemoryEfficientSafeOpen by @kohya-ss in #556
- from_file with qwen_image_generate_image is broken by @nmfisher in #553
- fix: Qwen-Image training not working with fp8_base by @kohya-ss in #559
- chore: add .vscode/settings.json to .gitignore by @kohya-ss in #562
- feat: faster LoRA merging and stable fp8 max calculation by @kohya-ss in #563
- fix(wan): FlashAttention-3 call error by @jimlee2048 in #570
- Fix qwen image from file generation by @kohya-ss in #557
- fix: add dtype handling for fp8 model weights in weight_hook_func by @kohya-ss in #582
- fix: VAE tiling is always enabled for FramePack by @kohya-ss in #583
- feat: FP8 quantization with block-wise scaling by @kohya-ss in #575
- chore: bump version to 0.2.12 by @kohya-ss in #584
New Contributors
- @LittleNyima made their first contribution in #548
- @nmfisher made their first contribution in #553
- @jimlee2048 made their first contribution in #570
Full Changelog: v0.2.11...v0.2.12
Version 0.2.11
What's Changed
- fix: correct metadata key for session ID in NetworkTrainer by @kohya-ss in #516
- Code Quality - F821 by @arledesma in #483
- Code Quality - Configure ruff linting with Exclusions by @arledesma in #488
- doc: README Add ruff code analysis introduction by @kohya-ss in #522
- feat: add QwenImageTrainer for fine-tuning with Adafactor optimizer by @kohya-ss in #492
- doc: split HunyuanVideo documentation from README by @kohya-ss in #525
- feat: Add REX learning rate scheduler by @xzuyn in #513
- Feat small refactoring for rex scheduler by @kohya-ss in #535
- Fix bugs when qwen-image use lycoris by @sdbds in #530
- bump version to v0.2.11 by @kohya-ss in #540
New Contributors
- @arledesma made their first contribution in #483
- @xzuyn made their first contribution in #513
Full Changelog: v0.2.10...v0.2.11
Version 0.2.10
What's Changed
- Minor usability improvements by @kohya-ss in #486
- Add sponsor logo by @kohya-ss in #487
- feat: Wan use original dtype in AttentionBlock to reduce memory usage by @kohya-ss in #493
- Feat add codex prompt by @kohya-ss in #507
- feat: support another LoRA format in convert_lora.py by @kohya-ss in #508
- feat: remove bitsandbytes version and update sentencepiece by @kohya-ss in #509
- feat: Fix support for schedulefree optimizer by @am7coffee in #505
- fix: streamline optimizer training mode by @kohya-ss in #510
- doc: add Schedule Free Optimizer support and update documentation by @kohya-ss in #511
New Contributors
- @am7coffee made their first contribution in #505
Full Changelog: v0.2.9...v0.2.10
v0.2.9
What's Changed
- doc: update documentation to clarify LoRA weight compatibility with ComfyUI for Wan by @kohya-ss in #465
- fix: update documentation to include network_module for Qwen-Image training by @kohya-ss in #467
- fix: load appropriate lora (high/low) for lazy loading by @kohya-ss in #479
- Update pyproject.toml by @SamTyurenkov in #472
- feat: Qwen-Image-Edit inference and training by @kohya-ss in #473
New Contributors
- @SamTyurenkov made their first contribution in #472
Full Changelog: v0.2.8...v0.2.9
v0.2.8
What's Changed
- fix: handle wandb initialization when wandb is not installed by @kohya-ss in #425
- Feat wan generate lazy loading by @kohya-ss in #427
- Add qwen_shift and qinglong_qwen by @sdbds in #428
- doc: update documents for qwen-image sampling method by @kohya-ss in #429
- fix: convert embed dtype to bfloat16 for QwenVL-2.5 fp8 caching by @kohya-ss in #440
- fix: fp8_base not working for Qwen-Image by @kohya-ss in #441
- Supports conversion of Qwen-Image LoRA to diffuser format by @kohya-ss in #444
- Fix error occurs in sampling image during the Qwen-Image training by @zhao-kun in #451
- fix: improve error handling for missing LoRA modules by @kohya-ss in #452
- feat: add support for timestep bucketing in training by @kohya-ss in #418
- fix: hv_generate_video requires a lot of VRAM by @kohya-ss in #459
- feat: Captioning with Qwen2.5-VL by @kohya-ss in #460
Full Changelog: v0.2.7...v0.2.8
v0.2.7
v0.2.6
v0.2.5
What's Changed
- Dynamic fp8 scaling and lora merging by @kohya-ss in #406
- fix: update attention mask handling for `--split_attn` by @kohya-ss in #413
- Fixed Wan2.1 model dtype casting with non-fp8 dtypes by @kohya-ss in #414
- Add new SNR function for better training by @sdbds in #407
- Add documentation for new sampling methods `logsnr` and `qinglong` by @kohya-ss in #415
- feature: Wan2.2 14B support by @kohya-ss in #399
Full Changelog: v0.2.4...v0.2.5
v0.2.4
What's Changed
- Add prompt files for AI coding agents by @kohya-ss in #398
- Fix bugs about flux kontext part by @sdbds in #402
- fix: fix function name for offloading in FLUX1 Kontext by @kohya-ss in #403
- fix: add fp_latent_window_size to dataset schemas by @kohya-ss in #404
Full Changelog: v0.2.3...v0.2.4