Releases: bghira/SimpleTuner
v4.0.4 - ramtorch quality improvements, better LTX-2 audio-only training and video validations
What's Changed
- Z-Image example (non-turbo) should use model_flavour=base by @bghira in #2519
- ramtorch: percentage-based offload fix for text encoder moving to CPU and back inadvertently causing device mismatch error by @bghira in #2525
- (#2504) add --gradient_checkpointing_backend=unsloth, default to torch by @bghira in #2521
- bugfix: checkpoint preview page missing validation samples by @bghira in #2522
- [UI] add dataset configuration missing options; allow configuring audio duration for standalone sets by @bghira in #2526
- (#2510) allow mask conditioning_type to work on edit models that require latent conditioning by @bghira in #2520
- adamw_bf16 compatibility with unsloth checkpointing by @bghira in #2530
- unsloth checkpointing: flux2, hv, kv5, ltx2, wan, zim by @bghira in #2533
- ramtorch should disable quantisation and device moving by @bghira in #2532
- enable full ramtorch mode by default for the transformer so flux2 RMSNorm gets offloaded by @bghira in #2531
- fix error when validation is not None by @bghira in #2535
- ltx2: audio-only mode should skip video layers, TREAD, CREPA, and aim for ideal LoRA targets by @bghira in #2534
- ramtorch: fix Gemma3 output corruption by @bghira in #2538
- bypass validation scheduler setup for special models by @bghira in #2539
- torchao: fix int8 weight only quant via pipeline by @bghira in #2540
- add --ramtorch_disable_extensions and --ramtorch_disable_sync_hooks to disable custom features by @bghira in #2541
- prevent double-encoding of captions for audio auto-split dataset by @bghira in #2543
- ui: add audio options for video datasets on models which support a+v by @bghira in #2545
- ui: save gradient checkpointing option by default by @bghira in #2544
- (#2523) validation epoch interval should calculate starting point the same as global step by @bghira in #2546
- (#2524) ui: reduce severity of non-fatal errors by @bghira in #2547
- ramtorch: disable extensions by default for speedup on most systems by @bghira in #2548
- bug: kill complete process tree using psutil, same to how the Stop command works, during Shutdown by @bghira in #2550
- (#2542) add video preview to model card by @bghira in #2551
- LTX-2: allow custom schedules, since validation issue is resolved by @bghira in #2553
- z-image: remove low memory flag which now seems to not be needed by @bghira in #2552
- fix validation error for models like z-image that do not use dynamic shift by @bghira in #2555
- set timestep index to 0 to avoid lookup in scheduler.step by @bghira in #2554
- (#2529) remove LongCat specific phrasing on block swap desc by @bghira in #2556
- (#2527) move checkpointing disk section stuff to advanced subsection by @bghira in #2557
- merge by @bghira in #2549
Full Changelog: v4.0.3...v4.0.4
v4.0.3 - LTX-2 IC-LoRA, Z-Image base flavour, end_step/end_epoch dataset scheduling and GPU health checks
What's Changed
- (#2484) fix use of spread operator on ES6 object with getters by @bghira in #2486
- environment creation wizard size constraints for smaller (1920x1080) viewports under 4k by @bghira in #2487
- (#2479) add TEXT_JSON field type for complex data types in a simple text field input by @bghira in #2488
- use TEXT_JSON field type for TREAD by @bghira in #2489
- (#2480) adjust num_frames automatically by limit instead of throwing error by @bghira in #2490
- (#2475) bypass batch size for eval dataset by @bghira in #2491
- add max_num_samples per-dataset by @bghira in #2492
- (#2477) GPU circuit breaker by @bghira in #2493
- (#2474) surface processing statistics in webui; store count of too_small etc image count in dataset metadata files by @bghira in #2494
- (#2483) validation epoch tracking should simulate dataset scheduling by @bghira in #2495
- (#2274) add end_step / end_epoch scheduling for datasets by @bghira in #2496
- (#2470) multi-aspect input conditioning for kontext, flux2 and qwen edit by @bghira in #2497
- (#1812) i2v validation using image datasets and documentation updates by @bghira in #2499
- ss_tag_frequency should contain only terms in more than 50% of all captions by @bghira in #2500
- mkDocs: move to Zensical instead, and fix the theme by @bghira in #2501
- GPU circuit-breaker should treat thermal events as warning only, and display GPU thermal throttling in UI by @bghira in #2502
- avoid reusing stale job pid by canceling local running jobs at startup by @bghira in #2503
- LTX-2: IC-LoRA training with reference videos by @bghira in #2498
- z image (base) by @bghira in #2505
- (#2509) end-to-end JSON field handling fix for CLI launched training job by @bghira in #2511
- (#2507) eval dataset should have effective_batch_size of 1 by @bghira in #2512
- (#2508) calculate and sum all epoch stats as we receive them instead of incorrectly only counting the prev by @bghira in #2513
- face detection fixes for TrainingSample with PIL fallback by @bghira in #2515
- webui/webhooks: error reporting refactor by @bghira in #2516
- UI event system should rely on SSE manager by @bghira in #2517
- merge by @bghira in #2518
Full Changelog: v4.0.2...v4.0.3
v4.0.2 - audio-only LTX-2 training, HeartMuLa, CUDA 13 for Blackwell
What's Changed
- (#2435) lycoris example for klein 9b by @bghira in #2437
- enhanced ipc event emissions for Accelerate subprocess failures by @bghira in #2436
- refactor model foundation methods into mixin classes by @bghira in #2440
- cleanup some skipped tests, hidden errors by @bghira in #2441
- emit lifecycle event progress to webhooks for extracting captions by @bghira in #2444
- HeartMuLa reimplementation by @bghira in #2442
- show error message when crash occurs due to config parser by @bghira in #2445
- automatically override flow shift instead of erroring when auto flow shift is enabled by @bghira in #2446
- add cuda13 install target by @bghira in #2447
- add cuda13 install instructions to docs, and recommend python3.13 instead of 3.12 by @bghira in #2448
- support flux2 validation preview streaming by @bghira in #2449
- add allow_empty to some fields that need to be unsettable by @bghira in #2450
- store pid when starting job via start_training_job by @bghira in #2451
- twinflow: adversarial loss, doc updates by @bghira in #2453
- suggest agents to use python3.13 by @bghira in #2456
- better validation ux for attention mechanism selection by @bghira in #2455
- cuda-stable, cuda-nightly, and cuda13-stable, cuda13-nightly install targets by @bghira in #2457
- clarify edit vs reference dataset names in qwen edit quickstart by @bghira in #2458
- low disk space detection and script execution action by @bghira in #2460
- Support aws_session_token for S3 backends by @bghira in #2462
- LTX-2: audio-only training by @bghira in #2461
- qwen_image: fixes for TREAD by @bghira in #2459
- killing orphaned child processes by @bghira in #2463
- skip comfyui format conversion for models which support natively by @bghira in #2464
- add --ramtorch_transformer_percent and --ramtorch_text_encoder_percent to treat it more like block swap by @bghira in #2465
- structured error reporting by @bghira in #2466
- resume training directly from s3 storage by @bghira in #2468
- merge by @bghira in #2471
Full Changelog: v4.0.1...v4.0.2
v4.0.1 - klein, scheduled CREPA, and disable_multiline_split for captions with newlines
This release introduces flux2 klein 4b and 9b, a disable_multiline_split option that disables multi-caption splitting on newlines, new options for customizing text encoder layers in FLUX.2 models, enhancements to model metadata, expanded validation strategies using datasets, and detailed CREPA regularization scheduling controls.
Data Loader Options:
- Added the disable_multiline_split option to the dataloader documentation in English (DATALOADER.md), Spanish (DATALOADER.es.md), Portuguese (DATALOADER.pt-BR.md), Hindi (DATALOADER.hi.md), Japanese (DATALOADER.ja.md), and Chinese (DATALOADER.zh.md). This option prevents captions from being split on newlines, which is useful for preserving intentional line breaks. Example configs were updated to include the option; a sketch follows below.
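A minimal dataloader entry illustrating the option; the surrounding keys are typical of a multidatabackend.json entry, and the id and paths are placeholders:

```json
[
  {
    "id": "poetry-captions",
    "type": "local",
    "instance_data_dir": "/data/poetry",
    "caption_strategy": "textfile",
    "disable_multiline_split": true
  }
]
```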
Model Training Options:
- Added the --custom_text_encoder_intermediary_layers option to the Spanish (OPTIONS.es.md) and Hindi (OPTIONS.hi.md) documentation, allowing users to override which hidden-state layers are extracted from the text encoder for FLUX.2 models. Includes format, defaults, usage notes, and warnings about cache invalidation.
- Added the --modelspec_comment option to the Spanish (OPTIONS.es.md) documentation, enabling custom comments to be embedded into model metadata where external viewers can display them. Supports environment-variable substitution and multiple lines. Updated the CLI usage and options reference; both options are sketched below.
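A hedged sketch of the two options in a config.json; the layer-list format for --custom_text_encoder_intermediary_layers is an assumption (see OPTIONS.md for the real format), and the comment text is a placeholder. Note the documented warning: changing the intermediary layers invalidates existing text embed caches.

```json
{
  "model_family": "flux2",
  "custom_text_encoder_intermediary_layers": "[9, 18, 27]",
  "modelspec_comment": "Trained on internal dataset v3\nContact: $MAINTAINER_EMAIL"
}
```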
Validation and Conditioning:
- Documented new validation strategies in Spanish (OPTIONS.es.md): --validation_using_datasets for img2img validation using training dataset images, and --eval_dataset_id for selecting a specific dataset for evaluation. Includes detailed explanations of conditioning modes, dataset types, and how these options interact.
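A brief sketch of pairing the two options, assuming both live in config.json; the dataset id is a placeholder that must match an entry in your dataloader config:

```json
{
  "validation_using_datasets": true,
  "eval_dataset_id": "my-eval-set"
}
```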
CREPA Regularization Scheduling:
- Expanded the documentation for CREPA regularization in Spanish (OPTIONS.es.md) with new options: --crepa_scheduler, --crepa_warmup_steps, --crepa_decay_steps, --crepa_lambda_end, --crepa_power, --crepa_cutoff_step, --crepa_similarity_threshold, --crepa_similarity_ema_decay, and --crepa_threshold_mode. Includes configuration examples and usage notes for advanced scheduling and stopping criteria.
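An illustrative scheduling block using a subset of those options; the numeric values are placeholders chosen only to show the shape of a warmup-then-decay schedule, and the scheduler name and threshold mode values are assumptions (OPTIONS.md lists the accepted values):

```json
{
  "crepa_scheduler": "cosine",
  "crepa_warmup_steps": 500,
  "crepa_decay_steps": 5000,
  "crepa_lambda_end": 0.0,
  "crepa_similarity_threshold": 0.92,
  "crepa_threshold_mode": "stop"
}
```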
v4.0.0 - multi-user, cloud training, webUI overhaul
SimpleTuner v4.0.0 Release Notes
Release Date: January 2026
This is a major release introducing enterprise-grade multi-user features, new model architectures, and significant infrastructure improvements. The diff comprises 354,291 lines across 1,199 files.
Highlights
- 2 New Model Architectures: LTX-Video 2 with audio generation and Wan S2V for speech-to-video
- Enterprise Multi-User Support with organizations, teams, RBAC, OIDC/LDAP SSO, and audit logging
- Job Queue System with priority scheduling, approval workflows, and quota management
- Remote Worker Orchestration for distributed GPU training
- 200+ New API Endpoints with comprehensive authentication
- Light Theme (Windows 98-inspired) and new admin UI
- Context Parallelism support across all transformer models
- 86 New Test Files with 1,000+ new test methods
Table of Contents
- Breaking Changes
- New Model Architectures
- Enterprise Features
- CLI Changes
- API Changes
- Training Improvements
- UI/UX Improvements
- Infrastructure Changes
- Test Coverage
- Migration Guide
Breaking Changes
CLI Entry Point
- Breaking: Main CLI entry point moved from `simpletuner.cli:main` to `st_cli:main`. Update any scripts referencing the old module path.
Docker Image
- Breaking: Base image upgraded to `nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04` (was 12.4.1 on Ubuntu 22.04)
- Breaking: Container now starts the SimpleTuner server instead of `sleep infinity`
- Breaking: Working directory changed from `/workspace` to `/app`
- New target architecture: `TORCH_CUDA_ARCH_LIST=8.9` (Ada Lovelace)
- SimpleTuner now installed from the git `release` branch instead of PyPI
API Authentication
- Breaking: All API endpoints now require authentication
- Previously open endpoints return `401 Unauthorized` without valid credentials
- Use `/api/auth/login` for session auth, or API keys via the `X-API-Key` header
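A hedged example of hitting an authenticated endpoint with an API key; the host, port, and endpoint path are placeholders for wherever your server listens:

```bash
# assumes an API key was already created via the admin UI or CLI
curl -H "X-API-Key: $SIMPLETUNER_API_KEY" http://localhost:8080/api/queue/stats
```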
Documentation System
- Breaking: Migrated from Sphinx to MkDocs
- Documentation URL changed to https://simpletuner.dev
New Model Architectures
LTX-Video 2 (LTX-2)
The first model in SimpleTuner with native audio-video generation.
- 19B Parameter Transformer (`LTX2VideoTransformer3DModel`)
- Audio Autoencoder (`AutoencoderKLLTX2Audio`) for audio latent processing
- Vocoder (`LTX2Vocoder`) for mel-spectrogram to waveform conversion
- Text Encoder: Gemma3 (12B) via `Gemma3ForConditionalGeneration`
- Latent Channels: 128
- Pipelines: Text-to-Video and Image-to-Video with audio
- Flavours: `dev`, `dev-fp4`, `dev-fp8`, `2.0`
- Block Swap: Up to 47 swappable transformer blocks for memory optimization
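A minimal sketch of selecting this model in a config.json, assuming `ltx2` is the family key; all other required training keys are omitted:

```json
{
  "model_family": "ltx2",
  "model_flavour": "dev"
}
```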
Wan S2V (Speech-to-Video)
Generate video from audio, text, and reference images.
- 14B Parameter Model (`WanS2VTransformer3DModel`)
- Audio Encoding: Wav2Vec2 (`facebook/wav2vec2-large-xlsr-53`)
- Motion Encoder: `WanS2VMotionEncoder` with causal convolutions
- VAE: AutoencoderKLWan (16 latent channels)
- Flavour: `s2v-14b-2.2`
Context Parallelism Support
All transformers now include `_cp_plan` definitions for distributed training:
- ACE-Step, AuraFlow, Chroma, Cosmos, Flux, HiDream
- HunyuanVideo, Kandinsky5Video, LongCat-Image/Video
- LTXVideo, LTX-2, Lumina2, OmniGen, PixArt
- Sana, SanaVideo, SD3, Wan, Z-Image, Z-Image Omni
Enterprise Features
Multi-User Authentication
- Local Authentication: Username/password with secure session management
- OIDC Integration: Connect to external identity providers (Google, Okta, Auth0, etc.)
- LDAP/Active Directory: Enterprise directory integration
- API Keys: Scoped API keys for automation
Role-Based Access Control (RBAC)
- 4 Default Levels: Admin, Lead, Researcher, Viewer
- 17+ Granular Permissions: `admin.approve`, `admin.audit`, `admin.users`, etc.
- Resource Rules: GPU limits, job limits, cost caps using glob patterns
Organizations & Teams
- Hierarchical Structure: Organization → Teams → Users
- Quota Inheritance: Ceiling model with org → team → user quotas
- Member Roles: admin, lead, member per team
Job Queue System
- 5 Priority Levels: Critical, High, Normal, Low, Background
- Fair-Share Scheduling: Optional equal distribution across teams
- Configurable Concurrency: Global, per-user, per-team limits
- Starvation Prevention: Priority boosting for long-waiting jobs
Approval Workflows
- Rule-Based Requirements: Trigger approvals by cost threshold, hardware type, provider
- Request Lifecycle: Pending → Approved/Rejected → Expired
- Bulk Operations: Approve/reject multiple requests at once
- Email Response Integration: Approve via email reply
Quota Management
- Quota Types: Monthly/daily cost, concurrent jobs, jobs per hour/day, local GPUs
- Actions: Block, warn, or require approval when exceeded
- Real-Time Status: Usage tracking with 80% warning threshold
Audit Logging
- Tamper-Evident: Cryptographic hash chaining (HMAC-SHA256)
- Append-Only: Immutable audit trail
- Chain Verification: Detect tampering via integrity checks
- SIEM Integration: Export to Elasticsearch/Splunk via webhooks
- Event Types: Auth, user management, jobs, quotas, security events
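A conceptual sketch of what HMAC-SHA256 hash chaining looks like, shown with openssl; this illustrates the technique only and is not SimpleTuner's actual implementation. The secret, entries, and genesis value are placeholders:

```bash
# each entry's hash covers the previous hash, so editing any earlier
# entry invalidates every hash that follows it
secret="audit-signing-key"
prev="genesis"
for entry in '{"event":"auth.login"}' '{"event":"job.submit"}'; do
  prev=$(printf '%s%s' "$prev" "$entry" \
    | openssl dgst -sha256 -hmac "$secret" -r | cut -d' ' -f1)
  echo "$prev  $entry"
done
```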
Worker Orchestration
- Remote GPU Workers: Register workers via token authentication
- SSE Job Dispatch: Real-time job assignment streaming
- Heartbeat Monitoring: Automatic offline detection
- Orphan Recovery: Retry failed jobs when workers disconnect
Notification System
- Channels: Email (SMTP), Slack, Webhooks
- Event Routing: Per-user preferences by event type
- IMAP Response Handling: Email-based approval workflow
- Delivery History: Track notification delivery status
Circuit Breaker Resilience
- Per-Provider Breakers: Prevent cascading failures
- States: Closed → Open → Half-Open
- Configurable Thresholds: Failure count, timeout, success count
State Backend Options
Pluggable backends for multi-node deployments:
- Redis: Optimal for production (native async)
- PostgreSQL: Row-level locking with connection pooling
- MySQL: aiomysql support
- SQLite: WAL mode for single-node
- Memory: For testing/development
CLI Changes
New Commands
| Command | Description |
|---|---|
| `simpletuner jobs` | Job management (submit, list, cancel, retry, logs, approval) |
| `simpletuner quota` | Quota management (list, create, delete, status) |
| `simpletuner notifications` | Notification channels and preferences |
| `simpletuner backup` | Database backup and restore |
| `simpletuner database` | Database operations (health, verify, vacuum, migrate) |
| `simpletuner metrics` | Monitoring (prometheus, costs, usage, circuit breakers) |
| `simpletuner webhooks` | Webhook management (create, test, history) |
| `simpletuner worker` | Run as worker agent for orchestration |
| `simpletuner auth` | Authentication and user management |
| `simpletuner cloud` | Cloud training management |
| `simpletuner shutdown` | Graceful server shutdown |
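A few illustrative invocations assembled from the subcommand names above; exact argument syntax may differ:

```bash
simpletuner jobs list          # list jobs in the queue
simpletuner quota status       # show current quota usage
simpletuner database health    # check database health
simpletuner backup create      # take a database backup
```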
Auth Subcommands
simpletuner auth setup # Bootstrap first admin
simpletuner auth users list # List users
simpletuner auth users create # Create user
simpletuner auth orgs list # List organizations
simpletuner auth orgs create # Create organization
simpletuner auth audit list # Query audit logs
simpletuner auth audit verify # Verify chain integrity
Server Enhancements
New flags for `simpletuner server`:
- `--host`, `--port`: Bind configuration
- `--ssl`, `--ssl-cert`, `--ssl-key`: SSL support
- `--reload`: Development auto-reload
- `--workers`: Multi-process workers
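For example, a TLS-enabled multi-worker launch; the certificate paths and port are placeholders, and combining the flags this way is an assumption:

```bash
simpletuner server --host 0.0.0.0 --port 8443 \
  --ssl --ssl-cert /etc/ssl/simpletuner.crt --ssl-key /etc/ssl/simpletuner.key \
  --workers 4
```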
Environment Variables
| Variable | Purpose |
|---|---|
| `SIMPLETUNER_SKIP_TORCH` | Fast CLI startup (skip torch imports) |
| `SIMPLETUNER_SSL_ENABLED` | Enable SSL |
| `SIMPLETUNER_API_KEY` | API key for authenticated requests |
| `SIMPLETUNER_ORCHESTRATOR_URL` | Worker orchestrator URL |
| `SIMPLETUNER_WORKER_TOKEN` | Worker authentication token |
API Changes
New Endpoint Categories
- Authentication: `/api/auth/*` (login, logout, API keys, OIDC, LDAP)
- Users: `/api/users/*` (CRUD, levels, permissions, credentials)
- Organizations: `/api/orgs/*` (orgs, teams, members, quotas)
- Approvals: `/api/approvals/*` (rules, requests, bulk operations)
- Queue: `/api/queue/*` (submit, cancel, priority, stats)
- Quotas: `/api/quotas/*` (types, limits, usage)
- Audit: `/api/audit/*` (logs, stats, verification, export)
- Metrics: `/api/metrics/*` (prometheus, health, circuit breakers)
- Backup: `/api/backup/*` (create, restore, delete)
- Database: `/api/database/*` (health, migrations, vacuum)
- Workers: `/api/workers/*` and `/api/admin/workers/*`
- Themes: `/api/themes/*` (list, assets, CSS)
- Webhooks: `/api/webhooks/*` (test, progress)
Authentication Methods
- Session Auth: POST `/api/auth/login` → session cookie
- API Key: `X-API-Key` header
- Worker Token: `X-Worker-Token` header (for workers only)
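A hedged sketch of the session flow; the host, port, credentials, and request body shape are assumptions:

```bash
# log in and store the session cookie (body shape is a guess)
curl -c cookies.txt -X POST http://localhost:8080/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "changeme"}'

# reuse the cookie against a protected endpoint
curl -b cookies.txt http://localhost:8080/api/users/
```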
Statistics
- 200+ new endpoints added
- 179 endpoints now require `get_current_user`
- 217 endpoints use `require_permission(...)`
Training Improvements
Memory Optimizations
- Lazy Optimizer Loading: Deferred imports for TorchAO, BitsAndBytes, Prodigy,...
v3.3.4
What's Changed
- ui: preserving changed value and formDirty states between tab changes by @bghira in #2252
- ui: remove annoying 2px layout shift by @bghira in #2253
- ui: mobile-friendly changes by @bghira in #2254
- ui: add webhook config builder by @bghira in #2256
- cog: stream logs via lightweight http listener by @bghira in #2257
- Implement frames slicing for CREPA video encoders by @kabachuha in #2258
- merge by @bghira in #2271
- Bump version from 3.3.3 to 3.3.4 by @bghira in #2273
Full Changelog: v3.3.3...v3.3.4
v3.3.3 - more memory optimisations
Features
- SDNQ quantisation engine for weights and optimisers
- Musubi block swap expanded to cover auraflow, chroma, longcat-image, lumina2, omnigen, hidream, sana, sd3, and z-image
- Kandinsky5 memory-efficient VAE now used instead of Diffusers' HunyuanVideo implementation (runs on consumer hardware)
- `resolution_frames` bucket strategy for video training, so that a multi-length dataset is possible with just a single config entry (see the sketch after this list)
- WebUI: Training configuration wizard now allows filling in the number of checkpoints to keep
- metadata will be written to the model / LoRA checkpoint for ComfyUI LoRA Auto Trigger Words node to make use of
- OmniGen & Lumina2: TREAD, TwinFlow, and LayerSync
- Qwen Image: experimental tiled attention support that avoids OOM during the attention calculation (disabled by default; for now it must be enabled in code)
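A hedged sketch of a video dataset entry using the new bucket strategy; the key name and value here are assumptions inferred from the feature description, so check DATALOADER.md for the actual schema:

```json
{
  "id": "mixed-length-videos",
  "type": "local",
  "dataset_type": "video",
  "instance_data_dir": "/data/videos",
  "resolution_type": "resolution_frames"
}
```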
Bugfixes
- RamTorch
- Now applies to text encoders properly (incl CLIP)
- Extended to support Conv2D and Embedding layers (eg. SDXL offload)
- Compatibility with Quanto (tested with int2, int4, int8-quanto)
- System memory use reduction by not calculating gradients when `requires_grad=False`
- Text encoder memory not unloading fixed for Qwen Image
- No more quantize_via pipeline error when no quantisation is enabled
- Qwen Image batch size > 1 training fixed (padded)
- ROCm: bypass PyTorch bug for building kernels, enabling full Quanto compatibility (int2, int4, int8, fp8)
What's Changed
- add metadata for ComfyUI-Lora-Auto-Trigger-Words node by @bghira in #2222
- auraflow: implement musubi block swap by @bghira in #2227
- chroma: implement musubi block swap by @bghira in #2228
- longcat image: implement musubi block swap by @bghira in #2230
- modernise lumina2 implementation with TREAD, block swapping, twinflow and layersync by @bghira in #2231
- modernise omnigen implementation with TREAD, block swapping, twinflow and layersync by @bghira in #2232
- pixart: implement musubi block swap by @bghira in #2233
- add qwen-edit-2511 support, and an edit-v2+ flavour which enables 2511 features on 2509 by @bghira in #2223
- hidream: implement musubi block swap by @bghira in #2234
- sana & sanavideo: implement musubi block swap by @bghira in #2235
- sd3: implement musubi block swap by @bghira in #2236
- z-image turbo & omni: implement musubi block swap by @bghira in #2237
- use kandinsky5 optimised VAE with added temporal roll and chunked conv3d by @bghira in #2229
- when preparing model with offload enabled, do not move to accelerator by @bghira in #2238
- docs: document SIMPLETUNER_JOB_ID env var for webhook job_id by @rafstahelin in #2239
- sdnq quant engine by @bghira in #2225
- fix error str vs int comparison by @bghira in #2241
- fix error when quantize_via=pipeline but no_change level was provided by @bghira in #2242
- ramtorch: when using it for text encoders, do not move to gpu by @bghira in #2244
- add resolution_frames bucket strategy for video datasets so that different lengths can exist in one dataset by @bghira in #2240
- add checkpoints total limit to wizard by @bghira in #2243
- qwen image: fix padding for text embeds by @bghira in #2246
- quanto: fix ROCm compiler error for int2-quanto; fix for RamTorch compatibility by @bghira in #2248
- qwen image: tiled attention fallback when we hit OOM by @bghira in #2249
- ramtorch: fix for gradient memory ballooning; fix text encoder application; extend for Conv2D and Embedding offload by @bghira in #2250
- merge by @bghira in #2251
New Contributors
- @rafstahelin made their first contribution in #2239
Full Changelog: v3.3.2...v3.3.3
v3.3.2 - easily optimise memory consumption
Features
- Better diffusion loss tracking when using LayerSync + CREPA
- WebUI easy memory optimisation config for light/medium/aggressive configs
- TUI: `simpletuner configure` is also able to apply optimisation presets to existing configs
Bugfixes
- ComfyUI will now automatically enable v-prediction and ztsnr for relevant checkpoints
- LongCat batched training now works correctly
- LongCat edit fixed
- ControlNet demo dataset repeats boosted
- Chroma indent issue fixed, now trains again
- Example configs fixed, populate in UI correctly
- Example configs no longer use constant LR scheduler with warmup steps incorrectly
- SDXL hidden state buffer arg removed
- TinyGemm device mismatch
- Examples no longer suggest `validation_torch_compile` or the Lion optimiser for video models (both degrade results)
What's Changed
- add pure diffusion loss term pre-augmentation when aux loss is enabled by @bghira in #2201
- switch video training example configs from Lion to AdamW BF16 by @bghira in #2206
- remove validation torch compile option from examples by @bghira in #2207
- (#2175) move scale_shift to _data device by @bghira in #2202
- when example uses lr warmup, use constant_with_warmup by @bghira in #2208
- Fixup crepa states extraction for K5 by @kabachuha in #2209
- fix: remove unsupported hidden_states_buffer from SDXL model_predict by @joeqzzuo in #2213
- fix config syntax by @bghira in #2214
- (#2211) fix Chroma indent issue and resolve validation and training noise by @bghira in #2215
- use repeats of 4 by default on demo CN datasets by @bghira in #2218
- add lycoris example for longcat edit by @bghira in #2217
- longcat image: fix text encoder padding on inputs and initialisation of text processor by @bghira in #2216
- (#1822) add --delete_model_after_load to remove files from disk after they're loaded into memory by @bghira in #2210
- comfyui: ztsnr and vpred compatibility by @bghira in #2220
- easy memory optimisation presets by @bghira in #2221
- merge by @bghira in #2219
New Contributors
- @kabachuha made their first contribution in #2209
- @joeqzzuo made their first contribution in #2213
Full Changelog: v3.3.1...v3.3.2
v3.3.1
What's Changed
- flux2: do not bypass the special model loader by @bghira in #2170
- (#2030) scheduled dataset sampling by @bghira in #2167
- GLANCE: better code example by @bghira in #2171
- TwinFlow: do not initialise neg time embed when disabled by @bghira in #2174
- UI (datasets): remove ControlNet conditioning option from selections when CN is disabled; select reference_strict by default otherwise by @bghira in #2177
- add missing LayerSync support to kandinsky5 video by @bghira in #2179
- qwen-edit: fix text embed cache generation with image context; disable image embeddings for multi-conditioning input by @bghira in #2176
- chroma 4d text embed fix by @bghira in #2181
- ensure edit-v2 either uses 1:1 or 0 image embeds by @bghira in #2186
- upload zip: preserve subdirs by @bghira in #2189
- allow `simpletuner server env=...` to auto-start training after webUI launches by @bghira in #2191
- add more indicators to dataset page when conditioning parameters are not set by @bghira in #2192
- Git-based configuration sync across SimpleTuner nodes (wip) by @bghira in #2172
- Z-Image-Omni with optional SigLIP conditioning support, TREAD, LayerSync, CFG layer skip, fp16 clamping, and TwinFlow by @bghira in #2183
- (#2182) add --peft_lora_target_modules for arbitrary layer definition by @bghira in #2193
- (#2190) add webUI onboarding config to "simpletuner configure" by @bghira in #2194
- merge by @bghira in #2196
- (#2173) remove early check for CREPA since we are using LayerSync features with certain configs by @bghira in #2195
- (#2187) better image resizing for validation inputs when validation resolution != training resolution by @bghira in #2197
- adjust default resolution on dataset page to equal --resolution, and ensure min/max/target down sample size are equal by @bghira in #2198
- merge by @bghira in #2199
Full Changelog: v3.3.0...v3.3.1
v3.3.0 - TwinFlow, LayerSync, and Flux.2 edit training
Features
- TwinFlow, a distillation method that works on most flow-matching architectures and converges in far less time than typical distillation
- LayerSync, a self-regularisation method for practically all transformer models supported in SimpleTuner
- CREPA can combine forces with LayerSync to self-regulate instead of using DINO features
- Flux.2 can now accept conditioning datasets
- Custom flow-matching timesteps can be provided for training, allowing configuration of "Glance" style training runs
- WebUI: better path handling for datasets, sensible defaults will be set instead of requiring the user to figure it out
- CLI: When configuring dataset cache directories, you can now use `{id}` and `{output_dir}` in addition to `{model_family}` to build dynamic paths that adjust automatically based on these attributes (see the sketch below)
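A short sketch of templated cache paths in a dataloader entry; `cache_dir_vae` is assumed here to be the key for the dataset's VAE cache path, and the directory layout is illustrative:

```json
{
  "id": "portraits",
  "type": "local",
  "instance_data_dir": "/data/portraits",
  "cache_dir_vae": "{output_dir}/cache/vae/{model_family}/{id}"
}
```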
Bugfixes
- WebUI: resolved a search box race condition that prevented items from highlighting or subsections from expanding
What's Changed
- TwinFlow self-directed distillation by @bghira in #2159
- (#2136) add --flow_custom_timesteps with Glance "distillation" example by @bghira in #2160
- flux2: adjust comfyUI lora export format to use their custom keys instead of generic LoRA layout by @bghira in #2162
- [webUI] refactoring validation and default paths for text embed and VAE caches by @bghira in #2163
- flux2: support conditioning datasets by @bghira in #2164
- fix search box race condition that prevented expanding subsection or highlighting results by @bghira in #2165
- LayerSync + CREPA adaptation by @bghira in #2161
- merge by @bghira in #2166
Full Changelog: v3.2.3...v3.3.0