Releases: Sarania/blissful-tuner
0.11.66
Sorry I haven't made a release in a while, but things should be in a pretty solid place: we're fully caught up with upstream and steadily adding new things. I updated the documentation to be more readable and to reflect all the recent changes, so please have a look. Cheers!
0.8.66 Redux
Changes since last time include support for upcasting quantization and linear transformations, a metadata viewer, CTRL+Q to break early in Wan/Hunyuan oneshot modes, plus upstream changes and lots of bug fixes. Oh, also Wan can load mixed precision transformers now (for how to create one and why you might want to, see: kohya-ss#232 (comment)). I did release an 0.8.66 before, but I decided to pull it because I wanted to make upcasting optional instead of always-on and do additional testing, so that's what was up with that. Cheers!
Full Changelog: 0.7.66...0.8.66
0.7.66
0.6.6
Tagged releases are created with the aim of being a more stable representation of the software: more testing has gone into ensuring there aren't any major bugs or issues compared to the raw repo. This is the very first tagged release. I've tested the main paths through the code, updated the documentation, and generally tried to tidy things up over my daily syncs, because I've noticed people using my version and this pleases me. If you do find any issues, please let me know!
Side note: I develop in Python 3.12 and recommend it for that reason, but I do strive to maintain compatibility with 3.10 because that's what base Musubi uses, and I'd like you to be able to keep using the same venv after installing the requirements again!
The full list of changes since base Musubi is below (same as the readme at this point):
Extensions for all models:
- Latent preview during generation with either latent2RGB or TAEHV (`--preview_latent_every N` where N is a number of steps (or sections for Framepack). By default uses latent2rgb; TAE can be enabled with `--preview_vae /path/to/model`. Models: https://www.dropbox.com/scl/fi/fxkluga9uxu5x6xa94vky/taehv.7z?rlkey=ux1vmcg1yk78gv7iy4iqznpn7&st=4181tzkp&dl=0)
- Optimized generation settings for fast, high quality gens (`--optimized`, enables various optimizations and settings based on the model. Requires SageAttention, Triton, and PyTorch 2.7.0 or higher)
- Save generation metadata in videos/images (automatic with `--container mkv`, disable with `--no-metadata`, not available with `--container mp4`)
- Beautiful rich logging, rich argparse and rich tracebacks
- Extended saving options (`--codec codec --container container`; can save Apple ProRes (`--codec prores`, super high bitrate, perceptually lossless) into `--container mkv`, or either of `h264`/`h265` into `mp4` or `mkv`)
- FP16 accumulation (`--fp16_accumulation`, works best with Wan FP16 models (but works with Hunyuan bf16 too!) and requires PyTorch 2.7.0 or higher, but significantly accelerates inference speeds; especially with `--compile` it's almost as fast as fp8_fast/mmscaled without the loss of precision! It works with fp8 scaled mode too! See the PyTorch sketch after this list.)
- GIMM-VFI framerate interpolation (`blissful_tuner/GIMMVFI.py`, please see its `--help` for usage. Models: https://www.dropbox.com/scl/fi/tcq68jxr52o2gi47eup37/gimm-vfi.7z?rlkey=skvzwxi9lv9455py5wrxv6r5j&st=gu5einkd&dl=0)
- Upscaling with SwinIR or ESRGAN type models (`blissful_tuner/upscaler.py`, please see its `--help` for usage. Models: https://www.dropbox.com/scl/fi/wh5hw55o8rofg5mal9uek/upscale.7z?rlkey=oom3osa1zo0pf55092xcfnjp1&st=dozwpzwk&dl=0)
- Use strings as your seed because why not! Also easier to remember!
- Use wildcards in your prompts for more variation! (`--prompt_wildcards /path/to/wildcard/directory`; for instance `__color__` in your prompt would look for color.txt in that directory. The wildcard file format is one potential replacement string per line, with an optional relative weight attached like red:2.0 or "some longer string:0.5". Wildcards can also contain wildcards themselves; the recursion limit is 50 steps! An example wildcard file follows this list.)
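To make the wildcard format concrete, here's what a color.txt in the wildcard directory could look like. The entries are made up for illustration, and exact quoting rules for multi-word weighted entries may differ slightly from what's shown:

```text
red:2.0
blue
pale green:0.5
```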
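And for the FP16 accumulation item: PyTorch 2.7.0 exposes an FP16-accumulation switch for cuBLAS matmuls, and the sketch below shows that setting being enabled by hand, assuming the flag maps to it. It's illustrative, not necessarily how this repo implements it:

```python
import torch

# Enable full-FP16 accumulation for cuBLAS matmuls (available in PyTorch >= 2.7.0).
# Guarded with hasattr so older PyTorch builds simply skip it.
if hasattr(torch.backends.cuda.matmul, "allow_fp16_accumulation"):
    torch.backends.cuda.matmul.allow_fp16_accumulation = True
```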
Wan/Hunyuan extensions:
- Load diffusion-pipe style LoRAs for inference without converting first
- RifleX e.g. https://github.com/thu-ml/RIFLEx for longer vids (`--riflex_index N` where N is the RifleX frequency. 6 is good for Wan and can usually go to ~115 frames instead of just 81, requires `--rope_func comfy` with Wan; 4 is good for Hunyuan and you can make at least double length!)
- CFGZero* e.g. https://github.com/WeichenFan/CFG-Zero-star (`--cfgzerostar_scaling --cfgzerostar_init_steps N` where N is the total number of steps to zero out at the start. 2 is good for T2V, 1 for I2V, but it's better for T2V in my experience. Support for Hunyuan is HIGHLY experimental and only available with CFG enabled.)
- Advanced CFG scheduling (`--cfg_schedule`, please see the `--help` for usage. You can specify the guidance scale down to individual steps if you like!)
- Perpendicular Negative Guidance (`--perp_neg neg_strength`, where neg_strength is a float that controls the strength of the negative prompt. See `--help` for more, and the sketch after this list for the idea behind it!)
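For the Perpendicular Negative Guidance option above, here's a rough sketch of the general idea, assuming the usual perp-neg formulation. The function name and details are illustrative, not this repo's actual code:

```python
import torch

def perp_neg_guidance(
    noise_uncond: torch.Tensor,   # unconditional prediction
    noise_pos: torch.Tensor,      # positive-prompt prediction
    noise_neg: torch.Tensor,      # negative-prompt prediction
    guidance_scale: float,
    neg_strength: float,          # maps loosely to --perp_neg neg_strength
) -> torch.Tensor:
    pos = noise_pos - noise_uncond  # direction the positive prompt pushes the sample
    neg = noise_neg - noise_uncond  # direction the negative prompt pushes the sample
    # Project the negative direction onto the positive one and discard the parallel part,
    # so the negative prompt only acts where it doesn't fight the positive prompt.
    parallel = (torch.sum(neg * pos) / (torch.sum(pos * pos) + 1e-8)) * pos
    perpendicular = neg - parallel
    # Standard CFG on the positive direction, minus the scaled perpendicular negative.
    return noise_uncond + guidance_scale * (pos - neg_strength * perpendicular)
```

(The projection is flattened over the whole tensor for simplicity; a real implementation would handle batching and dimensions more carefully.)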
Hunyuan only extensions:
- Several more LLM options (`--hidden_state_skip_layer N --apply_final_norm --reproduce`, please see the `--help` for explanations!)
- FP8 scaled support using the same algo as Wan (`--fp8_scaled`, training isn't super tested!)
- Separate prompt for CLIP (`--prompt_2 "second prompt goes here"`, provides a different prompt to CLIP, since CLIP is accustomed to simpler text)
- Rescale text encoders based on https://github.com/zer0int/ComfyUI-HunyuanVideo-Nyan (`--te_multiplier llm clip`, such as `--te_multiplier 0.9 1.2` to downweight the LLM slightly and upweight the CLIP slightly)
Wan only extensions (now supporting both one shot and interactive modes):
- V2V inferencing (`--video_path /path/to/input/video --v2v_denoise amount` where amount is a float 0.0 - 1.0 that controls how strong the noise added to the source video will be. If `--v2v_noise_mode traditional`, it will run the last (amount * 100) percent of the timestep schedule like other implementations. If `--v2v_noise_mode direct`, it will directly control the amount of noise added as closely as possible, starting from wherever in the timestep schedule is closest to that value and proceeding from there; see the sketch after this list. Supports scaling, padding, and truncation, so the input doesn't have to be the same resolution as the output or even the same length! If `--video_length` is shorter than the input, the input will be truncated and include only the first `--video_length` frames. If `--video_length` is longer than the input, the first or last frame will be repeated to pad the length depending on `--v2v_pad_mode`. You can use either T2V or I2V `--task` modes and models (I2V mode produces better quality in my opinion)! In I2V mode, if `--image_path` is not specified, the first frame of the video will be used to condition the model instead. `--infer_steps` should be the same amount it would be for a full denoise, e.g. by default 50 for T2V or 40 for I2V, because we need to modify from a full schedule. Actual steps will depend on `--v2v_noise_mode`.)
- Prompt weighting (`--prompt_weighting`, and then in your prompt you can do something like "a cat playing with a (large:1.4) red ball" to upweight the effect of "large". Note that [this] or (this) isn't supported, only (this:1.0), and downweighting has curious effects)
- RoPE ported from ComfyUI that doesn't use complex numbers (`--rope_func comfy`). Massive VRAM savings when used with `--compile`!
- Optional extra latent noise for I2V/V2V (`--v2v_extra_noise 0.02 --i2v_extra_noise 0.02`, values less than 0.04 are recommended; see the one-liner after this list. This can improve fine detail and texture in V2V/I2V, but too much will cause artifacts and moving shadows. I use around 0.01-0.02 for V2V and 0.02-0.04 for I2V)
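To clarify the difference between the two `--v2v_noise_mode` settings, here's a hedged sketch of how the starting point in the timestep schedule could be chosen. Names and details are illustrative, not the repo's actual code:

```python
import torch

def v2v_start_index(timesteps: torch.Tensor, denoise: float, mode: str) -> int:
    """Pick where in a full denoising schedule V2V starts (illustration only).

    `timesteps` is the full schedule (highest noise first), `denoise` is --v2v_denoise.
    """
    num_steps = len(timesteps)
    if mode == "traditional":
        # Run only the last (denoise * 100)% of the schedule, like most V2V implementations.
        return num_steps - int(round(denoise * num_steps))
    if mode == "direct":
        # Start at whichever scheduled timestep is closest to the requested noise level,
        # so `denoise` maps as directly as possible to the noise actually added.
        target = denoise * timesteps.max()
        return int(torch.argmin((timesteps - target).abs()))
    raise ValueError(f"unknown v2v noise mode: {mode}")

# Example: with a 50-step schedule and --v2v_denoise 0.6, "traditional" runs the last
# 30 steps, while "direct" starts at the step whose timestep best matches ~60% noise,
# which can land somewhere else entirely on a shifted schedule.
```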
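And the extra latent noise options roughly boil down to something like this (illustrative only; tensor shape and variable names are placeholders):

```python
import torch

latents = torch.randn(1, 16, 21, 60, 104)  # placeholder source/conditioning latents
extra_noise = 0.02  # e.g. the value given to --v2v_extra_noise / --i2v_extra_noise
# Sprinkle a small amount of extra gaussian noise over the latents before denoising.
latents = latents + extra_noise * torch.randn_like(latents)
```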
Framepack only extensions:
- Torch.compile (`--compile`, same syntax as Wan and Hunyuan already use)
- FP8 fast/mm_scaled (`--fp8_fast`, increased speed on 40xx cards with a mild hit to quality; Wan and Hunyuan have this already in native Musubi!)