Chroma fix t5 #2203
base: sd3
Conversation
Co-authored-by: aider (deepseek/deepseek-chat) <[email protected]>
```python
if self.is_schnell:
    t5xxl_max_token_length = 256
else:
    t5xxl_max_token_length = 512
```
Not sure if this is OK with Chroma, or if it should keep 256?
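For illustration only, the branch above can be sketched with a hypothetical Chroma case made explicit. The function name is invented, and treating Chroma like dev (512 tokens) is exactly the open question in this review, not settled code:

```python
# Sketch: pick the T5-XXL max token length per model variant.
# Whether Chroma should get 512 (like dev) or 256 is the open
# question in this review thread, not merged behavior.
def t5xxl_max_token_length(is_schnell: bool, is_chroma: bool = False) -> int:
    if is_schnell:
        return 256  # schnell uses the shorter context
    # dev -- and, tentatively, Chroma -- use the longer 512-token context
    return 512
```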
It looks like you're getting an error because you're trying to train CLIP-L, which doesn't exist for Chroma. Could you try training only DiT with `--network_train_unet_only`? Training T5 is not recommended, but you should be able to train DiT and T5 with the appropriate options.
I think it's not training the text encoder. New version of args:

```shell
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True accelerate launch \
  --mixed_precision bf16 \
  --num_cpu_threads_per_process 1 \
  /INTEL_SSD/github/fluxgym/sd-scripts/flux_train_network.py \
  --model_type chroma \
  --pretrained_model_name_or_path "models/unet/chroma_v10HD.safetensors" \
  --t5xxl "models/clip/t5xxl_fp16.safetensors" \
  --ae "models/vae/ae.sft" \
  --apply_t5_attn_mask \
  --cache_latents_to_disk \
  --cache_text_encoder_outputs \
  --cache_text_encoder_outputs_to_disk \
  --dataset_config "outputs/neo_dataset.toml" \
  --discrete_flow_shift 3.1582 \
  --fp8_base \
  --gradient_accumulation_steps 16 \
  --gradient_checkpointing \
  --guidance_scale 0.0 \
  --highvram --mem_eff_attn \
  --t5xxl_max_token_length 512 \
  --learning_rate 7e-6 \
  --lr_scheduler cosine_with_restarts --lr_scheduler_min_lr_ratio 1e-8 --lr_warmup_steps 5 --lr_scheduler_num_cycles 50 \
  --loss_type l2 \
  --max_data_loader_n_workers 2 \
  --max_train_epochs 50 \
  --mixed_precision bf16 \
  --model_prediction_type raw \
  --network_alpha 32 \
  --network_dim 64 \
  --network_module networks.lora_flux \
  --optimizer_type adamw8bit \
  --output_dir "outputs/neo_chroma_test3" \
  --output_name chroma-lr7e-6 \
  --sample_every_n_steps 15 \
  --sample_every_n_epochs 1 \
  --save_every_n_epochs 1 \
  --sample_prompts "sample_prompts_chroma.txt" \
  --save_every_n_steps 15 \
  --save_last_n_steps 45 \
  --save_state \
  --save_last_n_steps_state 45 \
  --save_state_on_train_end \
  --network_train_unet_only \
  --save_model_as safetensors \
  --save_precision bf16 \
  --xformers --persistent_data_loader_workers \
  --seed 42 \
  --network_args "ggpo_sigma=0.03" "ggpo_beta=0.01" "split_qkv=True" "target_module=models.layers" \
  --timestep_sampling sigmoid \
  --sample_at_first
```

`--sample_at_first` added.

Don't know if I am the only one having problems with the original sd3 branch on Chroma HD. Here is my
I tested the latest command you shared and I get an error. It seems that GGPO and split_qkv cannot be used at the same time, so try removing one of them.
I have had a successful run using the command provided with this branch and didn't get any error. Sorry, I can't help much; I have barely above basic coding skills and used help from LLMs to fix it on my end. Chroma training could be the next SDXL unless Qwen beats it to it, so more people will try fine-tuning it by then. So you can put this on hold until more feedback is provided; I don't want to make you waste time.
Hi, I was getting an error when trying to train Chroma.

I tried the sd3 branch and the fix-chroma-training-withtout-te-cache branch; both failed. So I asked DeepSeek for help and got to this point in my investigation:
Looking at the code, the issue is in the process_batch method in train_network.py at line 430. The problem is that Chroma doesn't use CLIP-L, but the code assumes both text encoders are present.
Here's the specific fix needed:
In train_network.py around line 430, modify the code to handle Chroma models differently:
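The concrete code block from that suggestion was not captured above. A minimal sketch of the idea (guard every CLIP-L access so it is simply skipped when that encoder is absent) might look like the following; all names here are hypothetical illustrations, not sd-scripts' actual API:

```python
# Hypothetical sketch of the process_batch fix: only call CLIP-L when
# the model actually has one. Chroma ships with T5-XXL only, so its
# clip_l_encode would be None.
from typing import Callable, Optional

def encode_prompts(
    prompt: str,
    t5_encode: Callable[[str], list],
    clip_l_encode: Optional[Callable[[str], list]] = None,
) -> dict:
    outputs = {"t5_out": t5_encode(prompt)}
    if clip_l_encode is not None:  # None for Chroma: skip CLIP-L entirely
        outputs["clip_l_out"] = clip_l_encode(prompt)
    return outputs
```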
However, this is a deeper architectural issue. The real problem is that the dataset preparation for Chroma needs to be handled differently. Looking at the code structure:

1. In flux_train_network.py, the FluxNetworkTrainer class needs to override the tokenization strategy for Chroma.
2. The tokenization strategy should skip CLIP-L processing for Chroma models.
3. The dataset should provide T5 tokens instead of CLIP tokens for Chroma.
Better solution: Modify the get_tokenize_strategy method in FluxNetworkTrainer to handle Chroma differently:
You would also need to modify the FluxTokenizeStrategy to support use_clip_l=False for Chroma models, and ensure the dataset provides the correct tokenization outputs.
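As a sketch of that use_clip_l=False idea (the class name and the stand-in tokenizers are illustrative, not FluxTokenizeStrategy's real interface):

```python
# Illustrative tokenize strategy with a CLIP-L switch: Chroma would
# construct it with use_clip_l=False so only T5 tokens are produced.
class DualEncoderTokenizeStrategy:
    def __init__(self, use_clip_l: bool = True, t5_max_length: int = 512):
        self.use_clip_l = use_clip_l
        self.t5_max_length = t5_max_length

    def tokenize(self, prompt: str) -> dict:
        # Stand-in tokenizer: split on whitespace and truncate.
        words = prompt.split()
        tokens = {"t5": words[: self.t5_max_length]}
        if self.use_clip_l:
            tokens["clip_l"] = words[:77]  # CLIP's usual 77-token window
        return tokens
```

Chroma would then use `DualEncoderTokenizeStrategy(use_clip_l=False)`, and the cached dataset outputs would contain only T5 tokens.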
This is a fundamental architectural issue where Chroma support wasn't fully implemented in the training pipeline.
So, with its help, I went on to fix all the issues.
I just started a training run. It seems to work!
Also, the major difference from Flux Dev is that with Chroma I can use batch size 4 instead of 2 on 2x3090. VRAM is at 23.787 GB.
All images were cropped/scaled to 1024. Not sure if it makes a difference.
EDIT: Tried with bucketing and larger images: it works! BUT old .npz caches must be deleted if they are from Flux.
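A small helper for clearing those stale caches, as a sketch (it assumes the caches sit under the dataset directory as `.npz` files; the path handling is illustrative):

```python
# Delete leftover .npz latent/text-encoder caches from a previous Flux
# run so a Chroma run re-caches everything from scratch.
from pathlib import Path

def clear_npz_caches(dataset_dir: str) -> int:
    """Remove all .npz files under dataset_dir; return how many were deleted."""
    removed = 0
    for npz in Path(dataset_dir).rglob("*.npz"):
        npz.unlink()
        removed += 1
    return removed
```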
Not sure this code is up to standard, or whether it has created other issues.
For now it trains; are quality issues possible due to the code changes?
Posting here to help others.