SD3 full finetuning LR question #487
-
Try to comment out these lines.
-
The trainer for a full model uses the BitFit technique for tuning, which freezes all of the weights and trains only the model's biases. I was wondering how it would behave with SD3, and yes, you can use a much higher LR and it will cook less, which is why I made it the default. However, just this morning I changed the default example configs so that this isn't applied out of the box; it's left there commented out as an example of how a setting can be applied conditionally.
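For anyone unfamiliar with BitFit, here is a minimal PyTorch sketch of the idea; it's illustrative only, and the `transformer` variable in the usage comment is an assumption rather than SimpleTuner's actual code:

```python
import torch

def bitfit_parameters(model: torch.nn.Module):
    # Freeze every weight; leave only the bias terms trainable (BitFit).
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias")
        if param.requires_grad:
            trainable.append(param)
    tuned = sum(p.numel() for p in trainable)
    total = sum(p.numel() for p in model.parameters())
    print(f"BitFit: training {tuned:,} of {total:,} parameters")
    return trainable

# Hypothetical usage with an SD3-style transformer and a relatively high LR:
# optimizer = torch.optim.AdamW(bitfit_parameters(transformer), lr=1e-5)
```

Because only a tiny fraction of the parameters move, the model tolerates a much larger learning rate before it starts to degrade.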
-
See some experiments here: https://wandb.ai/bghira/sd3-training?nw=nwuserbghira. Generally, with the weights and biases fully unfrozen, the model will cook no matter what LR you set. Either it's as if training does nothing to the model at all, or suddenly it's bearing down like the boulder behind Indiana Jones: it picks up all the worst parts of the dataset and then fries itself. You can tell it's frying because it degrades into square-grid nonsense and then loses depth, contrast, and prompt adherence, in that order.
-
In your tests, you haven't been using BitFit?
-
So far in my tests, SimpleTuner can overfit on a single image or a handful of images, but the learning effect drops off sharply with dozens or more images. Even with extreme training settings where overfitting should be inevitable, it never fits.
-
I've been experimenting with LoRA training on a larger dataset. Setting --max_grad_norm=0.01 seems to be beneficial for SD3 too, in addition to PixArt; it allows training at a higher learning rate for longer before cooking. Also, setting --weighting_scheme=none has helped with anatomical cohesion (I modified helpers/arguments.py to allow this). This gets rid of logit-normal timestep sampling so that earlier and later timesteps get more training; the earlier timesteps should be responsible for anatomic features. Since the SD3 base model has been trained with logit-normal sampling, it may be the root cause of all the body horror people are seeing with SD3.

I was able to reach about 30k steps with a learning rate of 2e-4 before the LoRA really started overcooking. I was training at batch size 1 just to get some quick results. I was also running with --lora_rank=128 --lora_alpha=128, but these could maybe be increased further. Maybe I'll also need to try full finetuning.
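To make the weighting-scheme point concrete, here's a rough sketch of how logit-normal sampling differs from uniform ("none") sampling of the normalized timestep; the function name and defaults below are my own illustration, not SimpleTuner's or diffusers' actual API:

```python
import torch

def sample_u(batch_size: int, scheme: str = "logit_normal",
             logit_mean: float = 0.0, logit_std: float = 1.0) -> torch.Tensor:
    # Draw the normalized timestep u in (0, 1) used to pick sigmas/timesteps.
    # "logit_normal": sigmoid(N(mean, std)) concentrates training around the
    #   middle of the noise schedule (what SD3 was trained with).
    # "none": uniform, so very early (global structure / anatomy) and very
    #   late (fine detail) timesteps get equal coverage.
    if scheme == "logit_normal":
        return torch.sigmoid(torch.randn(batch_size) * logit_std + logit_mean)
    return torch.rand(batch_size)  # scheme == "none"

# Compare how often each scheme visits the extremes of the schedule:
for scheme in ("logit_normal", "none"):
    u = sample_u(100_000, scheme)
    tail_fraction = ((u < 0.1) | (u > 0.9)).float().mean().item()
    print(f"{scheme}: fraction with u < 0.1 or u > 0.9 = {tail_fraction:.3f}")
```

With uniform sampling, roughly 20% of draws land in the outer tails of the schedule versus only a few percent under logit-normal, which is consistent with the idea that the early (structure/anatomy) timesteps see far more training signal when --weighting_scheme=none.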


-
I have been experimenting with different learning rates.
With LR 1e-6, there doesn't seem to be much change after 4k steps at a batch size of ~25.
With LR 1e-5, batch size 27, there was only mild change after 600 steps.
In the last run with LR 1e-4, batch size 27, after 1200 steps the model seemed to improve on the style, but it is still not consistent and not very well learned. Even 1800 steps don't look like enough.
LR 1e-4 for a full finetune is a huge LR that would have nuked 1.5 and SDXL. Does it make sense that even with such a huge LR, the model is learning so slowly? Is it possible that the LR is being ignored, normalized, or automatically adjusted somehow?
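As a sanity check (a generic PyTorch sketch, not anything SimpleTuner-specific; the optimizer and scheduler here are placeholders for whatever the trainer actually constructs), logging the learning rate the optimizer sees each step would rule out the LR being silently overridden:

```python
import torch

def log_effective_lr(optimizer, lr_scheduler=None, step=0):
    # Print the LR each parameter group is actually using this step.
    for i, group in enumerate(optimizer.param_groups):
        print(f"step {step}, group {i}: lr={group['lr']:.2e}")
    if lr_scheduler is not None:
        # torch LR schedulers report the most recently applied LR(s) here.
        print(f"step {step}, scheduler last_lr={lr_scheduler.get_last_lr()}")

# Dummy usage so the check is self-contained:
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ConstantLR(optimizer, factor=1.0)
log_effective_lr(optimizer, scheduler, step=0)
```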
multidatabackend.json:
[ { "id": "all_dataset", "type": "local", "instance_data_dir": "/workspace/input/dataset", "crop": false, "crop_style": "random", "crop_aspect": "preserve", "resolution": 0.5, "resolution_type": "area", "minimum_image_size": 0.1, "maximum_image_size": 1.0, "target_downsample_size":0.55, "prepend_instance_prompt": false, "instance_prompt": null, "only_instance_prompt": false, "caption_strategy": "textfile", "cache_dir_vae": "/workspace/cache_images/", "vae_cache_clear_each_epoch": false, "probability": 1.0, "repeats": 1, "text_embeds": "alt-embed-cache", "skip_file_discovery": "", "preserve_data_backend_cache": true }, { "id": "alt-embed-cache", "dataset_type": "text_embeds", "default": true, "type": "local", "cache_dir": "/workspace/cache_text_embeds/" } ]sdxl-env.sh: