Replies: 6 comments 4 replies
- you need --vae_cache_preprocess
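A minimal sketch of where that flag could go, assuming the env-file style SimpleTuner config where extra CLI options are appended through TRAINER_EXTRA_ARGS (the variable name and config layout are assumptions; adjust to your version):

```bash
# config.env (sketch): TRAINER_EXTRA_ARGS and the env-file layout are
# assumptions here -- check your SimpleTuner version's config format.
# --vae_cache_preprocess pre-computes and caches the VAE latents before
# training starts, so each step reads cached latents instead of running
# every image through the VAE encoder again.
export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --vae_cache_preprocess"
```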
- I added
- can you try with just one GPU?
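If it helps, a minimal way to pin the test to one GPU, assuming the usual accelerate-based launch (train.py here is just a stand-in for the real entry point):

```bash
# Pin the run to a single visible GPU and launch one worker process.
# train.py is a placeholder for the actual SimpleTuner entry point
# (or whatever train.sh wraps in your setup); other flags are omitted.
CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes=1 train.py
```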
- for what it's worth, 40 seconds is a lot more than I'd expect, especially for a LoRA on a 2B model. It should be more like 3-5 seconds per step at worst, and 10 seconds per step when training from an S3 backend. Edit: make sure you're not using DoRA; it's slower.
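For scale: 4 GPUs with a batch size of 10 is 40 images per optimizer step, so 40-50 s/step works out to roughly one image per second, while 3-5 s/step would be closer to 8-13 images per second.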
- Tested with 1 GPU (A6000, 48GB): with a batch size of 10 it still takes ~40 seconds per iteration.
- I finally found out the problem: SimpleTuner has defaulted to use
Hi,
Thanks for this nice repo!
I have been trying to train a LoRA on SD3 using the multi-GPU setting. I am on 4 A6000 GPUs (48GB each), and my dataset is 1000 1024x1024 images. I set the batch size to 10 with no gradient accumulation. Each iteration takes 40-50 seconds to complete.
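For concreteness, the launch corresponds roughly to a standard 4-process accelerate invocation like the sketch below (not my exact command; flag names are the usual SimpleTuner ones and the entry point is abbreviated):

```bash
# Sketch of the 4-GPU launch described above; train.py stands in for the
# SimpleTuner entry point, and model/dataset arguments are left out.
accelerate launch --multi_gpu --num_processes=4 train.py \
  --train_batch_size=10 \
  --gradient_accumulation_steps=1 \
  --resolution=1024
```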
This training speed seems dramatically slower compared with many of the other logs I find in this repo's issues. Is this normal, or do I have something set up wrong?
Best regards and thanks in advance.