I still ran into some memory issues with this, but after a couple of small adjustments I got it to run. Should it take 30 hours (same GPU as you)? I'm going to let it run, because if the quality is good at least it'll be done, but for future runs I'd love to get a quality LoRA without it taking 30 hours. It dropped to about 9 hours when I moved from bf16 to fp16 (now ~2.5s/it).
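As a rough sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes the total step count was the same for the bf16 and fp16 runs, which the comment does not state, and the function names are only illustrative.

```python
def total_steps(hours: float, sec_per_it: float) -> float:
    """Number of iterations that fit in `hours` at `sec_per_it` seconds each."""
    return hours * 3600 / sec_per_it


def sec_per_it(hours: float, steps: float) -> float:
    """Seconds per iteration implied by a run of `steps` that takes `hours`."""
    return hours * 3600 / steps


# fp16 run from the comment: about 9 hours at ~2.5 s/it.
steps = total_steps(9, 2.5)
print(steps)  # 12960.0

# Assumption: the 30-hour bf16 estimate covered the same number of steps.
print(round(sec_per_it(30, steps), 2))  # 8.33
```

Under that assumption, the bf16 run would have been going at roughly 8.3 s/it, a bit more than three times slower than the fp16 run.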
@joetech @tornado73 Today I turned my Gist into a ROCm Docker repo.
Unfortunately, at the current moment (9-24) this no longer works, as the -r requirements.txt (line 35) breaks due to not being a Python project.
git clone
cd kohya_ss
python -m venv venv
source venv/bin/activate
pip install torch torchvision --index-url
pip install --use-pep517 --upgrade -r requirements.txt
accelerate config
No distributed training
pip install tensorflow-rocm
sudo apt install python3-tk
source venv/bin/activate
python "$@"
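After the install steps above, it can help to confirm that the PyTorch build inside the venv is a ROCm build and can see the GPU. This is a minimal sketch, assuming it is run with the venv activated. `torch.version.hip` is set on ROCm builds and is `None` otherwise, and ROCm devices are reported through the `torch.cuda` API. The helper name `describe_torch_backend` is made up for this example.

```python
def describe_torch_backend() -> str:
    """Return a one-line summary of the installed PyTorch backend."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"

    # On ROCm builds torch.version.hip holds the HIP version; otherwise None.
    hip = getattr(torch.version, "hip", None)
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: no GPU visible (hip={hip})"

    name = torch.cuda.get_device_name(0)
    backend = f"ROCm/HIP {hip}" if hip else "CUDA"
    return f"torch {torch.__version__}: {backend}, device 0 = {name}"


print(describe_torch_backend())
```

If this reports no visible GPU, training would most likely fall back to CPU or fail, so it is worth checking before starting a long run.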
Settings for a LoRA of a person
Caption Extension - .txt
Optimizer - AdamW
LR Scheduler - constant
Network Rank (Dimension) 128
Network Alpha 128 or 64
Enable buckets - uncheck
Advanced Configuration
Use xformers - uncheck
Memory efficient attention - check
Max num workers for DataLoader - 1
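For reference, the settings above can be written down as a plain Python dict, which makes them easier to diff or reuse between runs. This is only a sketch: the key names are assumptions modelled on sd-scripts style option names, not something confirmed in this thread, and `to_cli_args` is a hypothetical helper, so check both against the version you have installed.

```python
# Settings from the list above, written as a plain dict.
# Key names are assumptions modelled on sd-scripts style options;
# verify them against your installed version before relying on them.
lora_settings = {
    "caption_extension": ".txt",
    "optimizer_type": "AdamW",
    "lr_scheduler": "constant",
    "network_dim": 128,
    "network_alpha": 128,          # the post also suggests 64
    "enable_bucket": False,        # buckets unchecked
    "xformers": False,             # xformers unchecked
    "mem_eff_attn": True,          # memory efficient attention checked
    "max_data_loader_n_workers": 1,
}


def to_cli_args(settings: dict) -> list[str]:
    """Flatten the dict into an argument list; False flags are omitted."""
    args = []
    for key, value in settings.items():
        if isinstance(value, bool):
            if value:
                args.append(f"--{key}")
        else:
            args.append(f"--{key}={value}")
    return args


print(" ".join(to_cli_args(lora_settings)))
```

Unchecked options are simply left out of the argument list, which mirrors how on/off switches usually behave on a command line.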
To start it next time
python3 "$@"
With other optimizers it either does not work or gives loss=nan
Good luck
p.s. installed by me
