You verified that this is a bug and not a feature request or question by asking in the Discord?
Yes
Describe the bug
I've been trying the Automagic optimizer for about a week, and I get this error (KeyError: 'lr_mask') whenever I restart a Flux LoRA training after a clean stop (Ctrl-C).
#############################################
# Running job: MaisonClose_L02_AutoM_GAS1
#############################################
Running 1 process
Loading Flux model
Loading transformer
Quantizing transformer
Loading vae
Loading t5
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3470.67it/s]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2.30it/s]
Quantizing T5
Loading clip
making pipe
preparing
create LoRA network. base dim (rank): 24, alpha: 24
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder: 0 modules.
create LoRA for U-Net: 494 modules.
enable LoRA for U-Net
#### IMPORTANT RESUMING FROM output/MaisonClose_L02_AutoM_GAS1/MaisonClose_L02_AutoM_GAS1_000000500.safetensors ####
Loading from output/MaisonClose_L02_AutoM_GAS1/MaisonClose_L02_AutoM_GAS1_000000500.safetensors
Missing keys: []
Found step 500 in metadata, starting from there
Total training paramiters: 128,876,544
Loading optimizer state from output/MaisonClose_L02_AutoM_GAS1/optimizer.pt
Updating optimizer LR from params
Dataset: MaisonCloseSet02
- Preprocessing image dimensions
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 835/835 [00:43<00:00, 19.01it/s]
- Found 835 images
Bucket sizes for MaisonCloseSet02:
384x576: 835 files
1 buckets made
Dataset: MaisonCloseSet02
- Preprocessing image dimensions
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 835/835 [00:00<00:00, 103058.70it/s]
- Found 835 images
Bucket sizes for MaisonCloseSet02:
576x896: 835 files
1 buckets made
Dataset: MaisonCloseSet02
- Preprocessing image dimensions
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 835/835 [00:00<00:00, 114531.01it/s]
- Found 835 images
Bucket sizes for MaisonCloseSet02:
832x1216: 835 files
1 buckets made
MaisonClose_L02_AutoM_GAS1: 2%|█▋ | 500/30000 [00:00<?, ?it/s]Error running job: 'lr_mask'
========================================
Result:
- 0 completed jobs
- 1 failure
========================================
Traceback (most recent call last):
File "/workspace/apps/ai-toolkit0/run.py", line 90, in <module>
main()
File "/workspace/apps/ai-toolkit0/run.py", line 86, in main
raise e
File "/workspace/apps/ai-toolkit0/run.py", line 78, in main
job.run()
File "/mnt/d/TODAI/apps/ai-toolkit0/jobs/ExtensionJob.py", line 22, in run
process.run()
File "/mnt/d/TODAI/apps/ai-toolkit0/jobs/process/BaseSDTrainProcess.py", line 1826, in run
loss_dict = self.hook_train_loop(batch_list)
File "/mnt/d/TODAI/apps/ai-toolkit0/extensions_built_in/sd_trainer/SDTrainer.py", line 1647, in hook_train_loop
self.scaler.step(self.optimizer)
File "/mnt/d/TODAI/apps/ai-toolkit0/venv/lib/python3.10/site-packages/torch/amp/grad_scaler.py", line 457, in step
retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
File "/mnt/d/TODAI/apps/ai-toolkit0/venv/lib/python3.10/site-packages/torch/amp/grad_scaler.py", line 352, in _maybe_opt_step
retval = optimizer.step(*args, **kwargs)
File "/mnt/d/TODAI/apps/ai-toolkit0/venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
return func.__get__(opt, opt.__class__)(*args, **kwargs)
File "/mnt/d/TODAI/apps/ai-toolkit0/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 487, in wrapper
out = func(*args, **kwargs)
File "/mnt/d/TODAI/apps/ai-toolkit0/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/mnt/d/TODAI/apps/ai-toolkit0/toolkit/optimizers/automagic.py", line 249, in step
lr_mask = state['lr_mask'].to(torch.float32)
KeyError: 'lr_mask'
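From the traceback, automagic.py's step() reads state['lr_mask'] for every trainable parameter, but the state restored from optimizer.pt apparently has no 'lr_mask' entry for at least one of them. Below is a minimal sketch of that failure mode; ToyAutomagic and its internals are hypothetical stand-ins, and the real Automagic code may initialize and store its state differently:

```python
import torch

class ToyAutomagic(torch.optim.Optimizer):
    # Toy optimizer that, like Automagic appears to, keeps a per-parameter
    # 'lr_mask' tensor in self.state and reads it on every step.
    def __init__(self, params, lr=1e-4):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    # 'lr_mask' is created lazily on the very first step only.
                    state['lr_mask'] = torch.full_like(p, group['lr'])
                # Raises KeyError if the restored state is non-empty (so the
                # lazy init above is skipped) but carries no 'lr_mask' entry.
                lr_mask = state['lr_mask'].to(torch.float32)
                p.sub_(p.grad * lr_mask)

p = torch.nn.Parameter(torch.ones(4))
opt = ToyAutomagic([p])

# Simulate resuming from an optimizer.pt whose per-parameter state exists
# (e.g. a step counter) but is missing the 'lr_mask' entry:
sd = opt.state_dict()
sd['state'][0] = {'step': 500}
opt.load_state_dict(sd)

p.grad = torch.ones_like(p)
opt.step()  # -> KeyError: 'lr_mask', matching the traceback above
```

If that is indeed what's happening here, a defensive fix on the optimizer side would be to fall back to re-initializing lr_mask inside step() when the key is absent, rather than assuming load_state_dict fully restored it.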