Training error (Updated) #468
ilikespace1 asked this question in Q&A (unanswered)
-
I had the "ValueError: fp16 mixed precision requires a GPU" issue before too, and after manually updating "train_util.py", I am getting the same error as you are now.
-
Hello, have you solved it yet?
-
Should I just conclude that kohya_ss doesn't work on AMD GPUs without ROCm (so, for Windows users)?
-
Update: I updated some more files and I'm back to getting the "ValueError: fp16 mixed precision requires a GPU" error. I also tried training with no mixed precision and "float" as my save precision, and then I get this error instead (full output below):
File "C:\AI\kohya_ss\venv\lib\site-packages\xformers\ops.py", line 726, in op
raise NotImplementedError(f"No operator found for this attention: {self}")
NotImplementedError: No operator found for this attention: AttentionOpDispatch(dtype=torch.float32, device=device(type='cpu'), k=40, has_dropout=False, attn_bias_type=<class 'NoneType'>, kv_len=7680, q_len=7680, kv=40, batch_size=2, num_heads=8)
steps: 0%| | 0/1600 [00:06<?, ?it/s]
Is it having trouble detecting and using my hardware? I can run Stable Diffusion just fine.
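One check I can run to see whether PyTorch inside the kohya_ss venv detects the GPU at all (this is just a generic diagnostic sketch, not something from the log; the venv path is the one shown in the traceback):

# Run with the kohya_ss venv's python (C:\AI\kohya_ss\venv\Scripts\python.exe).
# If is_available() prints False, accelerate falls back to the CPU, which would
# explain both the "fp16 mixed precision requires a GPU" error and the CPU-only
# xformers dispatch in the traceback below.
import torch

print(torch.__version__)          # a "+cpu" suffix means the installed wheel has no CUDA support
print(torch.cuda.is_available())  # must be True for fp16 / xformers training on the GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))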
Things I've tried:
So to reiterate, I currently get the "ValueError: fp16 mixed precision requires a GPU" error when I use fp16 for both mixed precision and save precision. This is the full output I get when I use no mixed precision and "float" as my save precision:
Folder 100_AW: 32 images found
Folder 100_AW: 3200 steps
max_train_steps = 1600
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --enable_bucket --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="C:/AI/Process lora/AW/image" --resolution=768,768 --output_dir="C:/AI/Process lora/AW/model" --logging_dir="C:/AI/Process lora/AW/log" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=128 --output_name="AW Model" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="1600" --save_every_n_epochs="1" --mixed_precision="no" --save_precision="float" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory C:\AI\Process lora\AW\image\100_AW contains 32 image files
3200 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 2
resolution: (768, 768)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: True
[Subset 0 of Dataset 0]
image_dir: "C:\AI\Process lora\AW\image\100_AW"
image_count: 32
num_repeats: 100
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: AW
caption_extension: .txt
[Dataset 0]
loading image sizes.
100%|█████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 197.35it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (384, 384), count: 100
bucket 1: resolution (576, 832), count: 100
bucket 2: resolution (576, 896), count: 100
bucket 3: resolution (576, 1024), count: 600
bucket 4: resolution (640, 832), count: 400
bucket 5: resolution (640, 896), count: 100
bucket 6: resolution (704, 704), count: 500
bucket 7: resolution (704, 768), count: 100
bucket 8: resolution (768, 768), count: 100
bucket 9: resolution (832, 640), count: 100
bucket 10: resolution (896, 512), count: 100
bucket 11: resolution (960, 512), count: 800
bucket 12: resolution (960, 576), count: 100
mean ar error (without repeats): 0.03709647043484138
prepare accelerator
Using accelerator 0.15.0 or above.
load Diffusers pretrained models
safety_checker\model.safetensors not found
Fetching 19 files: 100%|███████████████████████████████████████████████████████████████████████| 19/19 [00:00<?, ?it/s]
C:\AI\kohya_ss\venv\lib\site-packages\transformers\models\clip\feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254 .
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100%|██████████████████████████████████████████████████████████████████████████████████| 32/32 [07:03<00:00, 13.24s/it]
import network module: networks.lora
create LoRA network. base dim (rank): 128, alpha: 128.0
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
use AdamW optimizer | {}
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 3200
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 1600
num epochs / epoch数: 1
batch size per device / バッチサイズ: 2
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 1600
steps: 0%| | 0/1600 [00:00<?, ?it/s]epoch 1/1
Traceback (most recent call last):
File "C:\AI\kohya_ss\train_network.py", line 699, in
train(args)
File "C:\AI\kohya_ss\train_network.py", line 538, in train
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "C:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\AI\kohya_ss\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 381, in forward
sample, res_samples = downsample_block(
File "C:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\AI\kohya_ss\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 612, in forward
hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
File "C:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\AI\kohya_ss\venv\lib\site-packages\diffusers\models\attention.py", line 216, in forward
hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
File "C:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\AI\kohya_ss\venv\lib\site-packages\diffusers\models\attention.py", line 484, in forward
hidden_states = self.attn1(norm_hidden_states) + hidden_states
File "C:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\AI\kohya_ss\library\train_util.py", line 1790, in forward_xformers
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None) # 最適なのを選んでくれる
File "C:\AI\kohya_ss\venv\lib\site-packages\xformers\ops.py", line 858, in memory_efficient_attention
).op
File "C:\AI\kohya_ss\venv\lib\site-packages\xformers\ops.py", line 726, in op
raise NotImplementedError(f"No operator found for this attention: {self}")
NotImplementedError: No operator found for this attention: AttentionOpDispatch(dtype=torch.float32, device=device(type='cpu'), k=40, has_dropout=False, attn_bias_type=<class 'NoneType'>, kv_len=7680, q_len=7680, kv=40, batch_size=2, num_heads=8)
steps: 0%| | 0/1600 [00:06<?, ?it/s]
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "C:\AI\kohya_ss\venv\Scripts\accelerate.exe_main.py", line 7, in
File "C:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\AI\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=C:/AI/Process lora/AW/image', '--resolution=768,768', '--output_dir=C:/AI/Process lora/AW/model', '--logging_dir=C:/AI/Process lora/AW/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=AW Model', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=1600', '--save_every_n_epochs=1', '--mixed_precision=no', '--save_precision=float', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
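For what it's worth, the NotImplementedError above is what xformers raises whenever memory_efficient_attention is asked to run on CPU tensors; it has no CPU operator, so a dispatch with dtype=torch.float32 and device=cpu can never succeed. A minimal sketch that reproduces the same failure (the shapes here are made up for illustration; only the device and dtype matter):

# Assumed shapes, just to show the dispatch failure: with tensors on the CPU,
# xformers has no memory-efficient attention kernel to pick, so this raises the
# same "No operator found for this attention" error as train_util.py's
# forward_xformers does in the traceback above.
import torch
import xformers.ops

q = torch.randn(16, 7680, 40, device="cpu", dtype=torch.float32)
k = torch.randn(16, 7680, 40, device="cpu", dtype=torch.float32)
v = torch.randn(16, 7680, 40, device="cpu", dtype=torch.float32)
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)  # raises NotImplementedError

So the real question is why the training process ends up on the CPU at all, which circles back to whether torch.cuda.is_available() returns True inside that venv.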