Stage2 RuntimeError: The size of tensor a (22) must match the size of tensor b (23) at non-singleton dimension 3 #162
Comments
After a few hours of work I found a fix. If anyone runs into the same problem, just replace 'MusePose/src/models/unet_3d.py' with the following:
~/# accelerate launch train_stage_2.py --config configs/train/stage2.yaml
The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 1
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
10/12/2024 09:16:18 - INFO - main - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
{'scaling_factor', 'force_upcast'} was not found in config. Values will be initialized to default values.
{'addition_time_embed_dim', 'time_embedding_type', 'num_class_embeds', 'encoder_hid_dim', 'encoder_hid_dim_type', 'addition_embed_type_num_heads', 'addition_embed_type', 'dual_cross_attention', 'dropout', 'resnet_out_scale_factor', 'attention_type', 'reverse_transformer_layers_per_block', 'projection_class_embeddings_input_dim', 'mid_block_type', 'conv_out_kernel', 'resnet_skip_time_act', 'use_linear_projection', 'class_embeddings_concat', 'time_embedding_dim', 'timestep_post_act', 'resnet_time_scale_shift', 'only_cross_attention', 'transformer_layers_per_block', 'class_embed_type', 'conv_in_kernel', 'time_cond_proj_dim', 'time_embedding_act_fn', 'mid_block_only_cross_attention', 'num_attention_heads', 'upcast_attention', 'cross_attention_norm'} was not found in config. Values will be initialized to default values.
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
10/12/2024 09:16:25 - INFO - src.models.unet_3d - loaded temporal unet's pretrained weights from pretrained_weights/stable-diffusion-v1-5/unet ...
{'dual_cross_attention', 'use_linear_projection', 'num_class_embeds', 'upcast_attention', 'mode', 'task_type', 'resnet_time_scale_shift', 'only_cross_attention', 'class_embed_type'} was not found in config. Values will be initialized to default values.
10/12/2024 09:16:38 - INFO - src.models.unet_3d - Load motion module params from pretrained_weights/mm_sd_v15_v2.ckpt
10/12/2024 09:16:39 - INFO - src.models.unet_3d - Loaded 453.20928M-parameter motion module
10/12/2024 09:16:44 - INFO - main - Total trainable params 546
10/12/2024 09:16:45 - INFO - main - ***** Running training *****
10/12/2024 09:16:45 - INFO - main - Num examples = 7755
10/12/2024 09:16:45 - INFO - main - Num Epochs = 2
10/12/2024 09:16:45 - INFO - main - Instantaneous batch size per device = 1
10/12/2024 09:16:45 - INFO - main - Total train batch size (w. parallel, distributed & accumulation) = 1
10/12/2024 09:16:45 - INFO - main - Gradient Accumulation steps = 1
10/12/2024 09:16:45 - INFO - main - Total optimization steps = 10000
Steps: 0%| | 0/10000 [00:00<?, ?it/s]10/12/2024 09:16:50 - INFO - src.models.unet_3d - Forward upsample size to force interpolation output size.
Traceback (most recent call last):
File "/root/MusePose/train_stage_2.py", line 773, in <module>
main(config)
File "/root/MusePose/train_stage_2.py", line 602, in main
model_pred = net(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/utils/operations.py", line 825, in forward
return model_forward(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/utils/operations.py", line 813, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/root/miniconda3/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/root/MusePose/train_stage_2.py", line 96, in forward
model_pred = self.denoising_unet(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root//src/models/unet_3d.py", line 505, in forward
sample = sample + pose_cond_fea
RuntimeError: The size of tensor a (22) must match the size of tensor b (23) at non-singleton dimension 3
Steps: 0%| | 0/10000 [00:06<?, ?it/s]
Traceback (most recent call last):
File "/root/miniconda3/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1075, in launch_command
simple_launcher(args)
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 681, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/bin/python', 'train_stage_2.py', '--config', 'configs/train/stage2.yaml']' returned non-zero exit status 1.
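One possible reading of the 22 vs 23 mismatch in `sample + pose_cond_fea`, offered as a sketch and not a confirmed diagnosis: both tensors are produced by downsampling the input by a factor of 8, and the two paths may round differently when a spatial side is not a multiple of 8. The arithmetic below assumes that the latent path rounds down at each of three stride-2 stages and that the pose-feature path rounds up (as a 3x3 convolution with stride 2 and padding 1 would). The helper names and the side length 180 are hypothetical and do not come from the issue or from MusePose.

```python
# Hypothetical helpers for illustration only; none of these exist in MusePose.

def floor_halvings(side: int, steps: int = 3) -> int:
    """Side length after `steps` stride-2 stages that round down.

    Assumption: this models the latent (`sample`) path.
    """
    for _ in range(steps):
        side = side // 2
    return side


def ceil_halvings(side: int, steps: int = 3) -> int:
    """Side length after `steps` stride-2 stages that round up.

    Assumption: this models the pose-feature (`pose_cond_fea`) path,
    e.g. 3x3 convolutions with stride 2 and padding 1.
    """
    for _ in range(steps):
        side = (side + 1) // 2
    return side


def round_down_to_multiple(side: int, multiple: int = 8) -> int:
    """Largest multiple of `multiple` that does not exceed `side`."""
    return (side // multiple) * multiple


side = 180  # hypothetical input side length, not taken from the issue
print(floor_halvings(side), ceil_halvings(side))  # 22 23

safe = round_down_to_multiple(side)
print(safe, floor_halvings(safe), ceil_halvings(safe))  # 176 22 22
```

If this assumption holds for your data, making the training height and width multiples of 8 in the dataset or config would make the two paths agree. Whether that matches the fix referred to in the comment above is not known from this thread.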