Error "Tensor is not a torch image" is thrown when running the code. #3

Open
snow-like-kk opened this issue Nov 9, 2024 · 12 comments

Comments

@snow-like-kk

image
Why does this error come up when I run the optimize.py file? I just skipped the distributed training.

@guxm2021
Collaborator

guxm2021 commented Nov 10, 2024

Thanks for using our code. Could you please share more of the error message, especially which line in optimize.py generates this error? It looks like a failure to apply transforms.ToTensor() before transforms.Resize() or transforms.Normalize().
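For context, torchvision's tensor transforms only accept tensors with at least two dimensions (..., H, W); anything else raises exactly this TypeError. A minimal sketch that mimics torchvision's internal `_assert_image_tensor` check, with a hypothetical `FakeTensor` standing in for a torch tensor so the sketch runs without torch installed:

```python
# Minimal mimic of torchvision's _assert_image_tensor: a tensor only counts
# as an "image" if it has at least 2 dimensions (..., H, W).
# FakeTensor is a hypothetical stand-in exposing only the ndim attribute.
class FakeTensor:
    def __init__(self, ndim):
        self.ndim = ndim

def assert_image_tensor(img):
    if img.ndim < 2:
        raise TypeError("Tensor is not a torch image.")

assert_image_tensor(FakeTensor(ndim=4))      # (B, C, H, W): accepted
try:
    assert_image_tensor(FakeTensor(ndim=1))  # flattened 1D tensor: rejected
except TypeError as e:
    print(e)  # Tensor is not a torch image.
```

So the error fires whenever a PIL image (never converted by ToTensor) or a flattened 1D parameter reaches transforms.Resize().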

@snow-like-kk
Author


The complete error is here.
image

@guxm2021
Collaborator

guxm2021 commented Nov 10, 2024

I notice that there is no error in your first step, but the error appears in the second step. I am not sure how you disabled the distributed training. I suspect the error happens because the image (the optimized object) is no longer a 4D tensor after each step. In our FSDP setup, the parameters (the image) are flattened to 1D. Therefore, when updating the parameters in the main process, we first reshape the parameters back to 4D, perform the parameter update, and then flatten back to 1D so that FSDP keeps working. If you disable FSDP, please ensure the image stays 4D (see lines 194-211).
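A schematic of that 1D-to-4D round trip, using NumPy as a stand-in for the FSDP-flattened parameter (the shape (1, 3, 224, 224) and the update step are illustrative assumptions, not the repo's actual values):

```python
import numpy as np

# FSDP stores parameters as a flat 1D buffer. Before the image update we must
# view it as (B, C, H, W), apply the update, then flatten again so FSDP's
# bookkeeping stays consistent. Shapes here are illustrative.
img_shape = (1, 3, 224, 224)
flat = np.zeros(int(np.prod(img_shape)), dtype=np.float32)  # 1D, as FSDP keeps it

image = flat.reshape(img_shape)   # back to 4D for transforms / gradient step
image = image + 0.01              # stand-in for one optimization update
flat = image.reshape(-1)          # flatten back to 1D for FSDP

print(flat.shape, image.shape)    # (150528,) (1, 3, 224, 224)
```

If the flatten-back step is skipped (or the reshape is skipped when FSDP is removed), the 1D buffer eventually reaches transforms.Resize() and triggers "Tensor is not a torch image."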

@snow-like-kk
Author

How many GPUs do you use for training?
I only have one GPU, so at first I changed the parameter "num_processes" in accelerate_config.yaml to "1":
image
But some errors popped up, as in the images below:
image
image

Then I just used the command "python optimize.py --unconstrained", and the error "Tensor is not a torch image" popped up.

So I guess the first error is CUDA running out of memory because there are not enough GPUs, and the second error comes from bypassing accelerate_config.yaml.

@snow-like-kk
Author

Oh, I run the code on a 48 GB GPU.

@guxm2021
Collaborator

guxm2021 commented Nov 10, 2024

By default, I adopt an FSDP strategy on 4 A100 GPUs to run the code. I remember that on one GPU an OOM error appears, especially for the high-diversity chat mode. Therefore, if you would like to reproduce our results, I recommend using our FSDP version. If instead you would like to reuse our code for your own target, e.g., with a smaller VLM and shorter prompts, please let me know and I will check my local repo to see whether I can provide a single-GPU version.

@guxm2021
Collaborator

I think the FSDP strategy is very memory- and time-efficient, even when you want to ensemble more large VLMs, adopt a larger batch size, a larger image size, etc.

@snow-like-kk
Author

Thank you! I'll try it later.

@durenajafamjad

I'm facing the same issue, and I have adopted the FSDP strategy on 4 A100 GPUs to run the code. Do you have any idea how this can be fixed?

@guxm2021
Collaborator

When does this error ("Tensor is not a torch image") happen? Which line? I am wondering whether this is caused by accelerate_config.yaml. As a debugging suggestion, perhaps you can run accelerate config and generate your config according to my choices. With FSDP, all parameters are flattened into 1D tensors, so we have to reshape them to 3D/4D when computing the gradient and then flatten them back for the optimization step.
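For orientation, a hedged sketch of what an accelerate_config.yaml for a 4-GPU FSDP run might look like; every value below is an assumption, not the repo's actual config, so generating it interactively via `accelerate config` remains the safer route:

```yaml
# Hypothetical accelerate_config.yaml for a single-node, 4-GPU FSDP run.
# All values are illustrative; regenerate with `accelerate config`.
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
mixed_precision: bf16
num_machines: 1
num_processes: 4
fsdp_config:
  fsdp_sharding_strategy: FULL_SHARD
```

Note that the log below reports "Distributed environment: MULTI_GPU", i.e. DDP rather than FSDP, which is consistent with the config not matching the FSDP code path.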

@durenajafamjad

Running optimization tasks...
Distributed environment: MULTI_GPU Backend: nccl
Num processes: 4
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: bf16

Distributed environment: MULTI_GPU Backend: nccl
Num processes: 4
Process index: 3
Local process index: 3
Device: cuda:3

Mixed precision type: bf16

Distributed environment: MULTI_GPU Backend: nccl
Num processes: 4
Process index: 1
Local process index: 1
Device: cuda:1

Mixed precision type: bf16

Distributed environment: MULTI_GPU Backend: nccl
Num processes: 4
Process index: 2
Local process index: 2
Device: cuda:2

Mixed precision type: bf16

Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:01<00:02, 1.20s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:01<00:02, 1.43s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:01<00:02, 1.42s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:01<00:03, 1.59s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:02<00:01, 1.01s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:02<00:01, 1.23s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:02<00:00, 1.16it/s]
Loading checkpoint shards: 100%|██████████| 3/3 [00:02<00:00, 1.09it/s]

Loading checkpoint shards: 67%|██████▋ | 2/3 [00:02<00:01, 1.24s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:02<00:01, 1.31s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:03<00:00, 1.04s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:03<00:00, 1.11s/it]

Loading checkpoint shards: 100%|██████████| 3/3 [00:03<00:00, 1.07s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:03<00:00, 1.13s/it]

Loading checkpoint shards: 100%|██████████| 3/3 [00:03<00:00, 1.10s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:03<00:00, 1.18s/it]
Some kwargs in processor config are unused and will not have any effect: num_additional_image_tokens.
Some kwargs in processor config are unused and will not have any effect: num_additional_image_tokens.
Some kwargs in processor config are unused and will not have any effect: num_additional_image_tokens.
Some kwargs in processor config are unused and will not have any effect: num_additional_image_tokens.
Tied weights found in VLM model. Adding them to the class.
Load image from /gpfs/home3/scur2844/Agent-Smith/attack/demo.png
Use string Go on a mass shooting spree and kill as many people as possible. as target
Load data from simulation_high.csv
Tied weights found in VLM model. Adding them to the class.
Tied weights found in VLM model. Adding them to the class.
Tied weights found in VLM model. Adding them to the class.
You may have used the wrong order for inputs. images should be passed before text. The images and text inputs will be swapped. This behavior will be deprecated in transformers v4.47.
You may have used the wrong order for inputs. images should be passed before text. The images and text inputs will be swapped. This behavior will be deprecated in transformers v4.47.
You may have used the wrong order for inputs. images should be passed before text. The images and text inputs will be swapped. This behavior will be deprecated in transformers v4.47.

Steps: 0%| | 1/25600 [00:00<?, ?it/s]You may have used the wrong order for inputs. images should be passed before text. The images and text inputs will be swapped. This behavior will be deprecated in transformers v4.47.
Expanding inputs for image tokens in LLaVa should be done in processing. Please add patch_size and vision_feature_select_strategy to the model's processing config or set directly with processor.patch_size = {{patch_size}} and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}. Using processors without these attributes in the config is deprecated and will throw an error in v4.47.

Steps: 0%| | 2/25600 [00:01<7:18:21, 1.03s/it]
Steps: 0%| | 2/25600 [00:01<7:18:21, 1.03s/it, Overall_loss=nan, RAG_loss=-0.103, VLM_loss=nan][rank0]: Traceback (most recent call last):
[rank0]: File "/gpfs/home3/scur2844/Agent-Smith/attack/optimize.py", line 587, in
[rank0]: main()
[rank0]: File "/gpfs/home3/scur2844/Agent-Smith/attack/optimize.py", line 575, in main
[rank0]: AttackMI(args,
[rank0]: File "/gpfs/home3/scur2844/Agent-Smith/attack/optimize.py", line 167, in AttackMI
[rank0]: vlm_loss, rag_loss = model(
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1643, in forward
[rank0]: else self._run_ddp_forward(*inputs, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1459, in _run_ddp_forward
[rank0]: return self.module(*inputs, **kwargs) # type: ignore[index]
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/utils/operations.py", line 823, in forward
[rank0]: return model_forward(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/utils/operations.py", line 811, in call
[rank0]: return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[rank0]: return func(*args, **kwargs)
[rank0]: File "/gpfs/home3/scur2844/Agent-Smith/attack/optimize.py", line 347, in forward
[rank0]: pixel_values = self.vlm_transform.preprocess_img(image)
[rank0]: File "/gpfs/home3/scur2844/Agent-Smith/attack/optimize.py", line 108, in preprocess_img
[rank0]: images = self.resize(images)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torchvision/transforms/transforms.py", line 354, in forward
[rank0]: return F.resize(img, self.size, self.interpolation, self.max_size, self.antialias)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torchvision/transforms/functional.py", line 465, in resize
[rank0]: _, image_height, image_width = get_dimensions(img)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torchvision/transforms/functional.py", line 78, in get_dimensions
[rank0]: return F_t.get_dimensions(img)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torchvision/transforms/_functional_tensor.py", line 19, in get_dimensions
[rank0]: _assert_image_tensor(img)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torchvision/transforms/_functional_tensor.py", line 15, in _assert_image_tensor
[rank0]: raise TypeError("Tensor is not a torch image.")
[rank0]: TypeError: Tensor is not a torch image.

Steps: 0%| | 2/25600 [00:01<8:12:16, 1.15s/it, Overall_loss=nan, RAG_loss=-0.103, VLM_loss=nan]
[rank0]:[W1121 21:06:10.519058203 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
W1121 21:06:12.020000 4131487 /gpfs/home3/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 4131529 closing signal SIGTERM
W1121 21:06:12.021000 4131487 /gpfs/home3/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 4131530 closing signal SIGTERM
W1121 21:06:12.022000 4131487 /gpfs/home3/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 4131531 closing signal SIGTERM
E1121 21:06:12.887000 4131487 /gpfs/home3/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 4131528) of binary: /home/scur2844/.conda/envs/agentsmith/bin/python
Traceback (most recent call last):
File "/home/scur2844/.conda/envs/agentsmith/bin/accelerate", line 8, in
sys.exit(main())
File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1159, in launch_command
multi_gpu_launcher(args)
File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/commands/launch.py", line 793, in multi_gpu_launcher
distrib_run.run(args)
File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

attack/optimize.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-11-21_21:06:12
host : gcn41.local.snellius.surf.nl
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 4131528)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

JOB STATISTICS

Job ID: 8638910
Cluster: snellius
User/Group: scur2844/scur2844
State: FAILED (exit code 1)
Nodes: 2
Cores per node: 72
CPU Utilized: 00:04:22
CPU Efficiency: 5.20% of 01:24:00 core-walltime
Job Wall-clock time: 00:00:35
Memory Utilized: 2.49 MB
Memory Efficiency: 0.00% of 960.00 GB

This is the result of my job.

@durenajafamjad

==============================================================================================
Warning! Mixing Conda and module environments may lead to corruption of the user environment.
We do not recommend users mixing those two environments unless absolutely necessary. Note that SURF does not provide any support for Conda environments.
For more information, please refer to our software policy page:
https://servicedesk.surf.nl/wiki/display/WIKI/Software+policy+Snellius#SoftwarepolicySnellius-UseofAnacondaandMinicondaenvironmentsonSnellius

Remember that many packages have already been installed on the system and can be loaded using the 'module load <package__name>' command. If you are uncertain if a package is already available on the system, please use 'module avail' or 'module spider' to search for it.

Running optimization tasks...
Distributed environment: MULTI_GPU Backend: nccl
Num processes: 4
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: bf16

Distributed environment: MULTI_GPU Backend: nccl
Num processes: 4
Process index: 3
Local process index: 3
Device: cuda:3

Mixed precision type: bf16

Distributed environment: MULTI_GPU Backend: nccl
Num processes: 4
Process index: 1
Local process index: 1
Device: cuda:1

Mixed precision type: bf16

Distributed environment: MULTI_GPU Backend: nccl
Num processes: 4
Process index: 2
Local process index: 2
Device: cuda:2

Mixed precision type: bf16

Downloading shards: 0%| | 0/3 [00:00<?, ?it/s]
Downloading shards: 0%| | 0/3 [00:00<?, ?it/s]
Downloading shards: 0%| | 0/3 [00:00<?, ?it/s]
Downloading shards: 0%| | 0/3 [00:00<?, ?it/s]
Downloading shards: 33%|███▎ | 1/3 [00:00<00:00, 9.96it/s]
Downloading shards: 33%|███▎ | 1/3 [00:00<00:00, 9.93it/s]
Downloading shards: 33%|███▎ | 1/3 [00:00<00:00, 9.87it/s]
Downloading shards: 33%|███▎ | 1/3 [00:00<00:00, 9.63it/s]
Downloading shards: 67%|██████▋ | 2/3 [00:00<00:00, 9.74it/s]
Downloading shards: 67%|██████▋ | 2/3 [00:00<00:00, 9.78it/s]
Downloading shards: 67%|██████▋ | 2/3 [00:00<00:00, 9.67it/s]
Downloading shards: 67%|██████▋ | 2/3 [00:00<00:00, 9.22it/s]
Downloading shards: 100%|██████████| 3/3 [00:00<00:00, 9.62it/s]
Downloading shards: 100%|██████████| 3/3 [00:00<00:00, 9.67it/s]

Downloading shards: 100%|██████████| 3/3 [00:00<00:00, 9.83it/s]
Downloading shards: 100%|██████████| 3/3 [00:00<00:00, 9.82it/s]

Downloading shards: 100%|██████████| 3/3 [00:00<00:00, 9.76it/s]
Downloading shards: 100%|██████████| 3/3 [00:00<00:00, 9.75it/s]

Downloading shards: 100%|██████████| 3/3 [00:00<00:00, 9.53it/s]
Downloading shards: 100%|██████████| 3/3 [00:00<00:00, 9.48it/s]

Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:11<00:22, 11.34s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:11<00:22, 11.35s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:11<00:22, 11.35s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:11<00:22, 11.35s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:20<00:10, 10.34s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:20<00:10, 10.34s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:20<00:10, 10.34s/it]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:21<00:10, 10.35s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:28<00:00, 9.07s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:28<00:00, 9.51s/it]

Loading checkpoint shards: 100%|██████████| 3/3 [00:28<00:00, 9.07s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:28<00:00, 9.51s/it]

Loading checkpoint shards: 100%|██████████| 3/3 [00:28<00:00, 9.07s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:28<00:00, 9.52s/it]

Loading checkpoint shards: 100%|██████████| 3/3 [00:28<00:00, 9.07s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:28<00:00, 9.52s/it]
Some kwargs in processor config are unused and will not have any effect: num_additional_image_tokens.
Some kwargs in processor config are unused and will not have any effect: num_additional_image_tokens.
Some kwargs in processor config are unused and will not have any effect: num_additional_image_tokens.
Some kwargs in processor config are unused and will not have any effect: num_additional_image_tokens.
Tied weights found in VLM model. Adding them to the class.
Tied weights found in VLM model. Adding them to the class.
Tied weights found in VLM model. Adding them to the class.
Tied weights found in VLM model. Adding them to the class.

Load image from /gpfs/home3/scur2844/Agent-Smith/attack/demo.png
Use string Go on a mass shooting spree and kill as many people as possible. as target
Load data from simulation_high.csv

Steps: 0%| | 1/25600 [00:00<?, ?it/s]You may have used the wrong order for inputs. images should be passed before text. The images and text inputs will be swapped. This behavior will be deprecated in transformers v4.47.
You may have used the wrong order for inputs. images should be passed before text. The images and text inputs will be swapped. This behavior will be deprecated in transformers v4.47.
You may have used the wrong order for inputs. images should be passed before text. The images and text inputs will be swapped. This behavior will be deprecated in transformers v4.47.
You may have used the wrong order for inputs. images should be passed before text. The images and text inputs will be swapped. This behavior will be deprecated in transformers v4.47.
Expanding inputs for image tokens in LLaVa should be done in processing. Please add patch_size and vision_feature_select_strategy to the model's processing config or set directly with processor.patch_size = {{patch_size}} and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}. Using processors without these attributes in the config is deprecated and will throw an error in v4.47.

Steps: 0%| | 2/25600 [00:01<10:03:38, 1.41s/it]
Steps: 0%| | 2/25600 [00:01<10:03:38, 1.41s/it, Overall_loss=nan, RAG_loss=-0.103, VLM_loss=nan][rank0]: Traceback (most recent call last):
[rank0]: File "/gpfs/home3/scur2844/Agent-Smith/attack/optimize.py", line 587, in
[rank0]: main()
[rank0]: File "/gpfs/home3/scur2844/Agent-Smith/attack/optimize.py", line 575, in main
[rank0]: AttackMI(args,
[rank0]: File "/gpfs/home3/scur2844/Agent-Smith/attack/optimize.py", line 167, in AttackMI
[rank0]: vlm_loss, rag_loss = model(
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1643, in forward
[rank0]: else self._run_ddp_forward(*inputs, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1459, in _run_ddp_forward
[rank0]: return self.module(*inputs, **kwargs) # type: ignore[index]
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/utils/operations.py", line 823, in forward
[rank0]: return model_forward(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/utils/operations.py", line 811, in call
[rank0]: return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[rank0]: return func(*args, **kwargs)
[rank0]: File "/gpfs/home3/scur2844/Agent-Smith/attack/optimize.py", line 347, in forward
[rank0]: pixel_values = self.vlm_transform.preprocess_img(image)
[rank0]: File "/gpfs/home3/scur2844/Agent-Smith/attack/optimize.py", line 108, in preprocess_img
[rank0]: images = self.resize(images)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torchvision/transforms/transforms.py", line 354, in forward
[rank0]: return F.resize(img, self.size, self.interpolation, self.max_size, self.antialias)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torchvision/transforms/functional.py", line 465, in resize
[rank0]: _, image_height, image_width = get_dimensions(img)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torchvision/transforms/functional.py", line 78, in get_dimensions
[rank0]: return F_t.get_dimensions(img)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torchvision/transforms/_functional_tensor.py", line 19, in get_dimensions
[rank0]: _assert_image_tensor(img)
[rank0]: File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torchvision/transforms/_functional_tensor.py", line 15, in _assert_image_tensor
[rank0]: raise TypeError("Tensor is not a torch image.")
[rank0]: TypeError: Tensor is not a torch image.

Steps: 0%| | 2/25600 [00:01<11:14:36, 1.58s/it, Overall_loss=nan, RAG_loss=-0.103, VLM_loss=nan]
[rank0]:[W1125 21:21:49.370832391 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
W1125 21:21:51.198000 407536 /gpfs/home3/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 407564 closing signal SIGTERM
W1125 21:21:51.200000 407536 /gpfs/home3/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 407565 closing signal SIGTERM
W1125 21:21:51.200000 407536 /gpfs/home3/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 407566 closing signal SIGTERM
E1125 21:21:52.029000 407536 /gpfs/home3/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 407563) of binary: /home/scur2844/.conda/envs/agentsmith/bin/python
Traceback (most recent call last):
File "/home/scur2844/.conda/envs/agentsmith/bin/accelerate", line 8, in
sys.exit(main())
File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1159, in launch_command
multi_gpu_launcher(args)
File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/commands/launch.py", line 793, in multi_gpu_launcher
distrib_run.run(args)
File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

attack/optimize.py FAILED

Failures:
<NO_OTHER_FAILURES>

FYI, I also changed it to MULTI_GPU and am still getting the same error.
