System Info
- GPU: V100
- torch 2.6.0+cu126
- transformers 4.57.1
Who can help?
Hi @yonigozlan @molbap @NielsRogge
Thanks for the awesome work on vision models!
I've been trying to finetune a Deformable DETR model (SenseTime/deformable-detr-with-box-refine-two-stage) for the past few days on a custom object detection dataset, using the DETR finetuning notebook suggested in the docs (https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb) with the model name swapped out where needed for the Deformable DETR checkpoint above. I keep running into two errors in particular:
```
File /libraries/env/lib/python3.11/site-packages/transformers/loss/loss_deformable_detr.py:55, in <listcomp>(.0)
     52 cost_matrix = cost_matrix.view(batch_size, num_queries, -1).cpu()
     54 sizes = [len(v["boxes"]) for v in targets]
---> 55 indices = [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
     56 return [(torch.as_tensor(i, dtype=torch.int64), torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]

ValueError: matrix contains invalid numeric entries
```
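(For context, this error comes from scipy's linear_sum_assignment, which refuses any cost matrix containing NaN/Inf, so the matcher's cost matrix is already non-finite at this point. A minimal sketch just to show where the message originates, not code from the notebook:)

```python
# Sketch: scipy's linear_sum_assignment rejects non-finite cost matrices,
# which produces exactly the error raised by the matcher above.
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[0.1, np.nan], [0.3, 0.2]])
linear_sum_assignment(cost)  # ValueError: matrix contains invalid numeric entries
```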
(On several forums this was addressed by turning off AMP, which in your example notebook's Trainer setup can be done by passing precision=32.)
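For reference, this is roughly the change I mean (a sketch assuming the notebook's PyTorch Lightning Trainer; only precision is the relevant part, the other arguments are placeholders):

```python
# Sketch only: assumes the PyTorch Lightning Trainer used in the notebook.
# precision=32 keeps training in full fp32, i.e. disables AMP / mixed precision.
import pytorch_lightning as pl

trainer = pl.Trainer(
    max_epochs=100,           # placeholder, not the notebook's value
    gradient_clip_val=0.1,
    precision=32,             # full precision instead of AMP
)
# trainer.fit(model)          # `model` = LightningModule wrapping Deformable DETR
```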
And when I do get past that, I am immediately hit by:
"/libraries/env/lib/python3.11/site-packages/transformers/loss/loss_for_object_detection.py", line 418, in generalized_box_iou [rank1]: raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got {boxes1}")
[rank1]: ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan],
[rank1]: ...,
[rank1]: [nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan]], device='cuda:1')
Epoch 0: 0%| | 1/4358 [00:03<4:23:53, 0.28it/s, v_num=20, training_loss_step=nan.0]
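The loss is already NaN at the very first step, so the model outputs themselves seem to go non-finite, not just the matcher. In case it helps, here is a minimal way to surface where the NaNs first appear (a sketch using standard PyTorch anomaly detection, not part of the notebook; the output attribute names are the usual ones for DeformableDetrForObjectDetection):

```python
# Sketch: get a stack trace at the first op that produces NaN/Inf in backward.
# This slows training down a lot, so only enable it for a few steps.
import torch

torch.autograd.set_detect_anomaly(True)

# Optionally also check the forward outputs directly (assumes the standard
# DeformableDetrForObjectDetection output with .logits and .pred_boxes):
def assert_finite(outputs):
    assert torch.isfinite(outputs.logits).all(), "NaN/Inf in logits"
    assert torch.isfinite(outputs.pred_boxes).all(), "NaN/Inf in pred_boxes"
```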
I for the life of me can't figure out what is going on. I tried the same notebook code with my dataset using the original DETR model from the notebook (facebook/detr-resnet-50) and it works perfectly well.
For sanity, I went back and ran the balloon dataset as-is in the notebook (https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb), but with the Deformable DETR model and processor, and I hit the same errors, which suggests my data isn't the issue.
I'd love your help in understanding why the same notebook doesn't work with the Deformable DETR checkpoints linked above when it works perfectly well with the DETR one.
The Deformable DETR family of models would suit my use case well, hence I have been trying to make it work.
Thank you, love the work the team puts in and appreciate the effort.
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
The exact steps followed in https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb
but with the SenseTime/deformable-detr-with-box-refine-two-stage (or other SenseTime/deformable-detr-*) models.
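Roughly, the only change to the notebook is the checkpoint and the model/processor classes (a sketch; num_labels is a placeholder for my dataset, everything else in the notebook stays the same):

```python
# Sketch of the swap; the rest of the notebook is unchanged.
from transformers import AutoImageProcessor, DeformableDetrForObjectDetection

checkpoint = "SenseTime/deformable-detr-with-box-refine-two-stage"

processor = AutoImageProcessor.from_pretrained(checkpoint)
model = DeformableDetrForObjectDetection.from_pretrained(
    checkpoint,
    num_labels=1,                  # placeholder: number of classes in the custom dataset
    ignore_mismatched_sizes=True,  # reinitialize the classification head
)
```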
Expected behavior
Expecting it to work similarly to the DETR finetuning.