Deformable DETR Finetuning breaks for any dataset

### System Info

- GPU: V100
- torch 2.6.0+cu126
- transformers 4.57.1

### Who can help?

Hi @yonigozlan  @molbap  @NielsRogge 

Thanks for the awesome work on vision models!

I've been trying for the past few days to finetune the Deformable DETR models (SenseTime/deformable-detr-with-box-refine-two-stage) on a custom object detection dataset, using the DETR finetuning notebook suggested in the docs (https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb). I swapped in the Deformable DETR model named above wherever a model name is needed, and I keep running into errors, two in particular:

> File /libraries/env/lib/python3.11/site-packages/transformers/loss/loss_deformable_detr.py:55, in <listcomp>(.0)
> 52 cost_matrix = cost_matrix.view(batch_size, num_queries, -1).cpu()
> 54 sizes = [len(v["boxes"]) for v in targets]
> ---> 55 indices = [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
> 56 return [(torch.as_tensor(i, dtype=torch.int64), torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]
>
> ValueError: matrix contains invalid numeric entries

(On several forums this was addressed by turning off AMP, which in the example notebook's Trainer can be done by passing precision=32.)
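
For context, `linear_sum_assignment` rejects a cost matrix that contains NaN, which is what the message above reports. The following is a minimal debugging sketch in plain Python, with nested lists standing in for the tensor (for example the result of `cost_matrix.tolist()`). `find_invalid_entries` is a hypothetical helper, not part of transformers or SciPy, and the toy values are made up for illustration.

```python
import math

def find_invalid_entries(cost_matrix):
    """Return (row, col) positions of NaN or infinite costs.

    Hypothetical helper: rows are queries, columns are targets.
    """
    return [
        (r, c)
        for r, row in enumerate(cost_matrix)
        for c, value in enumerate(row)
        if not math.isfinite(value)
    ]

# Toy matrix: 3 queries x 2 targets, with one NaN cost (illustrative only).
costs = [[0.2, 0.9], [float("nan"), 0.4], [0.7, 0.1]]
print(find_invalid_entries(costs))  # [(1, 0)]
```

If every row is flagged, the NaNs more likely originate in the model's predictions than in a single annotation, though that is only a heuristic.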

When I get past that, I am immediately hit by:

> "/libraries/env/lib/python3.11/site-packages/transformers/loss/loss_for_object_detection.py", line 418, in generalized_box_iou
> [rank1]: raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got {boxes1}")
> [rank1]: ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[nan, nan, nan, nan],
> [rank1]: [nan, nan, nan, nan],
> [rank1]: [nan, nan, nan, nan],
> [rank1]: ...,
> [rank1]: [nan, nan, nan, nan],
> [rank1]: [nan, nan, nan, nan],
> [rank1]: [nan, nan, nan, nan]], device='cuda:1')
>
> Epoch 0: 0%| | 1/4358 [00:03<4:23:53, 0.28it/s, v_num=20, training_loss_step=nan.0]
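
The wording of this second error can be misleading: the printed boxes are NaN, not in the wrong layout. Assuming the check in `generalized_box_iou` is an elementwise `x1 >= x0` and `y1 >= y0` comparison (an assumption based on the message, not confirmed here), an all-NaN box fails it because every comparison with NaN is false. A plain-Python sketch of that behaviour follows. `is_corner_format` and `has_nan` are hypothetical stand-ins, not the library's functions.

```python
import math

def is_corner_format(boxes):
    """True if every box satisfies x1 >= x0 and y1 >= y0."""
    return all(x1 >= x0 and y1 >= y0 for x0, y0, x1, y1 in boxes)

def has_nan(boxes):
    """True if any coordinate of any box is NaN."""
    return any(math.isnan(v) for box in boxes for v in box)

nan = float("nan")
valid = [[0.1, 0.2, 0.5, 0.6]]      # proper corner box
flipped = [[0.5, 0.6, 0.1, 0.2]]    # corners swapped: a real format problem
all_nan = [[nan, nan, nan, nan]]    # the case shown in the traceback

print(is_corner_format(valid), has_nan(valid))      # True False
print(is_corner_format(flipped), has_nan(flipped))  # False False
print(is_corner_format(all_nan), has_nan(all_nan))  # False True
```

Checking for NaN before the format check separates a genuine box-format problem from a numerical blow-up earlier in the forward pass.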


I can't for the life of me figure out what is going on. I ran the same notebook code on my dataset with the original DETR model (facebook/detr-resnet-50) and it works perfectly well.

For sanity, I went back and ran the balloon dataset as-is in the notebook (https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb), but with the Deformable DETR model and processor, and I hit the same errors, which suggests that my data isn't the issue.

I would love your help in understanding why the same notebook doesn't work with the Deformable DETR checkpoints mentioned above, since it works perfectly well with the DETR ones.

The Deformable DETR family of models would suit my use case well, which is why I have been trying to make it work.
Thank you. I love the work the team puts in and appreciate the effort.

### Information

- [x] The official example scripts
- [ ] My own modified scripts

### Tasks

- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)

### Reproduction

The exact steps followed in https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb

but with the SenseTime/deformable-detr-with-box-refine-two-stage (or other SenseTime/deformable-detr-*) models. 

### Expected behavior

I expect Deformable DETR finetuning to work the same way DETR finetuning does.
