
Deformable DETR Finetuning breaks for any dataset #42202

@iamsashank09


System Info

  • GPU: V100
  • torch 2.6.0+cu126
  • transformers 4.57.1

Who can help?

Hi @yonigozlan @molbap @NielsRogge

Thanks for the awesome work on vision models!

I've been trying for the past few days to fine-tune Deformable DETR (SenseTime/deformable-detr-with-box-refine-two-stage) on a custom object detection dataset, using the DETR fine-tuning notebook suggested in the docs (https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb). I swapped in the Deformable DETR checkpoint above wherever the model name is needed, and I keep running into errors, two in particular:

File /libraries/env/lib/python3.11/site-packages/transformers/loss/loss_deformable_detr.py:55, in (.0)
     52 cost_matrix = cost_matrix.view(batch_size, num_queries, -1).cpu()
     54 sizes = [len(v["boxes"]) for v in targets]
---> 55 indices = [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
     56 return [(torch.as_tensor(i, dtype=torch.int64), torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]

ValueError: matrix contains invalid numeric entries

(On several forums this was addressed by turning off AMP, which in the example notebook's Trainer can be done by passing precision=32.)
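For context on the first error: as far as I understand, scipy's linear_sum_assignment raises "matrix contains invalid numeric entries" when the cost matrix contains NaN, so something upstream of the matcher (the predicted logits or boxes, or the cost computed from them) is already NaN by the time the matching runs. Below is a minimal pure-Python sketch for locating such entries; the helper name find_non_finite and the toy matrix are hypothetical, and on real tensors torch.isfinite(cost_matrix).all() gives the same yes/no answer.

```python
import math
from typing import List, Sequence, Tuple


def find_non_finite(matrix: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return (row, col) positions of NaN or +/-inf entries.

    Hypothetical debugging helper: `matrix` stands in for something like
    cost_matrix[0].tolist() taken just before linear_sum_assignment is called.
    """
    bad = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            # math.isfinite is False for NaN, inf and -inf
            if not math.isfinite(value):
                bad.append((i, j))
    return bad


# Toy 3x2 cost matrix (num_queries x num_targets) with one NaN entry.
cost = [
    [0.5, 1.2],
    [float("nan"), 0.3],
    [2.0, 0.9],
]
print(find_non_finite(cost))  # [(1, 0)]
```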

When I get past that, I am immediately hit by this:

"/libraries/env/lib/python3.11/site-packages/transformers/loss/loss_for_object_detection.py", line 418, in generalized_box_iou
[rank1]: raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got {boxes1}")
[rank1]: ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan],
[rank1]: ...,
[rank1]: [nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan]], device='cuda:1')
Epoch 0: 0%| | 1/4358 [00:03<4:23:53, 0.28it/s, v_num=20, training_loss_step=nan.0]
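My reading of this second error: generalized_box_iou checks that the second corner of each box is >= the first and raises when that check fails. Any comparison involving NaN is False, so NaN predicted boxes fail the check. That makes this look like a symptom of NaN model outputs (consistent with training_loss_step=nan above) rather than a real box-format problem. A simplified pure-Python stand-in for that check (not the library code; the function name is hypothetical):

```python
import math
from typing import Sequence


def passes_corner_check(boxes: Sequence[Sequence[float]]) -> bool:
    """Simplified stand-in for the corner-format check in generalized_box_iou:
    every box [x0, y0, x1, y1] must satisfy x1 >= x0 and y1 >= y0."""
    return all(x1 >= x0 and y1 >= y0 for x0, y0, x1, y1 in boxes)


valid = [[0.1, 0.1, 0.4, 0.5]]
nan_boxes = [[math.nan] * 4]

print(passes_corner_check(valid))      # True
print(passes_corner_check(nan_boxes))  # False, because NaN >= NaN is False
```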

For the life of me I can't figure out what is going on. I ran the same notebook code on my dataset with the original DETR model (facebook/detr-resnet-50), and it works perfectly well.

As a sanity check, I went back and ran the balloon dataset as-is in the notebook (https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb), but with the Deformable DETR model and processor, and I hit the same errors, which suggests my data isn't the issue.
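One way to double-check that on any COCO-format dataset is to scan the annotations for degenerate or non-finite boxes. Minimal sketch, assuming standard COCO JSON where each bbox is [x, y, width, height]; the helper name invalid_coco_boxes and the toy annotations are hypothetical:

```python
import math
from typing import Dict, List


def invalid_coco_boxes(annotations: List[Dict]) -> List[int]:
    """Return ids of annotations whose COCO bbox [x, y, width, height]
    contains a non-finite value or has a non-positive width/height.

    Hypothetical helper: `annotations` is the "annotations" list of a
    COCO-style JSON file, e.g. json.load(f)["annotations"].
    """
    bad_ids = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        if not all(math.isfinite(v) for v in (x, y, w, h)) or w <= 0 or h <= 0:
            bad_ids.append(ann["id"])
    return bad_ids


anns = [
    {"id": 1, "bbox": [10.0, 20.0, 50.0, 40.0]},
    {"id": 2, "bbox": [5.0, 5.0, 0.0, 12.0]},          # zero width
    {"id": 3, "bbox": [1.0, float("nan"), 3.0, 3.0]},  # NaN coordinate
]
print(invalid_coco_boxes(anns))  # [2, 3]
```

An empty result for the balloon annotations would support the conclusion that the NaNs originate in training rather than in the data.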

I would love your help in understanding why the same notebook doesn't work with the Deformable DETR checkpoints linked above when it works perfectly well with the DETR ones.

The Deformable DETR family would suit my use case well, which is why I have been trying to make it work.
Thank you; I love the work the team puts in and appreciate the effort.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Follow the exact steps in https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb

but with the SenseTime/deformable-detr-with-box-refine-two-stage checkpoint (or any other SenseTime/deformable-detr-* checkpoint).

Expected behavior

Fine-tuning should work the same way it does for DETR.
