
Conversation

@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch 6 times, most recently from ebb610b to 0462fb0 Compare September 30, 2025 13:53
Contributor

@molbap molbap left a comment

Thanks! Left two comments

Author

dimidagd commented Oct 1, 2025

Discussed the reviewers' comments; no immediate action points. Left for the future:

  • Refactoring to remove redundant code between Dinov2 and DINOv3
  • Convert meta checkpoint with classifier and add a regression test

@molbap would you like to approve this PR?

Contributor

@molbap molbap left a comment

Sure, I just need to check get_input_embeddings: in most cases it's not needed, and we prefer to add as little code as possible. It's not needed in Dinov2, so why here?
Otherwise LGTM and OK to merge once this is sorted out!
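For readers following along: get_input_embeddings is a conventional accessor across transformers models that returns the module mapping raw inputs to embedding vectors. A minimal pure-Python sketch of the pattern (toy class names, not the real DINOv3 modules):

```python
# Illustrative-only sketch of the get_input_embeddings convention;
# class and attribute names here are placeholders, not the real modules.
class PatchEmbeddings:
    """Stands in for the module that turns pixel values into patch tokens."""


class ToyDINOv3Model:
    def __init__(self):
        self.patch_embeddings = PatchEmbeddings()

    def get_input_embeddings(self):
        # By convention this accessor exposes the module that maps raw
        # inputs to embeddings, so generic tests and utilities can find it.
        return self.patch_embeddings


model = ToyDINOv3Model()
print(model.get_input_embeddings() is model.patch_embeddings)  # True
```

The review question is whether DINOv3 needs to define this accessor explicitly when Dinov2 gets by without it.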

Author

dimidagd commented Oct 7, 2025

@molbap what is the process for merging PRs after they have been approved? I'm trying to better understand the contribution workflow.

Best
DD

Contributor

@molbap molbap left a comment

I have a pending question on which test was failing for the embeddings, please let me know! Apart from that, on the review process: once reviewers have approved, we ping the core maintainers to approve and merge your PR.

This one in particular sparked an internal discussion, since we're trying to minimize the maintenance surface (see #41450), but that will wait for a follow-up PR.

Contributor

@molbap molbap left a comment

Added a missed head_mask; other than that, ping me when you know about the failing test and we can merge this! With #41276 these are nice additions to dinov3.

@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from fce9f88 to 548597e Compare October 16, 2025 16:03
Author

dimidagd commented Oct 16, 2025

> Added a missed head_mask; other than that, ping me when you know about the failing test and we can merge this! With #41276 these are nice additions to dinov3.

Here is the failing test

https://app.circleci.com/jobs/github/huggingface/transformers/1982122

On second thought, I removed the associated test in 1fb6000.

Looking forward to your feedback

@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from 1fb6000 to 416d0b2 Compare October 16, 2025 20:05
@dimidagd
Author

> Added a missed head_mask; other than that, ping me when you know about the failing test and we can merge this! With #41276 these are nice additions to dinov3.

@molbap upon rereading your comments, I think you implied that adding get_input_embeddings should not be necessary.

Here is the failing test, which was copied over from dinov2.

https://app.circleci.com/jobs/github/huggingface/transformers/1993707

Contributor

molbap commented Oct 31, 2025

Hi @dimidagd, circling back to this: this is to support the classification head released by Meta for dinov3, right? In that case, we would also need a small test that loads that head and checks that it works, to avoid future regressions. I'm referring to this: https://github.com/facebookresearch/dinov3?tab=readme-ov-file#pretrained-heads---image-classification

That way there'd be more ground to support the implementation of a new head!

Author

dimidagd commented Oct 31, 2025 via email

@merveenoyan
Contributor

Hey @dimidagd, thanks a lot for working on this!

Meta has three checkpoints: for image classification (with a linear layer), object detection (DINOv3+DETR) and depth estimation (DINOv3+DPT). Would you like to load the image classification weights and push them to the Hub for people's convenience?

We have DETR and DPT officially supported in transformers. Would you be down to convert the depth and detector checkpoints too?

Author

dimidagd commented Nov 6, 2025

> Hey @dimidagd, thanks a lot for working on this!
>
> Meta has three checkpoints: for image classification (with a linear layer), object detection (DINOv3+DETR) and depth estimation (DINOv3+DPT). Would you like to load the image classification weights and push them to the Hub for people's convenience?
>
> We have DETR and DPT officially supported in transformers. Would you be down to convert the depth and detector checkpoints too?

Hi @merveenoyan! My dev machine is on Codespaces with limited memory, so I can't even load their checkpoints. Do you have access to resources I could use?

@merveenoyan
Contributor

@dimidagd can you use a Colab? 👀 Using small checkpoints to validate the implementation is OK!

Author

dimidagd commented Nov 7, 2025

> @dimidagd can you use a Colab? 👀 Using small checkpoints to validate the implementation is OK!

@merveenoyan hi again! Meta released the classification head only for the 7B model. I can't fit that on Colab resources either (GPU or CPU), and honestly I can't see myself implementing such tests from a Colab env.

Contributor

molbap commented Nov 7, 2025

@dimidagd Just in case: we have a run-slow CI we can trigger, which runs tests marked with @slow and can fit a 7B model. If you write a corresponding ...IntegrationTest, we can run it on our large GitHub runner and check the result; just ping me and I'll run it.
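A sketch of what such an integration test could look like, assuming the usual transformers Auto classes. The checkpoint id and expected class index are hypothetical placeholders, and a real test would follow the repo's unittest/@slow conventions rather than a bare function:

```python
# Hedged sketch of a slow integration test for the DINOv3 classification
# head. The checkpoint id and expected class index below are placeholders,
# not a real released checkpoint.
def test_dinov3_classification_head_integration():
    # Imports are deferred so this sketch stays importable without the
    # heavy dependencies installed.
    import torch
    from transformers import AutoImageProcessor, AutoModelForImageClassification
    from transformers.image_utils import load_image

    checkpoint = "some-org/dinov3-7b-imagenet-classifier"  # placeholder id

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModelForImageClassification.from_pretrained(
        checkpoint, torch_dtype=torch.float16, device_map="auto"
    )

    # The COCO cats sample commonly used in transformers tests.
    image = load_image("http://images.cocodataset.org/val2017/000000039769.jpg")
    inputs = processor(images=image, return_tensors="pt").to(model.device)

    with torch.no_grad():
        logits = model(**inputs).logits

    # Regression check: the predicted class index should stay stable
    # across releases (285 is a hypothetical expected value).
    assert logits.argmax(-1).item() == 285
```

A test in this shape would catch silent regressions in the converted head weights without asserting exact logit values, which can drift across hardware and dtype.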

dimidagd and others added 5 commits November 13, 2025 12:46
@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from e204914 to f42a62e Compare November 13, 2025 12:46
@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from 92840b4 to 7ce4837 Compare November 13, 2025 17:12
@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch 3 times, most recently from 63639c5 to 2c0786b Compare November 14, 2025 13:49
@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from 2c0786b to d6980cf Compare November 14, 2025 14:29
@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from 365715e to ad9b1bc Compare November 14, 2025 20:22
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, dinov3_vit

@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from 2a404d2 to 174d7c5 Compare November 14, 2025 23:39
@dimidagd
Author

@molbap I uploaded the 7B weights with the linear classifier adapter to HF, and wrote a simple test on the cat COCO sample in the repo. Perhaps you could run the slow tests?
