Add DINOv3ViTForImageClassification support #41224
ebb610b to
0462fb0
Compare
molbap
left a comment
Thanks! Left two comments
Discussed the reviewer's comments; no immediate action points. Left for future work.
@molbap would you like to approve this PR?
molbap
left a comment
Sure, I just need to check `get_input_embeddings`: in most cases it doesn't need to be added, and we prefer to add as little code as possible. It's not needed in Dinov2, so why here?
Otherwise LGTM and ok to merge once this is sorted out!
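For context on the question above, one common reason a head class can skip its own `get_input_embeddings` is that a shared base class delegates the call to the wrapped backbone. The snippet below is only a toy illustration of that delegation pattern in plain Python. It is not the transformers implementation, and every class and attribute name in it (`ToyBase`, `ToyBackbone`, `ToyClassifier`, `patch_embeddings`) is hypothetical.

```python
class ToyPatchEmbeddings:
    """Stand-in for the layer that turns pixels into patch tokens."""


class ToyBase:
    # Name of the attribute that holds the wrapped backbone (hypothetical).
    base_model_prefix = "backbone"

    def get_input_embeddings(self):
        # Delegate to the backbone if this object wraps one.
        base = getattr(self, self.base_model_prefix, self)
        if base is self:
            raise NotImplementedError("no backbone to delegate to")
        return base.get_input_embeddings()


class ToyBackbone:
    def __init__(self):
        self.patch_embeddings = ToyPatchEmbeddings()

    def get_input_embeddings(self):
        return self.patch_embeddings


class ToyClassifier(ToyBase):
    """Head model: defines no get_input_embeddings of its own."""

    def __init__(self):
        self.backbone = ToyBackbone()
        self.classifier = object()  # placeholder for a linear head


model = ToyClassifier()
print(type(model.get_input_embeddings()).__name__)
```

In this sketch the head class inherits the lookup, so an override would only be needed if the backbone itself did not expose its embedding layer.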
@molbap what is the process for merging PRs after they have been approved? I'm trying to better understand the contribution workflow.
molbap
left a comment
I have a pending question about which test was failing for the embeddings, please let me know! Apart from that, on the review process: once reviewers have approved, we ping the core maintainers to approve and merge your PR.
This one in particular launched an internal discussion, since we're trying to minimize the maintenance surface (see #41450), but that will wait for a follow-up PR.
molbap
left a comment
Added a missed head_mask. Other than that, ping me when you know which test was failing and we can merge this! Together with #41276, these are nice additions to dinov3.
fce9f88 to
548597e
Compare
Here is the failing test: https://app.circleci.com/jobs/github/huggingface/transformers/1982122
On second thought, I removed the associated test in 1fb6000. Looking forward to your feedback.
1fb6000 to
416d0b2
Compare
@molbap upon rereading your comments, I think you implied that adding `get_input_embeddings` should not be necessary. Here is the failing test, which was copied over from dinov2: https://app.circleci.com/jobs/github/huggingface/transformers/1993707
Hi @dimidagd, circling back to this: this is to support the classification head released by Meta for dinov3, right? In that case, we would also need a small test that loads that head and checks that it works, to avoid future regressions. I'm referring to this: https://github.com/facebookresearch/dinov3?tab=readme-ov-file#pretrained-heads---image-classification That way there'd be more ground to support the implementation of a new head!
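A regression test of the kind requested here usually pins the predicted class and a few logit values for a known image. The sketch below shows only that comparison step, without any framework. The helper names (`top1`, `check_head_outputs`), the tolerance, and all numbers are made up for illustration; in a real test the logits would come from the loaded checkpoint.

```python
import math


def top1(logits):
    """Return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def check_head_outputs(logits, expected_class, expected_slice, atol=1e-4):
    """Raise AssertionError if the prediction or the leading logits drift."""
    assert top1(logits) == expected_class, "predicted class changed"
    # Compare only the first len(expected_slice) logits, as a cheap fingerprint.
    for got, want in zip(logits, expected_slice):
        assert math.isclose(got, want, rel_tol=0.0, abs_tol=atol), (got, want)


# Made-up values standing in for model output on a fixed test image.
logits = [-1.2034, 0.3311, 4.8720, 0.0150]
check_head_outputs(logits, expected_class=2, expected_slice=[-1.2034, 0.3311, 4.8720])
```

Pinning a short slice rather than the full logit vector keeps the expected values readable while still catching weight-loading mistakes.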
@molbap I will see if I can find the time for this. What are your thoughts on the `get_input_embeddings` test?
hey @dimidagd, thanks a lot for working on this! Meta has three checkpoints: image classification (with a linear layer), object detection (DINOv3+DETR), and depth estimation (DINOv3+DPT). Would you like to load the image classification weights and push them to the Hub for people's convenience? We have DETR and DPT officially supported in transformers. Would you be down to convert the depth and detector checkpoints too?
Hi @merveenoyan! My dev machine is on Codespaces with limited memory, so I can't even load their checkpoints. Do you have access to resources I could use?
@dimidagd can you use a Colab? 👀 Using small checkpoints to validate the implementation is ok!
@merveenoyan hi again, Meta released the classification head only for the 7B model. I can't fit that on Colab resources either (GPU or CPU), and I can't see myself implementing such tests from a Colab environment, to be honest.
@dimidagd Just in case, we have a |
- Implements DINOv3ViTForImageClassification class
- Implements unit tests
- Updates docs
Co-authored-by: Pablo Montalvo <[email protected]>
Co-authored-by: Pablo Montalvo <[email protected]>
This reverts commit 416d0b2.
e204914 to
f42a62e
Compare
92840b4 to
7ce4837
Compare
63639c5 to
2c0786b
Compare
2c0786b to
d6980cf
Compare
365715e to
ad9b1bc
Compare
[For maintainers] Suggested jobs to run (before merge)
run-slow: auto, dinov3_vit
2a404d2 to
174d7c5
Compare
@molbap I uploaded the 7B weights with the linear classifier adapter to HF and wrote a simple test on the cat COCO sample in the repo. Perhaps you could run the slow tests?
@yonigozlan @molbap