Add DINOv3ViTForImageClassification support#41224
dimidagd wants to merge 16 commits into huggingface:main from
Conversation
Force-pushed from ebb610b to 0462fb0
molbap
left a comment
Thanks! Left two comments
Discussed the reviewers' comments; no immediate points of action, left for the future.
@molbap would you like to approve this PR?
molbap
left a comment
Sure, I just need to check `get_input_embeddings`: in most cases it's not needed to add it, and we prefer to add as little code as possible. It's not needed in Dinov2, so why here?
Otherwise LGTM, and OK to merge once this is sorted out!
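For context, the override under discussion is the standard pattern where a vision model exposes its patch-embedding module so generic utilities can locate it. The sketch below is illustrative, not the merged code: `TinyViTForImageClassification` and its layer sizes are hypothetical stand-ins.

```python
import torch
from torch import nn

# Hypothetical minimal sketch of the get_input_embeddings pattern being
# debated; names and sizes are illustrative, not the actual DINOv3 code.
class PatchEmbeddings(nn.Module):
    def __init__(self, num_channels=3, hidden_size=16, patch_size=4):
        super().__init__()
        self.projection = nn.Conv2d(
            num_channels, hidden_size, kernel_size=patch_size, stride=patch_size
        )

    def forward(self, pixel_values):
        # (batch, hidden, h', w') -> (batch, num_patches, hidden)
        return self.projection(pixel_values).flatten(2).transpose(1, 2)

class TinyViTForImageClassification(nn.Module):
    def __init__(self, num_labels=2, hidden_size=16):
        super().__init__()
        self.embeddings = PatchEmbeddings(hidden_size=hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def get_input_embeddings(self):
        # The override in question: expose the patch-embedding module so
        # shared utilities (and tests copied from Dinov2) can find it.
        return self.embeddings

    def forward(self, pixel_values):
        hidden = self.embeddings(pixel_values).mean(dim=1)  # mean-pool tokens
        return self.classifier(hidden)

model = TinyViTForImageClassification()
logits = model(torch.randn(1, 3, 8, 8))
print(tuple(logits.shape))  # (1, 2)
```

The reviewer's point is that if nothing in the library actually calls this accessor for the model, the override (and its test) is extra maintenance surface.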
@molbap what is the process for merging PRs after they have been approved? Trying to understand the contribution workflow better. Best
molbap
left a comment
I have a pending question about which test was failing for the embeddings, please let me know! Apart from that, on the review process: once the reviewers have approved, we ping the core maintainers to approve and merge your PR!
This one in particular launched an internal discussion, since we're trying to minimize the maintenance surface (see #41450), but that will wait for a follow-up PR.
Force-pushed from fce9f88 to 548597e
Here is the failing test: https://app.circleci.com/jobs/github/huggingface/transformers/1982122
On second thought, I removed the associated test in 1fb6000. Looking forward to your feedback!
Force-pushed from 1fb6000 to 416d0b2
@molbap upon rereading your comments, I think you implied that adding `get_input_embeddings` should not be necessary. Here is the failing test, which was copied over from dinov2: https://app.circleci.com/jobs/github/huggingface/transformers/1993707
Hi @dimidagd, circling back to this: this is to support the classification head released by Meta for dinov3, right? In that case, a small test would also be needed that loads that head and checks it works, to avoid future regressions. I'm referring to this: https://github.com/facebookresearch/dinov3?tab=readme-ov-file#pretrained-heads---image-classification. That way there'd be more ground to support the implementation of a new head!
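The requested regression test would load Meta's released head via `AutoModelForImageClassification.from_pretrained(...)` and pin expected logits. Since that 7B checkpoint is far too large to fetch here, the sketch below only shows the test shape with a hypothetical `TinyClassifier` stand-in; the class names and sizes are assumptions, not the real checkpoint.

```python
import unittest
import torch
from torch import nn

# Hedged sketch of the regression test being requested. The real version
# would call AutoModelForImageClassification.from_pretrained on Meta's
# released classification head; TinyClassifier is a hypothetical stand-in
# so the sketch stays self-contained and runnable.
class TinyClassifier(nn.Module):
    def __init__(self, hidden_size=8, num_labels=3):
        super().__init__()
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, features):
        return self.head(features)

class ClassificationHeadRegressionTest(unittest.TestCase):
    def test_head_loads_and_predicts(self):
        torch.manual_seed(0)
        model = TinyClassifier()
        model.eval()
        with torch.no_grad():
            logits = model(torch.randn(1, 8))
        # Shape check only here; the real test would also compare a logits
        # slice against known-good values from the released checkpoint.
        self.assertEqual(logits.shape, torch.Size([1, 3]))

suite = unittest.TestLoader().loadTestsFromTestCase(ClassificationHeadRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```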
@molbap I will see if I can find the time for this. What are your thoughts on the `get_input_embeddings` test?
Hey @dimidagd, thanks a lot for working on this! Meta has three checkpoint heads: image classification (a linear layer), object detection (DINOv3+DETR), and depth estimation (DINOv3+DPT). Would you like to load the image-classification weights and push them to the Hub for people's convenience? We have DETR and DPT officially supported in transformers. Would you be down to convert the depth and detector checkpoints too?
Hi @merveenoyan! My dev machine is a Codespace with limited memory, so I can't even load their checkpoints. Do you have access to resources I could use?
@dimidagd can you use a Colab? 👀 Using small checkpoints to validate the implementation is OK!
@merveenoyan hi again, Meta released the classification head only for the 7B model. I can't fit that on Colab resources either (GPU or CPU). I can't imagine implementing such tests from a Colab env, to be honest.
@dimidagd Just in case, we have a
Force-pushed from 63639c5 to 2c0786b
- Slow test checking logits and the predicted class on the COCO cat sample
- Migrated tests from Dinov2
…dinov3`. When the backbone is stored under the wrong attribute, both `AutoModelForImageClassification.from_pretrained` and `DINOv3ViTForImageClassification` created from a headless checkpoint fail to load weights correctly because the state dict cannot map the backbone parameters. This change aligns the class with the expected naming used by the loader utilities and restores correct weight loading behavior.
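The failure mode described in that commit message can be reproduced in miniature: PyTorch derives state-dict keys from attribute names, so a backbone stored under the wrong attribute leaves its parameters unmatched by the checkpoint. The sketch below is illustrative only; `Backbone`, `WrongName`, and `RightName` are hypothetical stand-ins, not the transformers classes.

```python
import torch
from torch import nn

# Illustrative sketch (not the actual transformers code) of the bug fixed
# above: state-dict keys come from attribute names, so the backbone
# attribute must match the prefix used in the checkpoint.
class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

class WrongName(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = Backbone()  # produces keys "backbone.proj.*"

class RightName(nn.Module):
    def __init__(self):
        super().__init__()
        self.dinov3 = Backbone()  # produces keys "dinov3.proj.*"

# A headless checkpoint whose keys use the expected "dinov3" prefix.
checkpoint = {f"dinov3.{k}": v for k, v in Backbone().state_dict().items()}

missing, unexpected = WrongName().load_state_dict(checkpoint, strict=False)
print(missing)    # backbone weights stay randomly initialized
missing2, unexpected2 = RightName().load_state_dict(checkpoint, strict=False)
print(missing2)   # [] -> every checkpoint tensor maps onto the model
```

With the mismatched name, loading silently leaves the backbone untrained (under `strict=False`), which is exactly why both `AutoModelForImageClassification.from_pretrained` and the headless-checkpoint path broke.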
Force-pushed from ea1c1fb to fb8d4c4
run-slow: dinov3_vit
This comment contains models: ["models/dinov3_vit"]
CI Results: Model CI Report (❌ failed tests)
Force-pushed from 73c9fca to 83ca9ca
@molbap I moved the tests to CPU using the large runner; I had a GPU OOM error before, so hopefully it has enough RAM to run the test. And by the way, tests based on
Force-pushed from 83ca9ca to 6914a43
Force-pushed from 6914a43 to 2e9a1c2
@molbap let me know if any further improvements are needed on this :)
@molbap pinging you to hear more about this PR :)
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, dinov3_vit
View the CircleCI test summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41224&sha=466177
molbap
left a comment
Hello! Now that this branch is merged with main, some things have changed a bit in terms of code quality. The rest is OK.
pixel_values: Optional[torch.Tensor] = None,
labels: Optional[torch.Tensor] = None,
**kwargs: Unpack[TransformersKwargs],
With v5 around the corner: you should now run `make style && make fix-repo` before pushing to make CI happy (or at least happier). In this particular occurrence we don't use `Optional` anymore, as you can see in the failed tests (defaulting to `None` and using `|` is enough).
self.assertIsNotNone(model)

@slow
def test_model_for_image_classification_from_pretrained(self):
This will also require the large accelerator, no? Let's remove it; it's covered by the integration test below.
Let's add an example to the usage snippet in the docs, showing the usage now possible with the image classification head, using the checkpoint converted from Meta's.
@yonigozlan @molbap