Skip to content

Comments

Add DINOv3ViTForImageClassification support#41224

Open
dimidagd wants to merge 16 commits intohuggingface:mainfrom
dimidagd:feat/dinov3forimageclassification
Open

Add DINOv3ViTForImageClassification support#41224
dimidagd wants to merge 16 commits intohuggingface:mainfrom
dimidagd:feat/dinov3forimageclassification

Conversation

@dimidagd
Copy link

@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch 6 times, most recently from ebb610b to 0462fb0 Compare September 30, 2025 13:53
Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Left two comments

@dimidagd
Copy link
Author

dimidagd commented Oct 1, 2025

Discussed reviewers comments, no immediate points of action. Left for future

  • Refactoring to remove redundant code between Dinov2 and DINOv3
  • Convert meta checkpoint with classifier and add a regression test

@molbap would you like to approve this PR?

Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, just need to check get_input_embeddings, as in most cases it's not needed to add it and we prefer to add as little code as possible. It's not needed in Dinov2, why here?
Otherwise LGTM and ok to merge once this is sorted out!

@dimidagd
Copy link
Author

dimidagd commented Oct 7, 2025

@molbap what is the process for merging PRs after they have been approved? Trying to understand better the contribution workflow

Best
DD

Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a pending question on which test was failing for the embeddings, please let me know! apart from that for the review process when reviewers have approved we ping the core maintainers to approve and merge your PR!

This one in particular launched an internal discussion since we're trying to minimize the maintenance surface, see here #41450 but it'll wait a follow-up PR.

Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a missed head_mask, other tham that ping me when you know for the failing test and we can merge this! With #41276 it's nice additions to dinov3

@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from fce9f88 to 548597e Compare October 16, 2025 16:03
@dimidagd
Copy link
Author

Added a missed head_mask, other tham that ping me when you know for the failing test and we can merge this! With #41276 it's nice additions to dinov3

Here is the failing test

https://app.circleci.com/jobs/github/huggingface/transformers/1982122

@dimidagd
Copy link
Author

dimidagd commented Oct 16, 2025

Added a missed head_mask, other tham that ping me when you know for the failing test and we can merge this! With #41276 it's nice additions to dinov3

Here is the failing test

https://app.circleci.com/jobs/github/huggingface/transformers/1982122

on a second thought, I removed the associated test in 1fb6000

Looking forward to your feedback

@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from 1fb6000 to 416d0b2 Compare October 16, 2025 20:05
@dimidagd
Copy link
Author

Added a missed head_mask, other tham that ping me when you know for the failing test and we can merge this! With #41276 it's nice additions to dinov3

@molbap upon rereading your comments, I think you implied that adding the get_input_embeddings should not be necessary.

Here is the failing test, which was copied over from dinov2.

https://app.circleci.com/jobs/github/huggingface/transformers/1993707

@molbap
Copy link
Contributor

molbap commented Oct 31, 2025

Hi @dimidagd , circling back to this, this is to support the classification head released by Meta for dinov3, right? In that case, it would also be needed to have a small test to load that head and test it works to avoid future regressions. I'm referring to this https://github.com/facebookresearch/dinov3?tab=readme-ov-file#pretrained-heads---image-classification

That way there'd be more ground to support the implementation of a new head!

@dimidagd
Copy link
Author

dimidagd commented Oct 31, 2025 via email

@merveenoyan
Copy link
Contributor

hey @dimidagd thanks a lot for working on this!

Meta has three checkpoints for image classification (with linear layer), object detection (DINO3+DETR) and depth estimation (DINO3+DPT). Would you like to load image classification weights and push to Hub for people's convenience?

We have DETR and DPT officially supported in transformers. Would you be down to convert the depth and detector checkpoints too?

@dimidagd
Copy link
Author

dimidagd commented Nov 6, 2025

hey @dimidagd thanks a lot for working on this!

Meta has three checkpoints for image classification (with linear layer), object detection (DINO3+DETR) and depth estimation (DINO3+DPT). Would you like to load image classification weights and push to Hub for people's convenience?

We have DETR and DPT officially supported in transformers. Would you be down to convert the depth and detector checkpoints too?

Hi @merveenoyan! My dev machine is on codespaces, with limited memory, therefore I can't even load their checkpoints. Do you have access to resources I could use?

@merveenoyan
Copy link
Contributor

@dimidagd can you use a Colab? 👀 using small checkpoints to validate implementation is ok!

@dimidagd
Copy link
Author

dimidagd commented Nov 7, 2025

@dimidagd can you use a Colab? 👀 using small checkpoints to validate implementation is ok!

@merveenoyan hi again, meta released the classification head only for the 7B model. I can't fit that on colab resources either (GPU or CPU). I can't imagine myself implementing such tests from a colab env tbh.

@molbap
Copy link
Contributor

molbap commented Nov 7, 2025

@dimidagd Just in case, we have a run-slow CI which we can run, which runs tests with a @slow marker and fits a 7B. If you write a corresponding ...IntegrationTest, we can run it on our large github runner and test out the result, just ping me and I'll run it

@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch 5 times, most recently from 63639c5 to 2c0786b Compare November 14, 2025 13:49
- Slow test checking logits and predicted class based on COCO cat sample
- Migrated tests from Dinov2
…dinov3`.

When the backbone is stored under the wrong attribute, both `AutoModelForImageClassification.from_pretrained` and `DINOv3ViTForImageClassification` created from a headless checkpoint fail to load weights correctly because the state dict cannot map the backbone parameters.

This change aligns the class with the expected naming used by the loader utilities and restores correct weight loading behavior.
@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from ea1c1fb to fb8d4c4 Compare November 18, 2025 12:59
@molbap
Copy link
Contributor

molbap commented Nov 18, 2025

run-slow: dinov3_vit

@github-actions
Copy link
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/dinov3_vit"]
quantizations: []

@github-actions
Copy link
Contributor

CI Results

Workflow Run ⚙️

Model CI Report

❌ Failed tests

  • dinov3_vit:
    tests/models/dinov3_vit/test_modeling_dinov3_vit.py::DINOv3ViTModelIntegrationTest::test_inference_lc_head_imagenet

@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from 73c9fca to 83ca9ca Compare November 21, 2025 12:15
@dimidagd
Copy link
Author

dimidagd commented Nov 21, 2025

@molbap I moved the tests on cpu using the large runner, had a GPU OOM error before, hopefully it has enough RAM to run the test. And by the way, tests based on facebook/dinov3* checkpoints are failing on that runner, because the repo is gated.

@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from 83ca9ca to 6914a43 Compare November 24, 2025 12:31
@dimidagd dimidagd force-pushed the feat/dinov3forimageclassification branch from 6914a43 to 2e9a1c2 Compare November 24, 2025 13:49
@dimidagd
Copy link
Author

@molbap let me know if any further improvements are needed on this :)

@dimidagd
Copy link
Author

@molbap pinging you to hear more abt this PR :)

@github-actions
Copy link
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, dinov3_vit

@github-actions
Copy link
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41224&sha=466177

Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! Now that this is merged with main, some things have changed a bit in terms of code quality. The rest is ok

Comment on lines +618 to +620
pixel_values: Optional[torch.Tensor] = None,
labels: Optional[torch.Tensor] = None,
**kwargs: Unpack[TransformersKwargs],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with v5 around the corner:
now you should use make style && make fix-repo before pushing to make CI happy(happier). in this particular occurence we don't use Optional anymore as you can see on the failed tests (just default to None + use | is enough)

self.assertIsNotNone(model)

@slow
def test_model_for_image_classification_from_pretrained(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will also require the large accelerator no? Let's remove it, it's tested in the Integration test below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add to the doc in usage snippet an example with the usage now possible with the image classification head, with the example of the one converted from Meta

@ArthurZucker ArthurZucker removed their request for review February 11, 2026 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants