Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Nov 12, 2025

What does this PR do?

As per the title; blocked by #41589 for VLMs! After this, we should be able to use get_decoder() to get the LM part of any model, with much less duplicate code. The same goes for get_encoder(), which returns the encoder if the model has a separate encoding module. Unlike the decoder, there can be a specific encoder per modality, so the helper accepts the modality as an argument.

A universal helper reduces duplicate code, nudges us toward standardized names for major modules, and can be used by third-party libraries. Right now we have 5 ways to name a vision encoder!

🚨 Breaking changes (I guess we can break helpers for v5):

  • VLMs will no longer have a property to get self.language_model directly from the task model; users will need to call self.get_decoder()
  • Deleted get_text_encoder and get_audio_encoder in some audio models because their functionality is now covered by get_encoder()
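
The migration for downstream code is small. A minimal sketch with toy stand-in classes (not the real transformers classes; names are illustrative only):

```python
# Illustrative sketch of the v5 migration: the per-model `language_model`
# property is replaced by the universal `get_decoder()` helper.
# ToyLanguageModel / ToyVLM are hypothetical stand-ins, not real classes.

class ToyLanguageModel:
    def generate_step(self):
        return "token"

class ToyVLM:
    def __init__(self):
        # standardized attribute name that a universal helper can discover
        self.model = ToyLanguageModel()

    def get_decoder(self):
        # replaces direct access via a `language_model` property
        return self.model

vlm = ToyVLM()
decoder = vlm.get_decoder()  # instead of vlm.language_model
```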

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp changed the title [WIP] Generalize get_decoder() for multimodal and delete redundant code 🔪 🚨 Generalize get_decoder() for multimodal and delete redundant code 🔪 Nov 13, 2025
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: aria, autoformer, aya_vision, bart, bigbird_pegasus, blenderbot, blenderbot_small, blip_2, cohere2_vision, conditional_detr, d_fine, dab_detr, deformable_detr, detr, dia, emu3

from transformers import GPT2Config, GPT2LMHeadModel

model = GPT2LMHeadModel(GPT2Config())
dec = model.get_decoder()

assert dec is model, f"GPT2 get_decoder() should return self (fallback), got {type(dec)}"
Member Author

@zucchini-nlp zucchini-nlp Nov 13, 2025


The previous helper didn't cover all edge cases! This should be the base model if we compare with other LLMs (e.g. Llama)
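
The edge case being discussed can be sketched as follows; the attribute names and lookup order are assumptions for illustration, not the exact list in the PR:

```python
# Illustrative fallback logic: prefer a standardized base-model attribute,
# and only return `self` when no base model exists. `ToyDecoderOnlyLM` and
# the candidate-name tuple are hypothetical.

class ToyDecoderOnlyLM:
    def __init__(self):
        self.model = object()  # stand-in for the base model (cf. Llama)

    def get_decoder(self):
        # the previous helper returned `self` here; for consistency with
        # other LLMs this should be the base model when one is present
        for name in ("model", "decoder", "language_model"):
            if hasattr(self, name):
                return getattr(self, name)
        return self

lm = ToyDecoderOnlyLM()
assert lm.get_decoder() is lm.model  # base model, not self
```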

Contributor

@molbap molbap left a comment


Very nice unbloating 🔪
OK for me; it would just be cool to add this to the make style / ruff / quality checks to reduce cognitive load

Symmetric setter. Mirrors the lookup logic used in `get_encoder`.
"""

# NOTE: new models need to use existing names for layers if possible, so this list doesn't grow infinitely
Contributor


To note: this should be enforced in make fixup (the code consistency part) to save ourselves the hassle

Member Author


hmm, isn't it going to be a huge limitation for contributors if we force it and auto-rename with fix-copies? IMO we need to communicate it when reviewing and explain why it's important. It's only a few people reviewing VLMs currently, so it might be easier

):
      # 1. get audio encoder
-     encoder = self.get_audio_encoder()
+     encoder = self.get_encoder(modality="audio")
Contributor


ah I see now, ok — it's a good thing it's decoupled from the kwargs actually. So the logic is:

  • If I don't set the modality: defaults to the text encoder
  • If I set the modality: checks the possible module names for it
  • If I override get_encoder: it's whatever I want

Not 100% sure about the modality arg, but I see the motivation, understood
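
The three-way logic above can be sketched roughly like this; the candidate module names and the toy class are assumptions for illustration, not the actual list shipped in the PR:

```python
# Illustrative dispatch for get_encoder(modality=...): default to the text
# encoder, otherwise probe candidate attribute names per modality.
# _ENCODER_NAMES and ToyMultimodalModel are hypothetical.

_ENCODER_NAMES = {
    "text": ("text_model", "encoder"),
    "vision": ("vision_tower", "vision_model", "visual"),
    "audio": ("audio_tower", "audio_encoder"),
}

class ToyMultimodalModel:
    def __init__(self):
        self.encoder = "text-encoder"
        self.vision_tower = "vision-encoder"
        self.audio_tower = "audio-encoder"

    def get_encoder(self, modality: str = "text"):
        # no modality given -> text encoder; otherwise check the
        # candidate module names registered for that modality
        for name in _ENCODER_NAMES[modality]:
            if hasattr(self, name):
                return getattr(self, name)
        raise AttributeError(f"no {modality} encoder found")

m = ToyMultimodalModel()
assert m.get_encoder() == "text-encoder"
assert m.get_encoder(modality="audio") == "audio-encoder"
```

A model that needs bespoke behavior (the third bullet) simply overrides get_encoder and ignores the name table.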

Member Author

@zucchini-nlp zucchini-nlp Nov 14, 2025


Yep, the idea is to let users ask for a modality encoder through a single interface, instead of having one helper per modality (self.get_vision_encoder, self.get_audio_encoder, etc.)

Just added tests to make sure it works

@zucchini-nlp
Member Author

Merge conflicts after a big refactor 😢
