
Conversation

@www-spam

Summary

This PR fixes the issue where merge_and_unload() produces broken models when adapters are applied to both embed_tokens and lm_head on models with tie_word_embeddings=True.

Resolves #2777

Problem

When a base model has tie_word_embeddings=True (e.g., Gemma, Llama):

  1. embed_tokens and lm_head share the same weight tensor
  2. Adapters can be applied to both layers (via modules_to_save or target_modules)
  3. After training, each layer has different adapter deltas
  4. merge_and_unload() merges both layers with their respective deltas
  5. Bug: The config still has tie_word_embeddings=True
  6. When the merged model is loaded with AutoModelForCausalLM.from_pretrained(), the lm_head weights are overwritten by the embed_tokens weights due to weight tying (see the sketch below)
  7. Result: The merged lm_head weights are lost, causing degraded or garbage output
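To make step 6 concrete, the tying can be observed directly. This is a minimal check (the checkpoint name is only an example of a model with tie_word_embeddings=True; get_input_embeddings()/get_output_embeddings() are standard transformers accessors):

from transformers import AutoModelForCausalLM

# Any checkpoint with tie_word_embeddings=True behaves the same way
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
embed_w = model.get_input_embeddings().weight
head_w = model.get_output_embeddings().weight
print(embed_w.data_ptr() == head_w.data_ptr())  # True: one tensor, two names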

Solution

This PR modifies _unload_and_optionally_merge() in BaseTuner to:

  1. Detect whether both the embedding-like and lm_head-like modules have adapters
  2. Untie the weights by cloning lm_head.weight before merging
  3. Update config.tie_word_embeddings = False in all relevant config locations (sketched below)

This ensures that:

  • Merged weights are preserved for both layers
  • The saved model can be loaded correctly
  • Backward compatibility is maintained (no change when embeddings aren't both targeted)
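A minimal sketch of the idea (this is not the PR's actual code; the helper name and the hard-coded module names are illustrative, and the real logic lives in _unload_and_optionally_merge() and the helpers listed under Changes):

import torch

def untie_for_merge(model, peft_config):
    # Illustrative detection: both tied layers are targeted by the adapter config
    # (assumes list-style target_modules/modules_to_save)
    targeted = set(peft_config.target_modules or []) | set(peft_config.modules_to_save or [])
    if not {"embed_tokens", "lm_head"} <= targeted:
        return  # neither or only one tied layer is targeted: behavior is unchanged

    embed = model.get_input_embeddings()
    lm_head = model.get_output_embeddings()
    if lm_head.weight.data_ptr() == embed.weight.data_ptr():
        # Give lm_head its own storage so its merged delta survives save/reload
        lm_head.weight = torch.nn.Parameter(lm_head.weight.clone())

    # Record the untying so from_pretrained() does not re-tie on reload
    model.config.tie_word_embeddings = False

The actual PR also covers the modules_to_save path (where ModulesToSaveWrapper already unties the weights) and updates the flag in all relevant config locations; the sketch only shows the core mechanism.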

Changes

  • src/peft/tuners/tuners_utils.py:
    • Added _untie_embedding_weights() helper method
    • Added _update_tie_word_embeddings_config() helper method
    • Added _has_adapters_on_both_embeddings() helper method
    • Modified _unload_and_optionally_merge() to auto-handle tied embeddings
  • tests/test_tie_word_embeddings_merge.py:
    • Added tests for tie_word_embeddings merge behavior

Test Plan

  • Tested with Gemma 3 4B model (tie_word_embeddings=True)
  • Verified merged model produces coherent output
  • Verified config.tie_word_embeddings is correctly set to False
  • Verified embed_tokens and lm_head have independent weights after merge
  • Unit tests added (a sketch of the kind of check appears below)
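A sketch of the kind of check involved (the actual tests live in tests/test_tie_word_embeddings_merge.py and may be structured differently):

def test_merge_unties_tied_embeddings(peft_model):
    # peft_model: a PeftModel whose base model has tie_word_embeddings=True and
    # whose adapters target both embed_tokens and lm_head
    merged = peft_model.merge_and_unload()
    embed_w = merged.get_input_embeddings().weight
    head_w = merged.get_output_embeddings().weight
    assert embed_w.data_ptr() != head_w.data_ptr(), "weights must be independent after merge"
    assert merged.config.tie_word_embeddings is False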

Example

Before this fix:

# Model with tie_word_embeddings=True + adapters on embed_tokens and lm_head
merged = peft_model.merge_and_unload()
merged.save_pretrained("merged_model")

# Loading produces a broken model: weight tying overwrites the merged lm_head
loaded = AutoModelForCausalLM.from_pretrained("merged_model")  # lm_head weights lost!

After this fix:

# Same setup
merged = peft_model.merge_and_unload()  # Auto-unties and updates config
merged.save_pretrained("merged_model")

# Loading works correctly
loaded = AutoModelForCausalLM.from_pretrained("merged_model")  # Works as expected


When LoRA is applied to both embed_tokens and lm_head on models with
tie_word_embeddings=True, merge_and_unload() now automatically:
- Detects if both layers have adapters
- Unties the weights before merging
- Sets config.tie_word_embeddings=False

This prevents the merged lm_head weights from being lost when the
model is reloaded.

Resolves huggingface#2777
www-spam closed this Dec 31, 2025
www-spam reopened this Dec 31, 2025
- Check both target_modules and modules_to_save when detecting adapters
  on embed_tokens and lm_head
- Always update config when adapters are on both layers (ModulesToSaveWrapper
  already unties weights, so we just need to update config)
- Update warning message for clarity
@romitjain
Contributor

@www-spam
I'm curious whether you have tried the flag proposed in the issue's solution (ensure_weight_tying=True in LoraConfig).
It was added via PR #2803 and should be available in the latest release. Let me know if that solves the issue for you.

@www-spam
Author

www-spam commented Jan 5, 2026

I tested ensure_weight_tying=True, but it doesn't apply to this case for two main reasons:

1. Technical Limitation: It only applies to modules_to_save

ensure_weight_tying is designed for modules_to_save, not target_modules.

According to lora/config.py:

ensure_weight_tying: bool = field(
    ...
    metadata={
        "help": (
            "...This is only applicable for layers passed via "
            "\`modules_to_save\`."
        )
    },
)

When used with target_modules=["embed_tokens", "lm_head"], it triggers this warning and has no effect:

UserWarning: You have requested `ensure_weight_tying`, but no tied modules are added in `modules_to_save`
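For reference, a configuration along these lines reproduces the warning (a sketch; rank and other LoRA hyperparameters are omitted):

from peft import LoraConfig

lora_config = LoraConfig(
    target_modules=["embed_tokens", "lm_head"],
    ensure_weight_tying=True,  # only honored for modules_to_save, so it warns and is ignored here
)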

2. Conceptual Limitation: Independent training is required

Even if it worked for target_modules, ensure_weight_tying=True forces adapters to share identical weights. This breaks use cases that require independent training, such as:

  • Custom token learning: New tokens often require different input vs. output embeddings.
  • Asymmetric fine-tuning: embed_tokens and lm_head may benefit from learning different deltas during optimization.

Conclusion

This PR handles the target_modules case by automatically setting config.tie_word_embeddings=False on merge. This ensures that the distinct weights learned for input and output are correctly preserved upon reload.

@romitjain
Contributor

@www-spam Re:

  1. Yes, that makes sense. This is being worked on here: ENH: Tie weights for target_modules in Lora (#2864) #2879. After that merges, ensure_weight_tying should work for target_modules too.
  2. I have a different opinion on this: if the model being tuned has tied embeddings, ensure_weight_tying=True just makes sure that the model architecture does not break. For both custom token learning and asymmetric fine-tuning, a better approach might be to modify the config of the model being tuned to break the tied embeddings, i.e., set config.tie_word_embeddings=False before tuning the model.

I don't think this should be done by default (setting config.tie_word_embeddings=False on merge), because downstream tasks might still assume the original model's config.

WDYT?

I think @BenjaminBossan might have some views on this too.

@BenjaminBossan
Member

Thanks for opening the PR, @www-spam, and for the discussion, @romitjain.

Regarding point 2, I tend to agree with Romit. Implicitly untying the weights here could be surprising for users. Since merge_and_unload would be the last step in a possibly very long training process, this could result in a lot of lost time. Yes, we could argue it's the user's fault in that case, but it's not particularly user-friendly. I see that there can be a legitimate need for these use cases, but I would agree it's better if the user makes this decision ahead of time and explicitly.

What we could ensure on the PEFT side is that if the user targets the tied layers with, say, LoRA, they get a warning that this means that merging and unloading won't work properly. WDYT?

@www-spam
Author

@romitjain @BenjaminBossan

Thanks for pointing to #2879. I reviewed it and I think our PRs address different scenarios.

Different goals:

#2879 extends ensure_weight_tying to target_modules, which keeps LoRA adapters tied together — both embed_tokens and lm_head share the same delta. This is useful when you want to preserve the original tied architecture during fine-tuning.

My use case requires the opposite: independent deltas for embed_tokens and lm_head. When adding domain-specific tokens to the vocabulary, input embeddings need to learn "what this token means" while output embeddings learn "when to generate this token." These diverge during training, and that's intentional.

What actually happened:

I did follow the recommended approach — I set tie_word_embeddings=False before training:

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(model_path)  # model_path: the base checkpoint
config.tie_word_embeddings = False
model = AutoModelForCausalLM.from_pretrained(model_path, config=config)
# Train with LoRA on embed_tokens and lm_head...

Training worked fine. The problem is after merge_and_unload():

  1. Training with tie_word_embeddings=False — works correctly
  2. merge_and_unload() — completes without error
  3. save_pretrained() — saved config doesn't reflect the training config
  4. from_pretrained() on reload — uses base model's tie_word_embeddings=True
  5. Weights get re-tied, lm_head is overwritten with embed_tokens → model outputs garbage

On the config concern:

I understand the concern about downstream compatibility. But consider this: if someone trained with independent embed_tokens/lm_head, the merged model is architecturally different from the base model. The config should reflect that.

Keeping tie_word_embeddings=True when the weights have actually diverged causes silent model corruption on reload. Updating the config to match the actual model state seems like the safer default.

How these PRs relate:

| User intent | Solution |
| --- | --- |
| Keep adapters tied during training | #2879 (ensure_weight_tying=True) |
| Train independent adapters, preserve them on merge | This PR |

I see them as complementary. Happy to discuss alternative approaches if you have other ideas.

@romitjain
Contributor

@www-spam Agree on the use case.
I am just adding my thoughts as a PEFT user. I think a better solution might be to preserve the model config when saving the model.

Specifically,

save_pretrained() — saved config doesn't reflect the training config

The config that the user provided should be saved as-is: if the user sets model.config.tie_word_embeddings = False, the final saved config should also reflect that.
This way, downstream tasks will behave as expected, in accordance with your suggestion.

I think you are already doing this in your implementation. IMO, what should not happen is that adding adapters to both of the tied layers automatically breaks the tying, irrespective of whether the user set ensure_weight_tying=True.
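For example, the explicit version of that is a one-line change the user can make today before saving (a sketch reusing the names from the earlier examples; it assumes the in-memory weights are already untied because the model was loaded with tie_word_embeddings=False):

merged = peft_model.merge_and_unload()
merged.config.tie_word_embeddings = False  # make the saved config match the untied weights
merged.save_pretrained("merged_model")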
