fix: auto-untie word embeddings on merge_and_unload when both are adapted #2972
Conversation
When LoRA is applied to both `embed_tokens` and `lm_head` on models with `tie_word_embeddings=True`, `merge_and_unload()` now automatically:
- Detects if both layers have adapters
- Unties the weights before merging
- Sets `config.tie_word_embeddings=False`

This prevents the merged `lm_head` weights from being lost when the model is reloaded. Resolves huggingface#2777
- Check both `target_modules` and `modules_to_save` when detecting adapters on `embed_tokens` and `lm_head`
- Always update config when adapters are on both layers (`ModulesToSaveWrapper` already unties weights, so we just need to update config)
- Update warning message for clarity
I tested `ensure_weight_tying`.

1. Technical Limitation: It only applies to `modules_to_save`.

According to its definition:

```python
ensure_weight_tying: bool = field(
    ...
    metadata={
        "help": (
            "...This is only applicable for layers passed via "
            "`modules_to_save`."
        )
    },
)
```

When used with `target_modules`, it does not take effect.

2. Conceptual Limitation: Independent training is required. Even if it worked for `target_modules`, it would keep the deltas of `embed_tokens` and `lm_head` tied together, whereas the use case here needs them to diverge.

Conclusion: This PR handles the scenario where the tied layers need independent deltas, which `ensure_weight_tying` does not cover.
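For illustration, a minimal sketch of the configuration difference being discussed, assuming the `ensure_weight_tying` flag quoted above is exposed on `LoraConfig` (the layer names are the usual tied-embedding module names):

```python
from peft import LoraConfig

# Covered by ensure_weight_tying: the tied layers go through modules_to_save,
# so their saved copies can be kept tied.
tied_cfg = LoraConfig(
    target_modules=["q_proj", "v_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
    ensure_weight_tying=True,
)

# Not covered (per the docstring above): the tied layers are LoRA targets
# themselves, so each learns its own delta and the flag does not apply.
untied_cfg = LoraConfig(
    target_modules=["q_proj", "v_proj", "embed_tokens", "lm_head"],
)
```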
@www-spam Re: the automatic config change.

I don't think that this should be done as a default by setting `tie_word_embeddings=False` automatically. WDYT?

I think @BenjaminBossan might have some views on this too.
Thanks for opening the PR @www-spam and your discussion @romitjain.

Regarding point 2, I tend to agree with Romit. Implicitly untying the weights here could be surprising for users. As a workaround, users can untie the weights themselves (e.g. by setting `tie_word_embeddings=False` on the config) before training.

What we could ensure on the PEFT side is that if the user targets the tied layers with, say, LoRA, they get a warning that this means that merging and unloading won't work properly. WDYT?
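For concreteness, a rough sketch of the kind of check being suggested (the helper name and its placement are hypothetical, not existing PEFT API):

```python
import warnings

def warn_if_tied_layers_targeted(model, peft_config):
    """Hypothetical helper: warn when LoRA targets tied embedding layers."""
    if not getattr(model.config, "tie_word_embeddings", False):
        return
    targets = peft_config.target_modules or []
    if isinstance(targets, str):
        targets = [targets]
    if any(name in ("embed_tokens", "lm_head") for name in targets):
        warnings.warn(
            "LoRA is applied to tied embedding layers while "
            "tie_word_embeddings=True; merge_and_unload() may not preserve "
            "independent lm_head weights.",
            UserWarning,
        )
```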
Thanks for pointing to #2879. I reviewed it and I think our PRs address different scenarios.

Different goals: #2879 extends `ensure_weight_tying` to `target_modules`, which keeps the LoRA adapters tied together, so `embed_tokens` and `lm_head` share the same delta. This is useful when you want to preserve the original tied architecture during fine-tuning. My use case requires the opposite: independent deltas for `embed_tokens` and `lm_head`. When adding domain-specific tokens to the vocabulary, input embeddings need to learn "what this token means" while output embeddings learn "when to generate this token". These diverge during training, and that's intentional.

What actually happened: I did follow the recommended approach and set `tie_word_embeddings=False` before training:

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(model_path)
config.tie_word_embeddings = False
model = AutoModelForCausalLM.from_pretrained(model_path, config=config)
# Train with LoRA on embed_tokens and lm_head...
```

Training worked fine. The problem comes after `merge_and_unload()`: once the merged model is saved and reloaded, weight tying kicks back in, `lm_head` is overwritten by `embed_tokens`, and the merged deltas are lost.
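Concretely, a minimal sketch of that failing sequence (`peft_model` and the output path are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM

merged = peft_model.merge_and_unload()
merged.save_pretrained("merged-model")

# If the saved config still says tie_word_embeddings=True, transformers
# re-ties the weights on reload and lm_head is overwritten by embed_tokens.
reloaded = AutoModelForCausalLM.from_pretrained("merged-model")
print(torch.equal(
    reloaded.get_output_embeddings().weight,
    reloaded.get_input_embeddings().weight,
))  # True -> the independently merged lm_head deltas are gone
```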
On the config concern: I understand the concern about downstream compatibility. But consider this: if someone trained with independent `embed_tokens`/`lm_head`, the merged model is architecturally different from the base model. The config should reflect that. Keeping `tie_word_embeddings=True` when the weights have actually diverged causes silent model corruption on reload. Updating the config to match the actual model state seems like the safer default.

How these PRs relate: I see them as complementary. Happy to discuss alternative approaches if you have other ideas.
@www-spam Agree on the use case. Specifically:

There should be an implementation to save the config that was provided by the user: in case the user explicitly provides `tie_word_embeddings=False`, the merged model's config should keep it. I think you are already doing this in your implementation.

IMO, what should not happen is that if the user adds adapters to both of the tied layers, then it automatically breaks the tying (irrespective of whether the user added them via `modules_to_save` or `target_modules`).
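To make that concrete, a rough sketch of the rule being argued for (the function name and flag are hypothetical, not PEFT API):

```python
def should_untie_on_merge(model_config, user_requested_untie: bool) -> bool:
    """Hypothetical decision rule: only break weight tying on merge when the
    user explicitly asked for it, never merely because both tied layers
    happen to carry adapters."""
    if not getattr(model_config, "tie_word_embeddings", False):
        return False  # nothing is tied, nothing to untie
    return user_requested_untie
```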
Summary
This PR fixes the issue where `merge_and_unload()` produces broken models when adapters are applied to both `embed_tokens` and `lm_head` on models with `tie_word_embeddings=True`.

Resolves #2777
Problem
When a base model has `tie_word_embeddings=True` (e.g., Gemma, Llama):

- `embed_tokens` and `lm_head` share the same weight tensor
- Adapters are applied to both layers (via `modules_to_save` or `target_modules`)
- `merge_and_unload()` merges both layers with their respective deltas
- The merged model is saved with `tie_word_embeddings=True` still set
- On reload with `AutoModel.from_pretrained()`, the `lm_head` weights are overwritten with `embed_tokens` weights due to weight tying
- The merged `lm_head` weights are lost, causing degraded or garbage output

Solution
This PR modifies `_unload_and_optionally_merge()` in `BaseTuner` to:

- Detect when adapters are applied to both `embed_tokens` and `lm_head`
- Untie `lm_head.weight` before merge
- Set `config.tie_word_embeddings = False` in all relevant config locations

This ensures that the merged `lm_head` weights survive saving and reloading (see the sketch below).
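For illustration, a sketch of what the untying step amounts to (the PR's actual helpers are listed under Changes; this body is an illustrative assumption, not the PR's exact code):

```python
import torch

def untie_embedding_weights(model):
    # Illustrative sketch: give lm_head its own tensor so that merging a
    # delta into it no longer writes through to embed_tokens.
    input_emb = model.get_input_embeddings()
    output_emb = model.get_output_embeddings()
    if output_emb.weight.data_ptr() == input_emb.weight.data_ptr():
        output_emb.weight = torch.nn.Parameter(input_emb.weight.detach().clone())
    # Record the new topology so it is written out on save_pretrained().
    model.config.tie_word_embeddings = False
```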
Changes
`src/peft/tuners/tuners_utils.py`:

- New `_untie_embedding_weights()` helper method
- New `_update_tie_word_embeddings_config()` helper method
- New `_has_adapters_on_both_embeddings()` helper method
- Updated `_unload_and_optionally_merge()` to auto-handle tied embeddings

`tests/test_tie_word_embeddings_merge.py`:

- New tests covering merging when word embeddings are tied

Test Plan
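As an illustration of what the test plan might cover (the `peft_model` fixture and the test name are hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM

def test_merged_lm_head_survives_reload(tmp_path, peft_model):
    # peft_model: LoRA on embed_tokens and lm_head over a base model
    # that has tie_word_embeddings=True.
    merged = peft_model.merge_and_unload()
    assert merged.config.tie_word_embeddings is False

    merged.save_pretrained(tmp_path)
    reloaded = AutoModelForCausalLM.from_pretrained(tmp_path)

    # The reloaded lm_head must match the merged one instead of being re-tied.
    assert torch.allclose(
        reloaded.get_output_embeddings().weight,
        merged.get_output_embeddings().weight,
    )
```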
Example
Before this fix:
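A sketch of the pre-fix behaviour, assuming LoRA on both tied layers (`peft_model` and the output path are placeholders):

```python
from transformers import AutoModelForCausalLM

merged = peft_model.merge_and_unload()
merged.save_pretrained("out")            # config keeps tie_word_embeddings=True

reloaded = AutoModelForCausalLM.from_pretrained("out")
# Weight tying overwrites lm_head with embed_tokens on load, so the merged
# lm_head deltas are lost and generations degrade into garbage.
```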
After this fix:
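And a sketch of the post-fix behaviour under the same assumptions:

```python
from transformers import AutoModelForCausalLM

merged = peft_model.merge_and_unload()   # adapters detected on both tied layers:
                                         # weights untied, config updated
merged.save_pretrained("out")            # config now has tie_word_embeddings=False

reloaded = AutoModelForCausalLM.from_pretrained("out")
# lm_head keeps its independently merged weights after reload.
```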