FEAT Allow LoRA to target nn.Parameter #2638
Conversation
Normally, nn.Parameter cannot be targeted with LoRA adapters. This can be problematic, e.g. when MoE layers use nn.Parameter directly, or when there is an nn.Linear but its weight is passed around directly instead of calling forward (e.g. MHA). It would be possible to craft a solution involving a special LoRA layer for each module that uses nn.Parameter directly (e.g. lora.MHA), but that doesn't scale. This PR is an attempt at implementing a direct way to target nn.Parameter by making use of torch.nn.utils.parametrize. The current state of the PR is WIP; the next step is to add a dispatching mechanism. This is not trivial, as we don't want the new changes to accidentally affect the current matching logic. Probably the best way is to add a completely new config variable (e.g. target_parameters) that does not interfere with the current code.
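To illustrate the mechanism, here is a minimal, hypothetical sketch (not this PR's actual implementation; the class and variable names are made up) of how torch.nn.utils.parametrize can apply a LoRA delta whenever a plain parameter is accessed:

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize


class LoraParametrization(nn.Module):
    # Hypothetical sketch, not PEFT's actual LoRA layer.
    def __init__(self, fan_out, fan_in, r=8, alpha=16):
        super().__init__()
        self.lora_A = nn.Parameter(torch.randn(r, fan_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(fan_out, r))
        self.scaling = alpha / r

    def forward(self, W):
        # Called every time the wrapped parameter is accessed; gradients flow
        # into lora_A/lora_B while the base weight can stay frozen.
        return W + self.scaling * (self.lora_B @ self.lora_A)


base = nn.Linear(32, 64, bias=False)  # weight shape: (64, 32)
base.weight.requires_grad_(False)
parametrize.register_parametrization(
    base, "weight", LoraParametrization(fan_out=64, fan_in=32)
)
out = base(torch.randn(4, 32))  # transparently uses W + scaling * (B @ A)
```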
Super cool feature :) Implementation looks good as well, some questions/remarks in the comments
Co-authored-by: githubnemo <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
The test tests/test_target_parameters.py::TestDecoderModelsTargetParameters::test_merge_layers[LoraConfig-config_kwargs2-trl-internal-testing/tiny-Llama4ForCausalLM] is failing for me locally with somewhat large numerical differences in the expected outputs. I'm not quite sure why that is. This test involves a mixture of normal LoRA and LoRA targeting nn.Parameter and then merging. Possibly this is a bug or possibly this just requires higher tolerance, I'll investigate later.
Thanks for the reviews @githubnemo and @qgallouedec. The comments should be addressed. I added another test config involving a mixture of normal LoRA and LoRA targeting nn.Parameter and then merging. This causes one test,
tests/test_target_parameters.py::TestDecoderModelsTargetParameters::test_merge_layers[LoraConfig-config_kwargs2-trl-internal-testing/tiny-Llama4ForCausalLM]
to fail for me locally with somewhat large numerical differences in the expected outputs. I'm not quite sure why that is. Possibly this is a bug, or possibly it just requires a higher tolerance; I'll investigate later.
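For context, "merging" here means folding the LoRA deltas back into the base weights. A rough sketch of such a mixed setup (module and parameter names are illustrative, not the test's actual config):

```python
from peft import LoraConfig

# Regular LoRA on nn.Linear modules plus parameter-targeted LoRA on a raw nn.Parameter.
config = LoraConfig(
    r=8,
    target_modules=["q_proj", "v_proj"],
    target_parameters=["feed_forward.experts.gate_up_proj"],  # illustrative path
)
# After wrapping a model with get_peft_model(model, config) and training,
# peft_model.merge_and_unload() folds both kinds of LoRA deltas into the base weights.
```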
if any(p.device == meta for p in adapter_layer.parameters()):
    continue

# TODO: weight is not necessarily defined here, leading to a NameError, fix that
Note: This is an existing bug and has nothing to do with the PR, just flagging it here.
LGTM! Thanks for the swift implementation :)
This issue was found in PR huggingface#2638 and is defined as follows:

> When calling `get_peft_model_state_dict(..., save_embedding_layers="auto")` we check if the embedding layer is targeted to determine if the embedding layers need saving. This is not done when `PeftConfig.target_modules` is a regex string, potentially failing to save the embeddings.

This is fixed by adding a check similar to the existing query of whether `EMBEDDING_LAYER_NAMES` is a subset of the defined target modules, only that the regex matching from `BaseTuner.inject_adapter` is used. To avoid code duplication, the matching was moved to its own utility function, `match_target_against_key`. The main complication was defining the test cases, as it was non-trivial to pin down what `save_embedding_layers="auto"` entails. I've assembled a list of cases that I think are correct in the corresponding unit test.
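As a rough illustration of the idea (the function body below is an assumption, not PEFT's actual implementation; only the names `match_target_against_key` and `EMBEDDING_LAYER_NAMES` come from the description above):

```python
import re


def match_target_against_key(target, key):
    # Sketch: a regex-string target is matched against the full module key,
    # while a list target is treated as module-name suffixes. The exact
    # semantics in PEFT may differ.
    if isinstance(target, str):
        return re.fullmatch(target, key) is not None
    return any(key == t or key.endswith("." + t) for t in target)


EMBEDDING_LAYER_NAMES = ["embed_tokens", "lm_head"]

# With a regex target, the "does this config target the embeddings?" check must
# reuse the same matching logic instead of a plain subset test:
target_modules = r".*(embed_tokens|lm_head).*"
save_embeddings = any(
    match_target_against_key(target_modules, name) for name in EMBEDDING_LAYER_NAMES
)
print(save_embeddings)  # True
```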
When the target_parameters feature for LoRA was introduced in huggingface#2638, there was one gap, namely the possibility to target multiple nn.Parameters on the same module (there was only a workaround involving multiple adapters, but that is not user friendly). With this PR, it is now possible to achieve this.

The mechanism to enable this is a bit crude, namely allowing multiple ParamWrappers to be nested. This should generally be fine as long as only a couple of nn.Parameters are targeted on the same module. When there are dozens or hundreds, this approach could lead to slowdowns or other issues.

A side effect of this implementation is that the ParamWrapper, when it removes the parametrization, now only removes its own parametrization. Using nn.utils.parametrize.remove_parametrizations would remove all parametrizations, which is bad when we have nested parametrizations (see the sketch after this description).

Alternative approaches

Some alternative approaches were discussed internally, but the chosen one was considered the most practical:

1. Allow more than one adapted parameter per LoRA layer. This would require nested dicts for the LoRA parameters, something like self.lora_A[adapter_name][parameter_name]. We don't have this anywhere so far, and it would probably break implicit assumptions about PEFT layers in many places (e.g. parsing of state_dict keys), requiring many adjustments.
2. Have an auxiliary module that contains the individual LoRA layers targeting the individual parameters. This could be the cleanest solution and would probably be more efficient if there is a huge number of targeted parameters per module. However, it also brings extra complexity, as it requires implementing the logic to route the information to the right parameter, and it may be a solution to a problem that is irrelevant in practice (a large number of targets per module).
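The following minimal sketch (an illustrative parametrization class, not PEFT's ParamWrapper) shows why this matters: torch stacks multiple parametrizations registered on the same tensor, but remove_parametrizations drops the whole chain at once.

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize


class AddDelta(nn.Module):
    # Illustrative parametrization that adds a learnable delta to a weight.
    def __init__(self, shape):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(shape))

    def forward(self, W):
        return W + self.delta


lin = nn.Linear(8, 8, bias=False)
# Registering twice stacks (nests) two parametrizations on the same tensor.
parametrize.register_parametrization(lin, "weight", AddDelta(lin.weight.shape))
parametrize.register_parametrization(lin, "weight", AddDelta(lin.weight.shape))
print(len(lin.parametrizations.weight))  # 2

# remove_parametrizations removes the entire chain at once, so a wrapper that
# should only undo its own adapter has to unregister selectively instead.
parametrize.remove_parametrizations(lin, "weight", leave_parametrized=True)
print(parametrize.is_parametrized(lin, "weight"))  # False
```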
Normally, nn.Parameter cannot be targeted with LoRA adapters. This can be problematic, e.g. when there are MoE layers that use nn.Parameter directly, or when there is an nn.Linear but the weight is passed directly instead of calling forward (e.g. MHA). It would be possible to craft a solution involving a special LoRA layer for each of the modules that use nn.Parameter directly (e.g. lora.MHA), but that doesn't scale. This PR implements a direct way to target nn.Parameter, making use of torch.nn.utils.parametrize. Using the feature requires passing target_parameters to the LoraConfig. During the forward pass, when the parameter is accessed, the LoRA weights are added to it while still ensuring that gradients flow correctly to the LoRA weights. Right now, only LoRA supports this feature. Moreover, it is not possible to target multiple parameters of the same module with the same adapter; a workaround is to use multiple adapters (i.e. with different names).

---------

Co-authored-by: githubnemo <[email protected]>
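A hedged usage sketch of the feature (the model id and the parameter path are placeholders, not taken from the PR; the exact paths depend on the model architecture):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder model id; pick a model whose MoE experts store weights as nn.Parameter.
model = AutoModelForCausalLM.from_pretrained("some-org/some-moe-model")

config = LoraConfig(
    r=8,
    lora_alpha=16,
    # target_parameters addresses nn.Parameter attributes directly;
    # the path below is illustrative.
    target_parameters=["feed_forward.experts.gate_up_proj"],
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```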