Fix issue #2786: Store xlora scaling and fix per token normalization #2793
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks a lot for creating this PR to fix the issues you identified. At a glance, the changes look good. Ideally, we should also have unit tests to check for these bugs. I think it shouldn't be too hard to add them by extending the existing X-LoRA tests.

PS: The failing CI is unrelated and can be ignored.
Looks great 👍!
@Che-Xu Do you still plan to work on this?
Hi @BenjaminBossan, Thank you for the reminder and my sincere apologies for the delayed response. I've been occupied with other projects over the past two weeks. I'm still committed to this and will submit the unit tests within the next two days. I understand the importance of completing this and appreciate you checking in. Again, sorry for the delay and thank you for your patience. I'll prioritize this task and keep you updated.
@Che-Xu No worries, take the time you need. My ping was just a reminder, as sometimes people forget or miss notifications. Feel free to let me know if you have any questions.
Thanks for the feedback! I've extended the existing tests as suggested. The tests have been added to the existing PR. Please let me know if you'd like me to adjust anything in the test coverage!
Thanks for adding the unit tests. The test_scalings_storage test looks good, but I believe the test_per_token_normalization_with_softmax_topk test can be greatly simplified if we focus on the main aspect that we want to test. Please check my proposal.
tests/test_xlora.py
if normalized_scalings is None:
    assert normalized_scalings is not None, (
        f"Missing normalized_scalings in layer {data['layer']} {data['projection']}"
The if check can be removed, right?
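In other words, since the assert itself already handles the None case, the guard could simply be dropped and only the assertion kept, for example:

assert normalized_scalings is not None, (
    f"Missing normalized_scalings in layer {data['layer']} {data['projection']}"
)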
tests/test_xlora.py
continue

if hasattr(normalized_scalings, "cpu"):
    scalings_np = normalized_scalings.cpu().detach().numpy()
Why is it necessary to move the array to numpy?
tests/test_xlora.py
for t in range(seq_len):
    weights = scalings_np[b, t, :]
    weight_sum = weights.sum()
    assert np.isclose(weight_sum, 1.0, atol=1e-5), (
I think this is the essential part of the test. I would focus on this assert, no need to check the other stuff and also no need to report the layer, batch, and token in detail.
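For example, the same per-token check could be done directly on the torch tensor, without the numpy conversion or the nested loops (a sketch, assuming normalized_scalings keeps the adapter dimension last):

per_token_sums = normalized_scalings.sum(dim=-1)  # one sum per (batch, token) position
assert torch.allclose(per_token_sums, torch.ones_like(per_token_sums), atol=1e-5)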
tests/test_xlora.py
assert torch.isfinite(latest_scalings).all(), "Scalings should contain finite values"

def test_per_token_normalization_with_softmax_topk(self, tokenizer, model):
    captured_data = []
I think the whole test can be greatly simplified if we are content with only logging the scalings. I think that should be enough, as the other logged data is just needed for a nicer error message and I believe we can do without that.
Here is my proposal:
from peft.tuners.xlora.layer import XLoraLayer

...

def test_per_token_normalization_with_softmax_topk(self, tokenizer, model, monkeypatch):
    orig_get_maybe_topk_scalings = XLoraLayer.get_maybe_topk_scalings
    captured_data = []

    def mock_get_maybe_topk_scalings(*args, **kwargs):
        result = orig_get_maybe_topk_scalings(*args, **kwargs)
        captured_data.append(result)
        return result

    monkeypatch.setattr(XLoraLayer, "get_maybe_topk_scalings", mock_get_maybe_topk_scalings)

    model.enable_scalings_logging()
    inputs = tokenizer.encode("Test per token normalization", add_special_tokens=False, return_tensors="pt")
    outputs = model.generate(
        input_ids=inputs.to(self.torch_device),
        max_new_tokens=1,
    )

    for scaling in captured_data:
        assert ...
Thank you for your suggestion! The revised version is much cleaner and clearer, and I have learned a lot from your approach. One small addition I made is that, since XLoRA performs two forward passes (a dummy pass and a real pass) and we only want to capture the scalings from the real pass, I included a check for that. I have already made these changes in the existing PR. Thank you again for your guidance!
Thanks for identifying and fixing these two issues with X-LoRA, the changes LGTM. Failing tests are unrelated.
Resolves #2786
Description
This PR addresses two issues identified while using X-LoRA with the Qwen2-VL-7B model.
Issue 1: Internal Scaling Storage Problem
Location: _enable_peft_forward_hooks() in src/peft/tuners/xlora/model.py

Problem: After calling enable_scalings_logging(), the subsequent call to get_latest_scalings() returned None because the computed xlora_scalings were not properly stored for later retrieval.
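As a usage-level sketch of the expected behavior after the fix (model and tokenizer setup omitted; model is assumed to be an X-LoRA PeftModel for causal LM and inputs to be tokenized input ids):

model.enable_scalings_logging()
model.generate(input_ids=inputs, max_new_tokens=1)
latest_scalings = model.get_latest_scalings()
assert latest_scalings is not None  # previously None, because the computed scalings were never stored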
Issue 2: Incorrect Probability Normalization

Location: get_maybe_topk_scalings() in src/peft/tuners/xlora/layer.py

Problem: The current implementation incorrectly normalized expert probabilities such that they summed to 1 over all tokens, rather than summing to 1 per token.
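A minimal standalone illustration of the difference (not the actual X-LoRA code; the (batch, seq_len, n_adapters) layout and the softmax dimensions are assumptions for the example):

import torch

logits = torch.randn(2, 5, 3)  # assumed layout: (batch, seq_len, n_adapters)

# Incorrect: normalizing over the sequence dimension spreads the mass across tokens
wrong = torch.softmax(logits, dim=1)
print(wrong.sum(dim=-1))  # per-token sums are generally not 1.0

# Correct: normalizing over the adapter dimension makes each token's weights sum to 1
right = torch.softmax(logits, dim=-1)
print(right.sum(dim=-1))  # ~1.0 for every (batch, token) position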
Implementation

_enable_peft_forward_hooks(): store the computed xlora_scalings so that get_latest_scalings() returns them after enable_scalings_logging() has been called.

get_maybe_topk_scalings(): normalize the scalings per token, so that each token's adapter weights sum to 1.