Fix issue #2786: Store xlora scaling and fix per token normalization #2793
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks a lot for creating this PR to fix the issues you identified. At a glance, the changes look good. Ideally, we should also have unit tests to check for these bugs. I think it shouldn't be too hard to add them by extending the existing X-LoRA tests.

PS: The failing CI is unrelated and can be ignored.
Looks great 👍!
@Che-Xu Do you still plan to work on this?
Hi @BenjaminBossan, Thank you for the reminder and my sincere apologies for the delayed response. I've been occupied with other projects over the past two weeks. I'm still committed to this and will submit the unit tests within the next two days. I understand the importance of completing this and appreciate you checking in. Again, sorry for the delay and thank you for your patience. I'll prioritize this task and keep you updated.
@Che-Xu No worries, take the time you need. My ping was just a reminder, as sometimes people forget or miss notifications. Feel free to let me know if you have any questions.
Thanks for the feedback! I've extended the existing tests as suggested. The tests have been added to the existing PR. Please let me know if you'd like me to adjust anything in the test coverage!
Thanks for adding the unit tests. The test_scalings_storage test looks good, but I believe the test_per_token_normalization_with_softmax_topk test can be greatly simplified if we focus on the main aspect that we want to test. Please check my proposal.
tests/test_xlora.py
if normalized_scalings is None:
    assert normalized_scalings is not None, (
        f"Missing normalized_scalings in layer {data['layer']} {data['projection']}"
The if check can be removed, right?
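In other words, since the assert itself already handles the None case, the guard could simply be dropped and only the assertion kept, for example:

assert normalized_scalings is not None, (
    f"Missing normalized_scalings in layer {data['layer']} {data['projection']}"
)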
tests/test_xlora.py
continue

if hasattr(normalized_scalings, "cpu"):
    scalings_np = normalized_scalings.cpu().detach().numpy()
Why is it necessary to move the array to numpy?
tests/test_xlora.py
for t in range(seq_len):
    weights = scalings_np[b, t, :]
    weight_sum = weights.sum()
    assert np.isclose(weight_sum, 1.0, atol=1e-5), (
I think this is the essential part of the test. I would focus on this assert, no need to check the other stuff and also no need to report the layer, batch, and token in detail.
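For example, the same per-token check could be done directly on the torch tensor, without the numpy conversion or the nested loops (a sketch, assuming normalized_scalings keeps the adapter dimension last):

per_token_sums = normalized_scalings.sum(dim=-1)  # one sum per (batch, token) position
assert torch.allclose(per_token_sums, torch.ones_like(per_token_sums), atol=1e-5)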
tests/test_xlora.py
assert torch.isfinite(latest_scalings).all(), "Scalings should contain finite values"

def test_per_token_normalization_with_softmax_topk(self, tokenizer, model):
    captured_data = []
I think the whole test can be greatly simplified if we are content with only logging the scalings. I think that should be enough, as the other logged data is just needed for a nicer error message and I believe we can do without that.
Here is my proposal:
from peft.tuners.xlora.layer import XLoraLayer

...

def test_per_token_normalization_with_softmax_topk(self, tokenizer, model, monkeypatch):
    orig_get_maybe_topk_scalings = XLoraLayer.get_maybe_topk_scalings
    captured_data = []

    def mock_get_maybe_topk_scalings(*args, **kwargs):
        result = orig_get_maybe_topk_scalings(*args, **kwargs)
        captured_data.append(result)
        return result

    monkeypatch.setattr(XLoraLayer, "get_maybe_topk_scalings", mock_get_maybe_topk_scalings)

    model.enable_scalings_logging()
    inputs = tokenizer.encode("Test per token normalization", add_special_tokens=False, return_tensors="pt")
    outputs = model.generate(
        input_ids=inputs.to(self.torch_device),
        max_new_tokens=1,
    )

    for scaling in captured_data:
        assert ...
Thank you for your suggestion! The revised version is much cleaner and clearer, and I have learned a lot from your approach. One small addition I made is that, since XLoRA performs two forward passes (a dummy pass and a real pass) and we only want to capture the scalings from the real pass, I included a check for that. I have already made these changes in the existing PR. Thank you again for your guidance!
Thanks for identifying and fixing these two issues with X-LoRA, the changes LGTM. Failing tests are unrelated.
Resolves #2786
Description
This PR addresses two issues identified while using X-LoRA with the Qwen2-VL-7B model.
Issue 1: Internal Scaling Storage Problem
Location: _enable_peft_forward_hooks() in src/peft/tuners/xlora/model.py

Problem: After calling enable_scalings_logging(), the subsequent call to get_latest_scalings() returned None because the computed xlora_scalings were not properly stored for later retrieval.
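As a usage-level sketch of the expected behavior after the fix (model and tokenizer setup omitted; model is assumed to be an X-LoRA PeftModel for causal LM and inputs to be tokenized input ids):

model.enable_scalings_logging()
model.generate(input_ids=inputs, max_new_tokens=1)
latest_scalings = model.get_latest_scalings()
assert latest_scalings is not None  # previously None, because the computed scalings were never stored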
Issue 2: Incorrect Probability Normalization

Location: get_maybe_topk_scalings() in src/peft/tuners/xlora/layer.py

Problem: The current implementation incorrectly normalized expert probabilities such that they summed to 1 over all tokens, rather than summing to 1 per token.
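A minimal standalone illustration of the difference (not the actual X-LoRA code; the (batch, seq_len, n_adapters) layout and the softmax dimensions are assumptions for the example):

import torch

logits = torch.randn(2, 5, 3)  # assumed layout: (batch, seq_len, n_adapters)

# Incorrect: normalizing over the sequence dimension spreads the mass across tokens
wrong = torch.softmax(logits, dim=1)
print(wrong.sum(dim=-1))  # per-token sums are generally not 1.0

# Correct: normalizing over the adapter dimension makes each token's weights sum to 1
right = torch.softmax(logits, dim=-1)
print(right.sum(dim=-1))  # ~1.0 for every (batch, token) position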
Implementation

_enable_peft_forward_hooks(): store the computed xlora_scalings so that get_latest_scalings() returns them after enable_scalings_logging() has been called.

get_maybe_topk_scalings(): normalize the scalings per token, so that each token's adapter weights sum to 1.