Add support for Quantization-Aware Low-Rank Adaptation (QALoRA) #2571

Open
wants to merge 3 commits into main

Conversation

gapsong commented Jun 4, 2025

This pull request introduces QALoRA (Quantization-Aware Low-Rank Adaptation), a new fine-tuning technique for quantized large language models, along with its implementation in the PEFT library. The changes include updates to documentation, configuration, and core logic to support QALoRA's memory-efficient and performance-preserving features.

Documentation Updates

  • examples/qalora_finetuning/README.md: Added detailed documentation for QALoRA, including its introduction, implementation details, usage examples, command-line instructions, and comparison with other techniques like LoRA and DoRA.

Configuration Enhancements

  • src/peft/tuners/lora/config.py: Introduced two new configuration parameters: use_qalora to enable QALoRA and qalora_group_size to control the pooling group size for memory-performance tradeoffs.
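
As a rough illustration, a LoraConfig using these new fields might look like the sketch below; the field names use_qalora and qalora_group_size come from this PR (they are not in a released PEFT version), and the concrete values are placeholders rather than recommendations.

```python
from peft import LoraConfig

# Sketch only: use_qalora and qalora_group_size are the fields proposed in this
# PR (src/peft/tuners/lora/config.py), not part of a released PEFT version.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    use_qalora=True,        # enable the QALoRA variant
    qalora_group_size=32,   # pooling group size; trades memory for accuracy
)
```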

Core Logic: QALoRA Variant Implementation

  • src/peft/tuners/lora/variants.py: Added the QALoraLinearVariant class, implementing QALoRA-specific logic for initialization, delta weight computation, merging, unmerging, and forward propagation. This includes pooling input features and scaling them for efficient adaptation.
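
To make the pooling idea concrete, here is a minimal, hedged sketch of the feature-pooling step; the function name, shapes, and the group-size rescaling are assumptions for illustration and do not reproduce the exact code in variants.py.

```python
import torch
import torch.nn as nn

def qalora_delta(x: torch.Tensor, lora_A: nn.Linear, lora_B: nn.Linear,
                 scaling: float, group_size: int) -> torch.Tensor:
    # Average-pool groups of `group_size` adjacent input features, so lora_A
    # only needs in_features // group_size input columns (assumes divisibility).
    pooled = x.reshape(*x.shape[:-1], x.shape[-1] // group_size, group_size).mean(dim=-1)
    # Low-rank update on the pooled features; the extra group_size factor that
    # compensates for the averaging is an assumption of this sketch.
    return lora_B(lora_A(pooled)) * scaling * group_size
```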

gapsong mentioned this pull request Jun 4, 2025

BenjaminBossan (Member) left a comment


Thanks a lot for picking up this old request. Really well done to use the newly added LoRA variant abstraction to implement this.

I checked the PR but haven't done an in-depth review yet. The reason for that is that LoRA variant support has only been added to vanilla LoRA layers (i.e. the layers defined in lora/layers.py). The quantized layers, including GPTQ, don't have any code that would take LoRA variants into account. Therefore, as is, the GPTQ layer would still use the normal forward call and not QALoraLinearVariant.forward. Even worse, the GPTQ layer does not support merging and unmerging, so all of that code in QALoraLinearVariant is dead code. So unless I'm missing something, there is still some work required:

  1. Update GPTQLoraLinear.forward to account for LoRA variants (should be easy).
  2. Implement merging and unmerging for GPTQLoraLinear (could be difficult, it depends), or scrap it for now.

To avoid 2., QA LoRA could be implemented for another quantization method that already supports merging and unmerging, like bitsandbytes, but even there, LoRA variant support has yet to be added. Also, I'm not sure how specific your code is to GPTQ.

Anyway, it's a really nice PR and I'd be happy to see it merged. LMK what you think.

gapsong (Author) commented Jun 9, 2025

Hi @BenjaminBossan ,

Thank you for your review and the helpful feedback!

I've addressed the main points you raised:

  • I removed the merge/unmerge logic completely from QALoraLinearVariant for the GPTQLoraLinear context, since, as you correctly pointed out, it is not supported there and was effectively dead code.
  • Thank you especially for pointing out that the QALoRA variant's forward function was never actually being called within GPTQLoraLinear. This was a critical oversight, and I've now updated GPTQLoraLinear.forward to correctly dispatch to the QALoRA variant's forward pass when QALoRA is active.
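
For readers following along, the dispatch could look roughly like the sketch below. This is not the actual diff; attribute names such as lora_variant and the variant's call signature are assumptions based on the LoRA variant abstraction discussed above.

```python
# Rough sketch of a variant-aware forward for the GPTQ LoRA layer (assumed
# attribute names, not the real PEFT code).
def forward(self, x, *args, **kwargs):
    result = self.quant_linear_module(x)  # base GPTQ-quantized linear layer
    for active_adapter in self.active_adapters:
        if active_adapter not in self.lora_A:
            continue
        if active_adapter in getattr(self, "lora_variant", {}):
            # Route through the variant (e.g. QALoraLinearVariant) so its
            # pooled forward pass is actually used.
            result = self.lora_variant[active_adapter].forward(
                self, active_adapter, x, result
            )
        else:
            # Plain LoRA path.
            lora_A = self.lora_A[active_adapter]
            lora_B = self.lora_B[active_adapter]
            dropout = self.lora_dropout[active_adapter]
            scaling = self.scaling[active_adapter]
            result = result + lora_B(lora_A(dropout(x))) * scaling
    return result
```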

Ready for another look when you have a moment!
