Add support for Quantization-Aware Low-Rank Adaptation (QALoRA) #2571

Open
wants to merge 3 commits into main

Conversation

gapsong commented Jun 4, 2025

This pull request introduces QALoRA (Quantization-Aware Low-Rank Adaptation), a new fine-tuning technique for quantized large language models, along with its implementation in the PEFT library. The changes include updates to documentation, configuration, and core logic to support QALoRA's memory-efficient and performance-preserving features.

Documentation Updates

  • examples/qalora_finetuning/README.md: Added detailed documentation for QALoRA, including its introduction, implementation details, usage examples, command-line instructions, and comparison with other techniques like LoRA and DoRA.

Configuration Enhancements

  • src/peft/tuners/lora/config.py: Introduced two new configuration parameters: use_qalora to enable QALoRA and qalora_group_size to control the pooling group size for memory-performance tradeoffs.
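
As a rough illustration, a LoraConfig using these new fields might look like the sketch below; the field names use_qalora and qalora_group_size come from this PR (they are not in a released PEFT version), and the concrete values are placeholders rather than recommendations.

```python
from peft import LoraConfig

# Sketch only: use_qalora and qalora_group_size are the fields proposed in this
# PR (src/peft/tuners/lora/config.py), not part of a released PEFT version.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    use_qalora=True,        # enable the QALoRA variant
    qalora_group_size=32,   # pooling group size; trades memory for accuracy
)
```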

Core Logic: QALoRA Variant Implementation

  • src/peft/tuners/lora/variants.py: Added the QALoraLinearVariant class, implementing QALoRA-specific logic for initialization, delta weight computation, merging, unmerging, and forward propagation. This includes pooling input features and scaling them for efficient adaptation.
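
To make the pooling idea concrete, here is a minimal, hedged sketch of the feature-pooling step; the function name, shapes, and the group-size rescaling are assumptions for illustration and do not reproduce the exact code in variants.py.

```python
import torch
import torch.nn as nn

def qalora_delta(x: torch.Tensor, lora_A: nn.Linear, lora_B: nn.Linear,
                 scaling: float, group_size: int) -> torch.Tensor:
    # Average-pool groups of `group_size` adjacent input features, so lora_A
    # only needs in_features // group_size input columns (assumes divisibility).
    pooled = x.reshape(*x.shape[:-1], x.shape[-1] // group_size, group_size).mean(dim=-1)
    # Low-rank update on the pooled features; the extra group_size factor that
    # compensates for the averaging is an assumption of this sketch.
    return lora_B(lora_A(pooled)) * scaling * group_size
```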

gapsong mentioned this pull request Jun 4, 2025

BenjaminBossan (Member) left a comment


Thanks a lot for picking up this old request. Really well done to use the newly added LoRA variant abstraction to implement this.

I checked the PR but haven't done an in-depth review yet. The reason for that is that LoRA variant support has only been added to vanilla LoRA layers (i.e. the layers defined in lora/layers.py). The quantized layers, including GPTQ, don't have any code that would take LoRA variants into account. Therefore, as is, the GPTQ layer would still use the normal forward call and not QALoraLinearVariant.forward. Even worse, the GPTQ layer does not support merging and unmerging, so all of that code in QALoraLinearVariant is dead code. So unless I'm missing something, there is still some work required:

  1. Update GPTQLoraLinear.forward to account for LoRA variants (should be easy).
  2. Implement merging and unmerging for GPTQLoraLinear (could be difficult, it depends), or scrap it for now.

To avoid 2., QA LoRA could be implemented for another quantization method that already supports merging and unmerging, like bitsandbytes, but even there, LoRA variant support has yet to be added. Also, I'm not sure how specific your code is to GPTQ.

Anyway, it's a really nice PR and I'd be happy to see it merged. LMK what you think.

gapsong (Author) commented Jun 9, 2025

Hi @BenjaminBossan ,

Thank you for your review and the helpful feedback!

I've addressed the main points you raised:

  • I removed the merge/unmerge logic completely from QALoraLinearVariant for the GPTQLoraLinear context, since, as you correctly pointed out, it is not supported there and was effectively dead code.
  • Thank you especially for pointing out that the QALoRA variant's forward function was never actually being called within GPTQLoraLinear. This was a critical oversight, and I've now updated GPTQLoraLinear.forward to correctly dispatch to the QALoRA variant's forward pass when QALoRA is active.
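
For readers following along, the dispatch could look roughly like the sketch below. This is not the actual diff; attribute names such as lora_variant and the variant's call signature are assumptions based on the LoRA variant abstraction discussed above.

```python
# Rough sketch of a variant-aware forward for the GPTQ LoRA layer (assumed
# attribute names, not the real PEFT code).
def forward(self, x, *args, **kwargs):
    result = self.quant_linear_module(x)  # base GPTQ-quantized linear layer
    for active_adapter in self.active_adapters:
        if active_adapter not in self.lora_A:
            continue
        if active_adapter in getattr(self, "lora_variant", {}):
            # Route through the variant (e.g. QALoraLinearVariant) so its
            # pooled forward pass is actually used.
            result = self.lora_variant[active_adapter].forward(
                self, active_adapter, x, result
            )
        else:
            # Plain LoRA path.
            lora_A = self.lora_A[active_adapter]
            lora_B = self.lora_B[active_adapter]
            dropout = self.lora_dropout[active_adapter]
            scaling = self.scaling[active_adapter]
            result = result + lora_B(lora_A(dropout(x))) * scaling
    return result
```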

Ready for another look when you have a moment!
