Add support for Quantization-Aware Low-Rank Adaptation (QALoRA) #2571
This pull request introduces QALoRA (Quantization-Aware Low-Rank Adaptation), a fine-tuning technique for quantized large language models, and adds its implementation to the PEFT library. The changes span documentation, configuration, and core tuner logic to support QALoRA's memory-efficient, performance-preserving adaptation.
Documentation Updates

- `examples/qalora_finetuning/README.md`: Added detailed documentation for QALoRA, including an introduction, implementation details, usage examples, command-line instructions, and a comparison with related techniques such as LoRA and DoRA.

Configuration Enhancements

- `src/peft/tuners/lora/config.py`: Introduced two new configuration parameters: `use_qalora` to enable QALoRA and `qalora_group_size` to control the pooling group size that governs the memory/performance trade-off (see the sketch below).
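Assuming the parameter names introduced in this PR (`use_qalora`, `qalora_group_size`) and the existing `LoraConfig` API, enabling QALoRA could look roughly like this minimal sketch:

```python
from peft import LoraConfig

# Minimal sketch: the two new flags come from this PR's changes to
# src/peft/tuners/lora/config.py; all other fields are standard LoRA options.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    use_qalora=True,       # enable the QALoRA variant
    qalora_group_size=32,  # pooling group size controlling the memory/performance trade-off
)
```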
Core Logic for QALoRA

- `src/peft/tuners/lora/gptq.py`: Updated the GPTQ LoRA implementation to support QALoRA, including logic for resolving the QALoRA variant and passing the group-size parameter.
- `src/peft/tuners/lora/layer.py`: Enhanced the layer update logic to initialize QALoRA-specific parameters and handle adapter-specific configurations.
- `src/peft/tuners/lora/model.py`: Incorporated QALoRA-specific parameters into the model creation and module replacement process (an end-to-end usage sketch follows this list).
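Putting the pieces together, a hedged end-to-end sketch: it assumes QALoRA applies to GPTQ-quantized linear layers (as the `gptq.py` changes suggest) and that a GPTQ checkpoint plus a GPTQ quantization backend are available; the model id is only an example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "TheBloke/Llama-2-7B-GPTQ"  # example GPTQ-quantized checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    use_qalora=True,
    qalora_group_size=32,
)

# get_peft_model wraps the targeted quantized linear layers; per this PR, the
# GPTQ LoRA layer resolves the QALoRA variant and receives qalora_group_size.
model = get_peft_model(model, config)
model.print_trainable_parameters()
```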
QALoRA Variant Implementation

- `src/peft/tuners/lora/variants.py`: Added the `QALoraLinearVariant` class, implementing QALoRA-specific logic for initialization, delta-weight computation, merging, unmerging, and forward propagation. This includes pooling the input features and scaling them for efficient adaptation (illustrated below).
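The pooling-and-scaling step can be illustrated with a simplified sketch of the adapter branch. This is not the actual `QALoraLinearVariant` code; the tensor shapes and the compensating `group_size` factor are assumptions based on the description above and the QA-LoRA paper.

```python
import torch
import torch.nn.functional as F

def qalora_adapter_delta(x, lora_A, lora_B, scaling, group_size):
    """Illustrative QALoRA forward path for the adapter branch.

    x:       (..., in_features) input activations (in_features divisible by group_size)
    lora_A:  (r, in_features // group_size) down-projection over pooled features
    lora_B:  (out_features, r) up-projection
    """
    # Average-pool contiguous groups of `group_size` input features so the
    # low-rank adapter operates on in_features // group_size values.
    pooled = x.reshape(*x.shape[:-1], x.shape[-1] // group_size, group_size).mean(dim=-1)
    # Rescale to compensate for the averaging, then apply A and B.
    pooled = pooled * group_size
    return F.linear(F.linear(pooled, lora_A), lora_B) * scaling

# Example shapes: 2 tokens, in_features=64, group_size=16, r=8, out_features=64.
x = torch.randn(2, 64)
delta = qalora_adapter_delta(x, torch.randn(8, 4), torch.randn(64, 8), scaling=2.0, group_size=16)
```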