Description
This request proposes integrating DeLoRA (Decoupled Low-rank Adaptation), as described in the ICLR 2025 accepted paper:
paper: https://arxiv.org/abs/2503.18225
code: https://github.com/ExplainableML/DeLoRA
Motivation
DeLoRA tackles finetuning in a Frobenius-norm-bounded setup: this prevents divergence from the pretrained model, effectively decoupling the learning of angles and magnitudes.
This is done by:
- normalization of the BA low-rank matrices, which bounds the updates' Frobenius norm
- a (learnable) scaling λ, which controls the update's boundary/magnitude
- layer-wise scaling by ||W||, to match each update's norm to the original weights' norm (mimicking multiplicative finetuning).
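The three ingredients above can be sketched as follows. This is a minimal, illustrative NumPy version (not the official API): it assumes a per-rank normalization of the BA components, a scalar λ, and Frobenius norms throughout; the function name and signature are ours.

```python
import numpy as np

def delora_delta(W, A, B, lam):
    """Illustrative DeLoRA-style update (a sketch, not the official code).

    Each rank-1 component b_i a_i^T is normalized to (at most) unit
    Frobenius norm, so the sum of r components has norm <= r; scaling by
    lam * ||W||_F / r then bounds ||delta_W||_F by lam * ||W||_F.
    """
    r = A.shape[0]                     # A: (r, d_in), B: (d_out, r)
    w_norm = np.linalg.norm(W)         # layer-wise ||W||_F scaling
    delta = np.zeros_like(W)
    for i in range(r):
        outer = np.outer(B[:, i], A[i, :])
        delta += outer / (np.linalg.norm(outer) + 1e-8)  # normalize each component
    return (lam * w_norm / r) * delta  # lam is the (learnable) scaling
```

By construction, the returned update satisfies ||ΔW||_F ≤ λ·||W||_F, which is the norm-boundedness property the bullets above describe.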

The method might feel quite similar to DoRA (given the shared goal of decoupling angular from magnitude learning), but it presents key differences. DoRA:
- applies the normalization and scaling operations to the fully finetuned weights $W + \Delta W$
- performs the normalization on the column space of the weight matrices

Conversely, DeLoRA:
- introduces the normalization and scaling operations directly on the weight updates $\Delta W$, more effectively preventing divergence from the pretrained model
- normalizes the inner low-dimensional space, which implicitly enforces a Frobenius-norm boundary

While, in theory, making the scaling parameter λ learnable does not prevent divergence, divergence does not occur in practice (see the figure below, showing performance and weight norms as the learning rate varies).
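The contrast between the two normalization targets can be made concrete with a toy sketch. Both functions are simplified illustrations under our own naming (not code from either library): the DoRA side normalizes columns of the merged weights and rescales them by a learned magnitude vector, while the DeLoRA side normalizes the update itself before merging, so the update norm stays bounded no matter how far BA drifts during training.

```python
import numpy as np

def dora_style_merge(W, delta_W, m):
    """DoRA-style: normalize the column space of the merged weights W + delta_W,
    then rescale each column by a learned magnitude vector m (simplified sketch)."""
    V = W + delta_W
    return m * V / np.linalg.norm(V, axis=0, keepdims=True)

def delora_style_merge(W, B, A, lam):
    """DeLoRA-style: normalize the update delta_W = B @ A itself (here with a
    single whole-matrix Frobenius normalization for brevity), bounding
    ||delta||_F by lam * ||W||_F before adding it to the pretrained weights."""
    BA = B @ A
    delta = lam * np.linalg.norm(W) * BA / (np.linalg.norm(BA) + 1e-8)
    return W + delta
```

In the DeLoRA-style merge the distance from the pretrained `W` is explicitly capped, which is the mechanism behind the divergence-prevention claim above.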

As a result, DeLoRA achieves better decoupling and, consequently, greater robustness. In addition, the parameter λ can be initialized arbitrarily, yielding countless norm-bounded variants.
Your Contribution
The implementation in https://github.com/ExplainableML/DeLoRA is based on peft, and we would be pleased to submit a pull request. We welcome any suggestions or guidance on this.