orthogonal lora layer init #2389

winglian · 2025-02-20T02:50:10Z

see: https://datta0.github.io/posts/rethink-lora-init/

BenjaminBossan

Thanks a lot for adding an option for orthogonal initialization of LoRA weights.

Note that the OLoRA initialization is also aimed at orthogonal initialization. Maybe it would be worth it to compare the two. A disadvantage of OLoRA is, however, that the base weights are also modified, which requires users to take some extra steps if they want to load the model with other LoRA adapters, for instance. Pinging @tokenizer-decode just in case they wanna check this PR.

Before merging, we would also need some more additions to this PR:

Update the docstring of LoraConfig, similar to the help. How about also adding a link to the blog post (AFAICT there is no paper?).
Add a unit test. Check out the tests in this test class.
Let's run make style to satisfy the linter.

BenjaminBossan · 2025-02-20T10:27:49Z

src/peft/tuners/lora/layer.py

+                    X = torch.randn(rank, rank)
+                    Q, _ = torch.linalg.qr(X)
+                    set1 = Q[0::2,:]  # Odd rows
+                    set2 = Q[1::2,:]  # Even rows


r needs to be even for this to work, right? Let's check it and raise an error with a helpful message if it's not.

buyukakyuz · 2025-02-20T11:37:33Z

This is just OLoRA but starting from random weights. How can starting from random weights, rather than getting that information from pretrained weights, converge faster? Did you actually run tests? Because in our research, and every other subsequent research showed that OLoRA and other derivatives like PISSA etc. perform better than any random initialization. For a list of studies see.

buyukakyuz · 2025-02-20T11:41:51Z

src/peft/tuners/lora/config.py

                "nonnegative integer. "
                "Passing `'corda'` results in CorDA initialization. "
                "Pass `'loftq'` to use LoftQ initialization."
+                "Pass `'orthogonal'` to use orthogonal initialization."


I think this is confusing to the user.

BenjaminBossan · 2025-02-20T11:50:51Z

@tokenizer-decode Thanks for commenting. It would indeed by nice to see a comparison with OLoRA or PiSSA, which the linked blog post didn't test. I could see an argument for the proposed initialization method being easier to use, as the base weights are unchanged, so even if it's not as good, there could be some value. WDYT?

buyukakyuz · 2025-02-20T12:06:44Z

I honestly don't see the performance benefit. But if you think there is an ease of use benefit, there could be some value.

This goes for every other decomposition method, SVD e.g.. If the value is not updating the base weights, we can always let the user use the method with a parameter like no_update and we would turn off the part where we update the base weights.

But I might add, for future readers who are confused, updating base weights is generally where you get the performance.

winglian · 2025-02-21T03:38:00Z

here's GRPO + PEFT. olora initialization goes straight to 0.0 rewards after the first step. orthogonal outperforms dora too.

If it's easier, I can convert this so that the init_lora accepts a callable and users can provide their own initialization function

EDIT: something like

class InitLoraWeights(Protocol):
    def __call__(self, layer, adapter_name) -> None:
        pass

and the Config typing would look something like:

bool | Literal[...] | InitLoraWeights

BenjaminBossan · 2025-02-21T10:10:44Z

here's GRPO + PEFT. olora initialization goes straight to 0.0 rewards after the first step.

Thanks for running the tests 🎉 Is the script open so that we can check what might be going on with OLoRA?

If it's easier, I can convert this so that the init_lora accepts a callable and users can provide their own initialization function

In general, we would like to avoid this, even though it could be practical. The reason is that we wouldn't be able to serialize the LoraConfig into JSON with values that are Python code.

In sum, I think we can still proceed with the orthogonal weight initialization method. As I mentioned, even if it did not outperform OLoRA or similar methods, it could still be valuable as a more user friendly option.

github-actions · 2025-03-22T15:03:30Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

BenjaminBossan · 2025-03-24T10:22:14Z

@winglian Do you have time to finish the PR? If not, let us know so that one of us can take over.

Continuation of, and supersedes huggingface#2389 Check discussion there for further info.

BenjaminBossan · 2025-04-15T14:04:09Z

@winglian I finished up the PR in #2498, would be grateful if you could take a look. Of course, I would add you as a co-author (we could add @datta0 as well).

github-actions · 2025-05-09T15:04:08Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Continuation of, and supersedes, #2389 Check discussion there for further info. --------- Co-authored-by: Wing Lian <[email protected]>

BenjaminBossan · 2025-06-16T16:49:10Z

@winglian I merged #2498, which supersedes this PR, so I'm closing it now. I added you as co-author.

Continuation of, and supersedes, huggingface#2389 Check discussion there for further info. --------- Co-authored-by: Wing Lian <[email protected]>

* Suppress warning for estimating tokens in trainer * Suppress warning for estimating FLOPs in ORPO and Reward trainers

orthogonal lora layer init

6c8fdbe

BenjaminBossan requested changes Feb 20, 2025

View reviewed changes

buyukakyuz reviewed Feb 20, 2025

View reviewed changes

BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Apr 15, 2025

ENH Orthogonal LoRA layer initialization (2)

4a59a80

Continuation of, and supersedes huggingface#2389 Check discussion there for further info.

BenjaminBossan mentioned this pull request Apr 15, 2025

ENH: Orthogonal LoRA layer initialization (2) #2498

Merged

githubnemo added the wip label May 12, 2025

BenjaminBossan added a commit that referenced this pull request Jun 16, 2025

ENH Orthogonal LoRA layer initialization (2) (#2498)

a27406c

Continuation of, and supersedes, #2389 Check discussion there for further info. --------- Co-authored-by: Wing Lian <[email protected]>

BenjaminBossan closed this Jun 16, 2025

cyyever pushed a commit to cyyever/peft that referenced this pull request Sep 4, 2025

🙈 Suppress warning for estimating tokens in trainers (huggingface#2389)

163695e

* Suppress warning for estimating tokens in trainer * Suppress warning for estimating FLOPs in ORPO and Reward trainers

orthogonal lora layer init #2389

orthogonal lora layer init #2389

Uh oh!

Conversation

winglian commented Feb 20, 2025

Uh oh!

BenjaminBossan left a comment

Choose a reason for hiding this comment

Uh oh!

BenjaminBossan Feb 20, 2025

Choose a reason for hiding this comment

Uh oh!

buyukakyuz commented Feb 20, 2025

Uh oh!

buyukakyuz Feb 20, 2025

Choose a reason for hiding this comment

Uh oh!

BenjaminBossan commented Feb 20, 2025

Uh oh!

buyukakyuz commented Feb 20, 2025

Uh oh!

winglian commented Feb 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BenjaminBossan commented Feb 21, 2025

Uh oh!

github-actions bot commented Mar 22, 2025

Uh oh!

BenjaminBossan commented Mar 24, 2025

Uh oh!

BenjaminBossan commented Apr 15, 2025

Uh oh!

github-actions bot commented May 9, 2025

Uh oh!

BenjaminBossan commented Jun 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

winglian commented Feb 21, 2025 •

edited

Loading