
Conversation

@winglian (Contributor)

@BenjaminBossan (Member) left a comment


Thanks a lot for adding an option for orthogonal initialization of LoRA weights.

Note that OLoRA also aims at an orthogonal initialization, so it may be worth comparing the two. A disadvantage of OLoRA, however, is that the base weights are also modified, which requires users to take some extra steps if they want to load the model with other LoRA adapters, for instance. Pinging @tokenizer-decode just in case they want to check this PR.

Before merging, we would also need some more additions to this PR:

  • Update the docstring of LoraConfig, similar to the help. How about also adding a link to the blog post (AFAICT there is no paper?).
  • Add a unit test. Check out the tests in this test class.
  • Let's run make style to satisfy the linter.

import torch  # required by the snippet below

X = torch.randn(rank, rank)   # rank is the LoRA rank r, assumed defined by the caller
Q, _ = torch.linalg.qr(X)     # Q is a (rank, rank) orthogonal matrix
set1 = Q[0::2, :]             # even-indexed rows: 0, 2, 4, ...
set2 = Q[1::2, :]             # odd-indexed rows: 1, 3, 5, ...
Member

r needs to be even for this to work, right? Let's check it and raise an error with a helpful message if it's not.
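Something along these lines would do (a minimal sketch; the exact attribute name and placement in the PR's code are only illustrative):

# Hypothetical guard, roughly what is being suggested here.
if self.r % 2 != 0:
    raise ValueError(
        "Orthogonal initialization splits the orthogonal matrix into two equal halves, "
        f"so the rank must be even, but got r={self.r}."
    )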

@buyukakyuz (Contributor)
This is just OLoRA but starting from random weights. How can starting from random weights, rather than deriving that information from the pretrained weights, converge faster? Did you actually run tests? In our research, and in every subsequent study, OLoRA and other derivatives like PiSSA performed better than any random initialization. For a list of studies, see.

"nonnegative integer. "
"Passing `'corda'` results in CorDA initialization. "
"Pass `'loftq'` to use LoftQ initialization."
"Pass `'orthogonal'` to use orthogonal initialization."
Contributor

I think this is confusing to the user.

@BenjaminBossan (Member)

@tokenizer-decode Thanks for commenting. It would indeed be nice to see a comparison with OLoRA or PiSSA, which the linked blog post didn't test. I could see an argument for the proposed initialization method being easier to use, as the base weights are unchanged, so even if it's not as good, there could be some value. WDYT?

@buyukakyuz (Contributor)

I honestly don't see the performance benefit. But if you think there is an ease-of-use benefit, there could be some value.

This goes for every other decomposition method as well, e.g. SVD. If the value lies in not updating the base weights, we could let the user pass a parameter like no_update that turns off the part where the base weights are modified.

But I might add, for future readers who are confused: updating the base weights is generally where the performance comes from.
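To make the no_update idea concrete, here is a rough sketch of what it could look like for an OLoRA-style initializer (the function name, signature, and no_update flag are only illustrative, and scaling is omitted):

import torch

# Hypothetical sketch: QR-based init in the spirit of OLoRA, with an opt-out
# for modifying the base weight.
def olora_style_init(weight: torch.Tensor, r: int, no_update: bool = False):
    Q, R = torch.linalg.qr(weight.data)   # reduced QR of the pretrained weight
    lora_B = Q[:, :r].clone()             # (out_features, r), orthonormal columns
    lora_A = R[:r, :].clone()             # (r, in_features)
    if not no_update:
        # OLoRA-style behavior: fold the adapter's initial contribution out of the base weight
        weight.data -= lora_B @ lora_A
    return lora_A, lora_B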

@winglian (Contributor, Author) commented Feb 21, 2025

(Screenshot of reward curves, 2025-02-20.) Here's GRPO + PEFT. OLoRA initialization goes straight to 0.0 rewards after the first step; orthogonal outperforms DoRA too.

If it's easier, I can convert this so that init_lora accepts a callable and users can provide their own initialization function.

EDIT: something like

from typing import Protocol

class InitLoraWeights(Protocol):
    def __call__(self, layer, adapter_name) -> None:
        ...

and the Config typing would look something like:

bool | Literal[...] | InitLoraWeights
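For illustration, a user-supplied initializer satisfying that protocol might look like this (purely hypothetical, assuming PEFT's LoraLayer keeps per-adapter modules in its lora_A/lora_B dicts):

import torch

# Hypothetical user-provided initializer matching the protocol sketched above.
def my_orthogonal_init(layer, adapter_name) -> None:
    # orthogonal A and zero B, so the adapter starts as a no-op delta
    torch.nn.init.orthogonal_(layer.lora_A[adapter_name].weight)
    torch.nn.init.zeros_(layer.lora_B[adapter_name].weight)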

@BenjaminBossan (Member)

> Here's GRPO + PEFT. OLoRA initialization goes straight to 0.0 rewards after the first step.

Thanks for running the tests 🎉 Is the script open so that we can check what might be going on with OLoRA?

> If it's easier, I can convert this so that init_lora accepts a callable and users can provide their own initialization function.

In general, we would like to avoid this, even though it could be practical. The reason is that we wouldn't be able to serialize the LoraConfig into JSON if its values contain Python code such as a callable.
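For example (hypothetical snippet, reusing the callable from the earlier sketch):

import json

config_dict = {"r": 16, "init_lora_weights": my_orthogonal_init}
json.dumps(config_dict)  # TypeError: Object of type function is not JSON serializable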

In sum, I think we can still proceed with the orthogonal weight initialization method. As I mentioned, even if it did not outperform OLoRA or similar methods, it could still be valuable as a more user-friendly option.

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@BenjaminBossan (Member)

@winglian Do you have time to finish the PR? If not, let us know so that one of us can take over.

BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Apr 15, 2025
Continuation of, and supersedes huggingface#2389

Check discussion there for further info.
@BenjaminBossan (Member)

@winglian I finished up the PR in #2498, would be grateful if you could take a look. Of course, I would add you as a co-author (we could add @datta0 as well).

@github-actions (bot) commented May 9, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@githubnemo added the wip label May 12, 2025
BenjaminBossan added a commit that referenced this pull request Jun 16, 2025
Continuation of, and supersedes, #2389

Check discussion there for further info.

---------

Co-authored-by: Wing Lian <[email protected]>
@BenjaminBossan (Member)

@winglian I merged #2498, which supersedes this PR, so I'm closing it now. I added you as co-author.

efraimdahl pushed a commit to efraimdahl/peft that referenced this pull request Jul 12, 2025
Continuation of, and supersedes, huggingface#2389

Check discussion there for further info.

---------

Co-authored-by: Wing Lian <[email protected]>
cyyever pushed a commit to cyyever/peft that referenced this pull request Sep 4, 2025
* Suppress warning for estimating tokens in trainer

* Suppress warning for estimating FLOPs in ORPO and Reward trainers