FEAT Add sine-LoRA #2434 (#2457)

Open · wants to merge 10 commits into main

Conversation

yipingji

Hi,

I just implemented sine LoRA in variants.py and changed "resolve_lora_variant" a bit in layers.py.

Collaborator

@githubnemo left a comment

Hey, thanks for providing an early draft. This is already on the right track, although there are still some bits missing.

I would suggest focusing on making one thing work first (e.g., integration of SineLoraLinearVariant) so that you can write tests for it, and then extending it to other layer types (e.g., Embedding).

For testing I'd suggest adding a test case to test_custom_models.py (copying one of the LoRA test cases in TEST_CASES with use_sine_lora: True is sufficient for now) so that you have a broad set of test cases to execute. Say you added this line

("Vanilla MLP LoRA + SineLoRA", "MLP", LoraConfig, {"target_modules": ["lin0", "lin1"]}),

you can then invoke the tests using

pytest -k SineLoRA tests

Feel free to ping if you have any questions :)

Comment on lines 793 to 794
from .variants import SineLoraLinearVariant
return SineLoraLinearVariant()
Collaborator

This should reference the (not yet existing) SineLoraEmbeddingVariant since we're in the Embedding class.

But this code is good for Linear.resolve_lora_variant :) You can use it there!

Effectively, every class that overrides modules in the model (Linear, Embedding, Conv2d, ...) needs its own variant implementation and resolve_lora_variant implementation, but we can keep it at Linear and Embedding for now if you want.
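For illustration, Linear.resolve_lora_variant could look roughly like this (a sketch, not the final implementation; it assumes the use_sinelora flag from this PR and mirrors the existing use_dora handling):

# Sketch only: method inside the Linear LoRA layer; use_sinelora is the flag
# added in this PR, the use_dora branch mirrors the existing DoRA handling.
def resolve_lora_variant(self, *, use_dora: bool, use_sinelora: bool = False, **kwargs) -> Optional[LoraVariant]:
    if use_sinelora:
        from .variants import SineLoraLinearVariant

        return SineLoraLinearVariant()
    if use_dora:
        from .variants import DoraLinearVariant

        return DoraLinearVariant()
    return None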

Author

I've corrected it. Please check the updated PR.

Collaborator

Hmm can you check again? I don't see changes in this regard :/

Author

I did another git push. I only implemented Linear and Embedding for now; is that okay?

Collaborator

Yep that's totally fine. We can add support for convolutions later once Linear and Embedding work as expected.


class SineLoraLinearVariant(LoraVariant):
    @staticmethod
    def init(module: Linear, adapter_name:str) -> None:
Collaborator

Suggested change
- def init(module: Linear, adapter_name:str) -> None:
+ def init(module: Linear, adapter_name:str, **kwargs) -> None:

With PR #2455 now merged, init() receives all the parameters that update_layer receives.

Author

Hmm, I did not use that; do you think that is okay?

Collaborator

I don't think so, no.

Currently the tests do not work because of the changes still necessary in Linear.__init__ and Embedding.__init__. Once the changes are in place, you'll see that calls to init will complain about unexpected arguments passed to init(). That's because all the config args are passed to init, and without the wildcard **kwargs you have to define them all (which we don't want, of course).

You also need a place to set module.sinelora_scaling and module.sinelora_frequency. That place is here, using the kwargs, e.g.

module.sinelora_frequency = kwargs['sinelora_frequency']

For sinelora_scaling you need to check if kwargs['sinelora_scaling'] is None.
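Putting that together, init() could look roughly like this (a sketch; the sqrt(in_features) fallback follows the sinelora_scaling docstring in this PR, and it assumes the layer exposes in_features as the other LoRA layers do):

import math

class SineLoraLinearVariant(LoraVariant):
    @staticmethod
    def init(module: Linear, adapter_name: str, **kwargs) -> None:
        module.sinelora_frequency = kwargs["sinelora_frequency"]
        # fall back to sqrt(in_features) when no explicit scaling was given
        sinelora_scaling = kwargs["sinelora_scaling"]
        if sinelora_scaling is None:
            sinelora_scaling = math.sqrt(module.in_features)
        module.sinelora_scaling = sinelora_scaling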

Yiping Ji added 3 commits March 28, 2025 14:29
Merge remote-tracking branch 'upstream/main' into sine_lora
Collaborator

@githubnemo left a comment

Great progress! Maybe I'm missing some changes but I didn't see an update regarding the Embedding vs Linear layer variants. Could it be you missed a git push?

=3 Outdated
Comment on lines 1 to 5
Collecting oauthlib
Using cached oauthlib-3.2.2-py3-none-any.whl.metadata (7.5 kB)
Using cached oauthlib-3.2.2-py3-none-any.whl (151 kB)
Installing collected packages: oauthlib
Successfully installed oauthlib-3.2.2
Collaborator

Remove :)

@BenjaminBossan changed the title from "draft PR" to "FEAT Add sine-LoRA #2434" on Mar 31, 2025
Collaborator

@githubnemo left a comment

Hey, for Linear and Embedding this looks like it is almost ready.

Please remove the following files that were probably wrongly checked in:

  • =3
  • posters/.$Untitled Diagram.drawio.dtmp
  • posters/Untitled Diagram.drawio
  • posters/Untitled Diagram.drawio.pdf

Please run make style to fix any style and code issues.

I think it would be worthwhile to add a test that compares plain LoRA with SineLoRA and makes sure that the output is different (e.g., for extreme scaling values so that we don't have to train for long). You could create tests/test_lora_variant_sinelora.py and add such a test there. I think it would be quite useful for making sure that the code works as intended :)

To test the Embedding layer implementation you could add

("Embedding + transformers Conv1D 2 LoRA + SineLoRA", "EmbConv1D", LoraConfig, {"target_modules": ["emb"], "use_sinelora": True}),

to the custom models testcase you already added.
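Coming back to the plain LoRA vs. SineLoRA comparison, such a test could look roughly like this (a sketch only; the toy MLP is a placeholder, and use_sinelora is assumed to be the config flag added in this PR):

# Hypothetical sketch for tests/test_lora_variant_sinelora.py.
import copy

import torch
from torch import nn

from peft import LoraConfig, get_peft_model


class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin0 = nn.Linear(10, 20)
        self.lin1 = nn.Linear(20, 2)

    def forward(self, x):
        return self.lin1(torch.relu(self.lin0(x)))


def test_sinelora_output_differs_from_plain_lora():
    x = torch.randn(4, 10)
    base = MLP()

    # init_lora_weights=False makes the adapter contribution non-zero;
    # the fixed seed gives both models identical adapter weights
    torch.manual_seed(0)
    lora = get_peft_model(copy.deepcopy(base), LoraConfig(target_modules=["lin0", "lin1"], init_lora_weights=False))
    torch.manual_seed(0)
    sine = get_peft_model(
        copy.deepcopy(base),
        LoraConfig(target_modules=["lin0", "lin1"], init_lora_weights=False, use_sinelora=True),
    )

    with torch.no_grad():
        assert not torch.allclose(lora(x), sine(x))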

Comment on lines +306 to +307
sinelora_scaling (`float`):
The scaling factor for the sine activation. If not specified, it will be set to the default value of sqrt(in_features).
Collaborator

If this value is optional, it should be marked as type Optional[float]

Collaborator

Change the type of sinelora_scaling here to Optional[float], as it is defined in the code.
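For reference, a sketch of how the docstring and the config field could agree (the metadata text is a placeholder following the LoraConfig dataclass style):

from dataclasses import field
from typing import Optional

# inside LoraConfig (sketch)
sinelora_scaling: Optional[float] = field(
    default=None,
    metadata={
        "help": "The scaling factor for the sine activation. If not specified, it defaults to sqrt(in_features)."
    },
)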


@yipingji
Author

@githubnemo Hi, I am still a bit confused about how to add module.sinelora_frequency = kwargs['sinelora_frequency'] and sinelora_scaling. I get a KeyError when I put module.sinelora_frequency = kwargs['sinelora_frequency'] in class SineLoraLinearVariant(LoraVariant).

@githubnemo
Collaborator

Hi @yipingji

Sorry for being unclear. I tried highlighting the issue in #2457 (comment). The kwargs come from the update_layer calls, and the update_layer calls get their parameters from the layer __init__ calls. As long as those don't have sinelora_frequency and sinelora_scaling, they will be missing from the kwargs in SineLora*Variant.init as well.

So you have to update the following functions with new parameters the same way you did to add the use_sinelora parameter:

  • LoraLayer.update_layer
  • Linear.__init__
  • Linear.update_layer
  • Embedding.__init__
  • Embedding.update_layer

For example, the change for LoraLayer and Linear could look like this:

diff --git a/src/peft/tuners/lora/layer.py b/src/peft/tuners/lora/layer.py
index cb09608..efd8d6a 100644
--- a/src/peft/tuners/lora/layer.py
+++ b/src/peft/tuners/lora/layer.py
@@ -183,6 +183,8 @@ class LoraLayer(BaseTunerLayer):
         use_rslora,
         use_dora: bool = False,
         use_sinelora: bool = False,
+        sinelora_frequency = 200.0,
+        sinelora_scaling = None,
         lora_bias: bool = False,
     ):
         # collect the kwargs
@@ -574,6 +576,8 @@ class Linear(nn.Module, LoraLayer):
         use_dora: bool = False,
         lora_bias: bool = False,
         use_sinelora: bool = False,
+        sinelora_frequency: float = 200.0,
+        sinelora_scaling: Optional[float] = None,
         **kwargs,
     ) -> None:
         super().__init__()
@@ -591,6 +595,8 @@ class Linear(nn.Module, LoraLayer):
             use_dora=use_dora,
             lora_bias=lora_bias,
             use_sinelora=use_sinelora,
+            sinelora_frequency=sinelora_frequency,
+            sinelora_scaling=sinelora_scaling,
         )
         self.is_target_conv_1d_layer = is_target_conv_1d_layer

I hope that helps!

@yipingji
Author

yipingji commented Apr 1, 2025

Hi @githubnemo, apologies for the previous update; it is my first time opening a PR in such a large codebase. I was just wondering if my current one is correct.

Collaborator

@githubnemo left a comment

Hi @githubnemo, apologies for the previous update; it is my first time opening a PR in such a large codebase. I was just wondering if my current one is correct.

No worries, you're doing fine :) Thanks for your work!

Please make sure to test your changes regularly by running pytest:

# from the root of the repository
pytest -k SineLoRA tests

I left some comments regarding the forward implementations.

I think we need to implement merge and unmerge as well but they should be basically equal to the way LoRA is doing it.
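To illustrate that idea, a sketch of merge/unmerge for the Linear variant (the method names and signatures are assumptions modeled on the variant interface used in this PR, not the final API; the delta-weight formula follows the sine forward path above):

# Sketch: methods inside SineLoraLinearVariant (assumed interface).
@staticmethod
def _sine_delta_weight(module, adapter_name):
    # sine-LoRA delta weight: sin(frequency * B @ A) / sinelora_scaling, scaled like plain LoRA
    lora_A = module.lora_A[adapter_name].weight
    lora_B = module.lora_B[adapter_name].weight
    scaling = module.scaling[adapter_name]
    return torch.sin(module.sinelora_frequency * (lora_B @ lora_A)) / module.sinelora_scaling * scaling

@staticmethod
def merge_safe(module, active_adapter, orig_weight):
    return orig_weight + SineLoraLinearVariant._sine_delta_weight(module, active_adapter)

@staticmethod
def unmerge(module, active_adapter, orig_weight):
    return orig_weight - SineLoraLinearVariant._sine_delta_weight(module, active_adapter)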

Comment on lines 331 to 336
sine_output = (
    module._embed(x)
    @ torch.sin(module.sinelora_frequency * lora_embedding_A.weight.T @ lora_embedding_B.weight.T)
    / module.sinelora_scaling
    * lora_scaling
)
Collaborator

I wonder, is this correct? The way I see it, _embed takes two parameters, x and weight, but we're only supplying x here. weight should probably be lora_embedding_A.T?

Comment on lines 314 to 315
result = result + sine_output

Collaborator

We're missing a return here.

@githubnemo
Collaborator

@yipingji gentle ping :)

@yipingji
Author

@yipingji gentle ping :)

Sorry for the late updates. I have been working on a conference recently and will update the PR soon ;)


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@github-actions bot closed this on May 19, 2025
@yipingji
Author

Hi @githubnemo,

How can I reopen the PR?

@githubnemo reopened this on May 21, 2025
@githubnemo added the wip label on May 21, 2025
@githubnemo
Collaborator

@yipingji I've reopened the PR and added the WIP tag so stale bot will not bother us.

@yipingji
Author

@githubnemo I just updated; please have a look ;)

@yipingji
Author

gentle ping @githubnemo ;)

Collaborator

@githubnemo left a comment

Thanks for the update! I think we're getting close to finishing this.

Very important: make sure to run make style and pytest -k SineLoRA tests before pushing changes and asking for a review. Both the tests and the linter are failing right now. You can use these tools to iterate on your side before asking for a review; that way the review will be a lot faster :)

I suggested adding a test case for the sinelora_scaling parameter; if it helps, I encourage you to add more such test cases to remove the remaining bugs even faster!

Comment on lines +1384 to +1387
    from .variants import DoraConv1dVariant
elif use_sinelora:
    from .variants import SineLoraConv1dVariant
return None
Collaborator

  • We're missing the return statements here, so resolve_lora_variant always returns None (see the sketch below).
  • There doesn't seem to be a SineLoraConv1dVariant yet.

If you have an implementation for conv*d I'd suggest adding it. If you don't, maybe it is worthwhile to skip it for now and undo the changes in the Conv* layers.
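For comparison, a sketch with the missing returns added (assuming a SineLoraConv1dVariant existed; otherwise the sinelora branch would be dropped along with the other Conv* changes):

# Sketch only: the Conv layer's resolve_lora_variant with explicit returns.
def resolve_lora_variant(self, *, use_dora: bool, use_sinelora: bool = False, **kwargs) -> Optional[LoraVariant]:
    if use_dora:
        from .variants import DoraConv1dVariant

        return DoraConv1dVariant()
    if use_sinelora:
        from .variants import SineLoraConv1dVariant

        return SineLoraConv1dVariant()
    return None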

init_lora_weights,
use_rslora,
use_dora,
use_sinelora,
Collaborator

@githubnemo Jun 4, 2025

Make sure to change the update_layer call in ConvNd.__init__ as well (it currently misses all the sinelora arguments). But since we're skipping convolutions for now, I suggest removing it entirely.

merged_weight = orig_weight + delta_weight
if not torch.isfinite(merged_weight).all():
    raise ValueError(f"NaNs detected in merged weights for adapter {active_adapter}")
module._cache_store(f"{active_adapter}-delta_weight", delta_weight)
Collaborator

I don't think we need to cache the delta weights?

The same goes for merge_unsafe and the Embedding implementation.

def init(module: Embedding, adapter_name: str, **kwargs) -> None:
    module.sinelora_frequency = kwargs["sinelora_frequency"]

    sinelora_scaling = kwargs["sinelora_scaling"]
Collaborator

Should probably be self.sinelora_scaling = kwargs....

Please add a test case for this scenario, e.g.:

    ("Embedding + transformers Conv1D LoRA + SineLoRA 2", "EmbConv1D", LoraConfig, {"target_modules": ["emb"], "use_sinelora": True, "sinelora_scaling": 100}),

Comment on lines +458 to +477
class DoraEmbeddingLayer(DoraLinearLayer):
    def forward(self, x, *, lora_A, lora_B, scaling, base_layer, embed_fn):
        """
        For DoRA, calculate the extra output from LoRA with DoRA applied. This should be added on top of the base layer
        output.
        """
        lora_weight = (lora_A @ lora_B).T
        magnitude = self.weight
        weight = base_layer.weight
        weight_norm = self.get_weight_norm(weight, lora_weight.detach(), scaling)
        # see section 4.3 of DoRA (https://arxiv.org/abs/2402.09353)
        # "[...] we suggest treating ||V +∆V ||_c in
        # Eq. (5) as a constant, thereby detaching it from the gradient
        # graph. This means that while ||V + ∆V ||_c dynamically
        # reflects the updates of ∆V , it won’t receive any gradient
        # during backpropagation"
        weight_norm = weight_norm.detach()
        mag_norm_scale = magnitude / weight_norm
        result_dora = mag_norm_scale * (embed_fn(x, lora_A) @ lora_B) * scaling
        return mag_norm_scale, result_dora
Collaborator

Where does this come from?
