FEAT Add sine-LoRA #2434 #2457
base: main
Conversation
Hey, thanks for providing an early draft. This is already on the right track, although there are still some bits missing.
I would suggest focusing on making one thing work first (e.g., the integration of SineLoraLinearVariant) so that you can write tests for it, and then extending it to other layer types (e.g., Embedding).
For testing I'd suggest adding a test case to test_custom_models.py (copying one of the LoRA test cases in TEST_CASES with use_sine_lora: True is sufficient for now) so that you have a broad number of test cases to execute. Say you added this line
("Vanilla MLP LoRA + SineLoRA", "MLP", LoraConfig, {"target_modules": ["lin0", "lin1"]}),
you can then invoke the tests using
pytest -k SineLoRA tests
Feel free to ping if you have any questions :)
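For illustration, a minimal sketch of what such a TEST_CASES entry could look like once the config flag exists; the flag name use_sinelora follows later comments in this thread and is an assumption, not settled API:

```python
# Hypothetical addition to TEST_CASES in tests/test_custom_models.py.
# The "use_sinelora" flag name is assumed from later comments in this PR.
(
    "Vanilla MLP LoRA + SineLoRA",
    "MLP",
    LoraConfig,
    {"target_modules": ["lin0", "lin1"], "use_sinelora": True},
),
```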
src/peft/tuners/lora/layer.py (outdated)
from .variants import SineLoraLinearVariant
return SineLoraLinearVariant()
This should reference the (not yet existing) SineLoraEmbeddingVariant since we're in the Embedding class. But this code is good for Linear.resolve_lora_variant :) You can use it there!
Effectively every class that overrides modules in the model (Linear, Embedding, Conv2d, ...) needs its own variant implementation and resolve_lora_variant implementation, but we can keep it at Linear and Embedding for now if you want.
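For illustration, a rough sketch of what the Embedding-side override could look like, assuming a SineLoraEmbeddingVariant is added to variants.py; only the sine-LoRA branch is shown and the exact signature of resolve_lora_variant is taken from this thread, not verified against the final code:

```python
# Sketch only: Embedding.resolve_lora_variant mirroring the Linear version above,
# but returning the (not yet existing) SineLoraEmbeddingVariant.
def resolve_lora_variant(self, *, use_sinelora: bool = False, **kwargs):
    if use_sinelora:
        from .variants import SineLoraEmbeddingVariant
        return SineLoraEmbeddingVariant()
    return None  # fall back to plain LoRA (DoRA handling omitted in this sketch)
```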
I corrected it. Please check my new PR.
Hmm can you check again? I don't see changes in this regard :/
I pushed again. I only implemented Linear and Embedding for now; is that ok?
Yep, that's totally fine. We can add support for convolutions later once Linear and Embedding work as expected.
src/peft/tuners/lora/variants.py (outdated)
class SineLoraLinearVariant(LoraVariant):
    @staticmethod
    def init(module: Linear, adapter_name: str) -> None:
Suggested change:
-    def init(module: Linear, adapter_name: str) -> None:
+    def init(module: Linear, adapter_name: str, **kwargs) -> None:
With PR #2455 now merged, init() receives all the parameters that update_layer receives.
Hmm, I did not use that; do you think that is ok?
I don't think so, no.
Currently the tests do not work because of the changes necessary in Linear.__init__ and Embedding.__init__. Once the changes are in place you'll see that calls to init will complain about unexpected arguments passed to init(). That's because all the config args are passed to init, and without the wildcard **kwargs you have to define them all (which we don't want, of course).
Also, you need a place to set module.sinelora_scaling and module.sinelora_frequency. This is the place to do it, from the kwargs, e.g.
module.sinelora_frequency = kwargs['sinelora_frequency']
For sinelora_scaling you need to check whether kwargs['sinelora_scaling'] is None.
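To make that concrete, here is a minimal sketch of how the variant's init could consume those kwargs; the sqrt(in_features) fallback follows the config docstring quoted further down, and the import paths and attribute names are assumptions rather than the final implementation:

```python
import math

from peft.tuners.lora.layer import Linear  # import path assumed for this sketch
from peft.tuners.lora.variants import LoraVariant  # import path assumed for this sketch


class SineLoraLinearVariant(LoraVariant):
    @staticmethod
    def init(module: Linear, adapter_name: str, **kwargs) -> None:
        # init() receives all update_layer arguments (per PR #2455), so the
        # sine-LoRA hyperparameters can be read from the kwargs here.
        module.sinelora_frequency = kwargs["sinelora_frequency"]
        sinelora_scaling = kwargs["sinelora_scaling"]
        if sinelora_scaling is None:
            # documented default: sqrt(in_features) when not specified
            sinelora_scaling = math.sqrt(module.in_features)
        module.sinelora_scaling = sinelora_scaling
```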
Great progress! Maybe I'm missing some changes, but I didn't see an update regarding the Embedding vs Linear layer variants. Could it be you missed a git push?
=3 (outdated)
Collecting oauthlib
Using cached oauthlib-3.2.2-py3-none-any.whl.metadata (7.5 kB)
Using cached oauthlib-3.2.2-py3-none-any.whl (151 kB)
Installing collected packages: oauthlib
Successfully installed oauthlib-3.2.2
Remove :)
Hey, for Linear and Embedding this looks like it is almost ready.
Please remove the following files that were probably wrongly checked in:
=3
posters/.$Untitled Diagram.drawio.dtmp
posters/Untitled Diagram.drawio
posters/Untitled Diagram.drawio.pdf
Please run make style to fix any style and code issues.
I think it would be worthwhile to add a test that compares plain LoRA with SineLoRA and makes sure that the output is different (e.g., for extreme scaling values so that we don't have to train for long). You could create tests/test_lora_variant_sinelora.py and add such a test there (see the sketch after this comment). I think it would be quite useful for making sure that the code works as intended :)
To test the Embedding layer implementation you could add
("Embedding + transformers Conv1D 2 LoRA + SineLoRA", "EmbConv1D", LoraConfig, {"target_modules": ["emb"], "use_sinelora": True}),
to the custom models testcase you already added.
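A rough sketch of what such a comparison test could look like; the use_sinelora flag on LoraConfig and the tiny model are assumptions for illustration, not the final API:

```python
# tests/test_lora_variant_sinelora.py -- sketch only, assuming a `use_sinelora`
# flag on LoraConfig as discussed in this PR.
import torch
from torch import nn

from peft import LoraConfig, get_peft_model


class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin0 = nn.Linear(8, 8)
        self.lin1 = nn.Linear(8, 2)

    def forward(self, x):
        return self.lin1(self.lin0(x))


def test_sinelora_output_differs_from_plain_lora():
    x = torch.randn(4, 8)

    def build(use_sinelora):
        torch.manual_seed(0)  # same base weights and LoRA init for both models
        config = LoraConfig(
            target_modules=["lin0", "lin1"],
            init_lora_weights=False,  # non-zero lora_B so the adapter changes the output
            use_sinelora=use_sinelora,
        )
        return get_peft_model(TinyMLP(), config)

    out_lora = build(False)(x)
    out_sinelora = build(True)(x)
    assert not torch.allclose(out_lora, out_sinelora)
```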
sinelora_scaling (`float`):
    The scaling factor for the sine activation. If not specified, it will be set to the default value of sqrt(in_features).
If this value is optional, it should be marked as type Optional[float]
Change the type of sinelora_scaling here to Optional[float], as it is defined in code.
@githubnemo Hi, I am still a bit confused about how to add module.sinelora_frequency = kwargs['sinelora_frequency'] and sinelora_scaling. I get a KeyError when I put module.sinelora_frequency = kwargs['sinelora_frequency'] in class SineLoraLinearVariant(LoraVariant).
Hi @yipingji Sorry for being unclear. I tried highlighting the issue in #2457 (comment). The init() call only receives the parameters that update_layer receives, so you have to thread the new parameters (sinelora_frequency, sinelora_scaling) through update_layer and the layer __init__ methods the same way you did when adding the use_sinelora flag.
For example, the change for layer.py:
diff --git a/src/peft/tuners/lora/layer.py b/src/peft/tuners/lora/layer.py
index cb09608..efd8d6a 100644
--- a/src/peft/tuners/lora/layer.py
+++ b/src/peft/tuners/lora/layer.py
@@ -183,6 +183,8 @@ class LoraLayer(BaseTunerLayer):
use_rslora,
use_dora: bool = False,
use_sinelora: bool = False,
+ sinelora_frequency = 200.0,
+ sinelora_scaling = None,
lora_bias: bool = False,
):
# collect the kwargs
@@ -574,6 +576,8 @@ class Linear(nn.Module, LoraLayer):
use_dora: bool = False,
lora_bias: bool = False,
use_sinelora: bool = False,
+ sinelora_frequency: float = 200.0,
+ sinelora_scaling: Optional[float] = None,
**kwargs,
) -> None:
super().__init__()
@@ -591,6 +595,8 @@ class Linear(nn.Module, LoraLayer):
use_dora=use_dora,
lora_bias=lora_bias,
use_sinelora=use_sinelora,
+ sinelora_frequency=sinelora_frequency,
+ sinelora_scaling=sinelora_scaling,
)
self.is_target_conv_1d_layer = is_target_conv_1d_layer

I hope that helps! To test the Embedding layer implementation you could add
("Embedding + transformers Conv1D 2 LoRA + SineLoRA", "EmbConv1D", LoraConfig, {"target_modules": ["emb"], "use_sinelora": True}),
to the custom models testcase you already added.
Hi @githubnemo, apologies for the previous update; it is my first time opening a PR in such a large codebase. I was just wondering if my current one is correct.
> Hi @githubnemo, apologies for the previous update; it is my first time opening a PR in such a large codebase. I was just wondering if my current one is correct.
No worries, you're doing fine :) Thanks for your work!
Please make sure to test your changes regularly by running pytest:
# from the root of the repository
pytest -k SineLoRA tests
I left some comments regarding the forward implementations.
I think we need to implement merge and unmerge as well, but they should be basically equal to the way LoRA is doing it.
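If it helps, here is a rough sketch of what the sine-LoRA delta weight for Linear might look like, assuming the merged weight should reproduce a sine-activated low-rank product analogous to the forward pass discussed in this thread; the formula and helper name are assumptions, not the final implementation:

```python
import torch


def get_sinelora_delta_weight(module, adapter_name: str) -> torch.Tensor:
    # Sketch only: delta weight mirroring a forward pass of the form
    # sin(frequency * B @ A) / sinelora_scaling * lora_scaling.
    lora_A = module.lora_A[adapter_name].weight  # (r, in_features)
    lora_B = module.lora_B[adapter_name].weight  # (out_features, r)
    lora_scaling = module.scaling[adapter_name]
    delta = torch.sin(module.sinelora_frequency * (lora_B @ lora_A)) / module.sinelora_scaling
    return delta * lora_scaling  # merge: W += delta, unmerge: W -= delta
```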
src/peft/tuners/lora/variants.py (outdated)
sine_output = (
    module._embed(x)
    @ torch.sin(module.sinelora_frequency * lora_embedding_A.weight.T @ lora_embedding_B.weight.T)
    / module.sinelora_scaling
    * lora_scaling
)
I wonder, is this correct? The way I see it, _embed takes two parameters, x and weight, but we're only supplying x here. weight should probably be lora_embedding_A.T?
src/peft/tuners/lora/variants.py (outdated)
result = result + sine_output
We're missing a return here.
@yipingji gentle ping :)
Sorry for the late update. I am working on a conference deadline at the moment and will push an update soon ;)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Hi @githubnemo, how can I reopen the PR?
@yipingji I've reopened the PR and added the WIP tag so the stale bot will not bother us.
@githubnemo I just updated; please have a look ;)
Gentle ping @githubnemo ;)
Thanks for the update! I think we're getting close to finishing this.
Very important: make sure to run make style and pytest -k SineLoRA tests before pushing changes and asking for a review. Both the tests and the linter are failing right now. You can use these tools to iterate on your side before asking for a review; that way the review will be a lot faster :)
I suggested adding a test case for the sinelora_scaling parameter; if it helps, I encourage you to add more such test cases to remove the remaining bugs even faster!
from .variants import DoraConv1dVariant
elif use_sinelora:
    from .variants import SineLoraConv1dVariant
return None
- We're missing the return statements here, so resolve_lora_variant always returns None.
- There doesn't seem to exist a SineLoraConv1dVariant yet.

If you have an implementation for conv*d I'd suggest adding it. If you don't, maybe it is worthwhile to skip it for now and undo the changes in the Conv* layers.
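For reference, the same branch with the missing returns in place would look roughly like this; SineLoraConv1dVariant is still hypothetical and would have to be implemented first:

```python
# Sketch only: resolve_lora_variant branch with the return statements added.
if use_dora:
    from .variants import DoraConv1dVariant
    return DoraConv1dVariant()
elif use_sinelora:
    from .variants import SineLoraConv1dVariant  # does not exist yet
    return SineLoraConv1dVariant()
return None
```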
init_lora_weights,
use_rslora,
use_dora,
use_sinelora,
Make sure to change the update_layer call in ConvNd.__init__ as well (it currently misses all the sinelora arguments). But since we're skipping convolutions for now, I suggest removing it entirely.
merged_weight = orig_weight + delta_weight
if not torch.isfinite(merged_weight).all():
    raise ValueError(f"NaNs detected in merged weights for adapter {active_adapter}")
module._cache_store(f"{active_adapter}-delta_weight", delta_weight)
I don't think we need to cache the delta weights?
The same goes for merge_unsafe and the Embedding implementation.
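As a sketch, the same merge path without the cache call could simply be (assuming unmerge recomputes the delta weight instead of reading it from a cache):

```python
# Sketch only: NaN-guarded merge without caching the delta weight.
merged_weight = orig_weight + delta_weight
if not torch.isfinite(merged_weight).all():
    raise ValueError(f"NaNs detected in merged weights for adapter {active_adapter}")
return merged_weight  # assumes the surrounding merge helper returns the merged weight
```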
def init(module: Embedding, adapter_name: str, **kwargs) -> None:
    module.sinelora_frequency = kwargs["sinelora_frequency"]

    sinelora_scaling = kwargs["sinelora_scaling"]
Should probably be module.sinelora_scaling = kwargs["sinelora_scaling"].
Please add a test case for this scenario, e.g.:
("Embedding + transformers Conv1D LoRA + SineLoRA 2", "EmbConv1D", LoraConfig, {"target_modules": ["emb"], "use_sinelora": True, "sinelora_scaling": 100}),
class DoraEmbeddingLayer(DoraLinearLayer):
    def forward(self, x, *, lora_A, lora_B, scaling, base_layer, embed_fn):
        """
        For DoRA, calculate the extra output from LoRA with DoRA applied. This should be added on top of the base layer
        output.
        """
        lora_weight = (lora_A @ lora_B).T
        magnitude = self.weight
        weight = base_layer.weight
        weight_norm = self.get_weight_norm(weight, lora_weight.detach(), scaling)
        # see section 4.3 of DoRA (https://arxiv.org/abs/2402.09353)
        # "[...] we suggest treating ||V +∆V ||_c in
        # Eq. (5) as a constant, thereby detaching it from the gradient
        # graph. This means that while ||V + ∆V ||_c dynamically
        # reflects the updates of ∆V , it won’t receive any gradient
        # during backpropagation"
        weight_norm = weight_norm.detach()
        mag_norm_scale = magnitude / weight_norm
        result_dora = mag_norm_scale * (embed_fn(x, lora_A) @ lora_B) * scaling
        return mag_norm_scale, result_dora
Where does this come from?
Hi,
I just implemented sine-LoRA in variants.py and changed "resolve_lora_variant" a bit in layer.py.