[FEAT] Add FeRA (Frequency-Energy Constrained Routing Adaptation) method #2951

Conversation
For more context: #2930

Thanks for the PR @YinBo0927. Please make sure to only check in files that really belong to the PR.

Thanks for the mention! I have deleted the log file.
BenjaminBossan left a comment
Thanks for this PR to add FeRA to PEFT. I did a first review of the code, not super in-depth yet, but let's get the basics working first.
Apart from these comments, could you please:
- Explain how the frequency energy consistency loss comes into play? From my understanding, this is an important part but I don't see it in the code right now.
- For each file, ensure that the PEFT copyright note is added to the start of the file using the current year.
# FeRA: Frequency–Energy Constrained Routing for Effective Diffusion Adaptation Fine-Tuning
[](https://arxiv.org/abs/2511.17979)

## A light Example
Let's add a sentence or two explaining what is happening here.
@@ -0,0 +1,35 @@
# FeRA: Frequency–Energy Constrained Routing for Effective Diffusion Adaptation Fine-Tuning
[](https://arxiv.org/abs/2511.17979)
I don't think the SVG is needed here.
@@ -0,0 +1,24 @@
import torch
Let's extend the example to include proper training of the model.
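(A minimal sketch of what such a training example could look like. Since the FeRA wrapping API is still under review in this PR, a plain `nn.Linear` stands in for the FeRA-adapted model, and the latent/target tensors are synthetic; only the loop structure is the point here.)

```python
import torch
import torch.nn as nn

# Stand-in module; in the PR example this would be the FeRA-wrapped model
# (the exact FeRA wrapping API is still under review, so it is not shown).
torch.manual_seed(0)
model = nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)

losses = []
for step in range(50):
    z_t = torch.randn(8, 16)   # synthetic latent batch
    target = 0.5 * z_t         # synthetic denoising target (learnable mapping)
    # For FeRA, the routing weights would be refreshed from z_t here,
    # e.g. via a prepare_forward(z_t) call, before the forward pass.
    pred = model(z_t)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```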
default=4,
metadata={"help": "The rank of the LoRA experts."},
)
lora_alpha: float = field(
Let's rename this to just `alpha`.
This is the configuration class to store the configuration of a [`FeRAModel`].
"""

rank: int = field(
Let's rename this to `r`.
f"Target module {target} is not supported. Currently, only `torch.nn.Linear` is supported."
)

def prepare_forward(self, z_t: torch.Tensor):
It is unclear to me when I need to call this. Once before inference? Also before training? Let's properly document this method.
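(A documented version might look like the sketch below. The indicator computation shown here, per-channel spectral energy via an FFT, is an assumption for illustration based on the method description, not necessarily the PR's exact implementation.)

```python
import torch

class FeRAModelSketch:
    """Illustrative stand-in; the real FeRAModel in the PR may differ."""

    def prepare_forward(self, z_t: torch.Tensor) -> None:
        """Compute the frequency-energy indicator for the current batch.

        Should be called once per denoising step, before the forward pass,
        during both training and inference: the routing weights used by the
        FeRA layers are derived from the latent ``z_t`` passed here.
        """
        # Hypothetical indicator: per-sample, per-channel spectral energy.
        spectrum = torch.fft.fft2(z_t.float())
        self.e_t = spectrum.abs().pow(2).mean(dim=(-2, -1))
```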
default=0.7,
metadata={"help": "Temperature for the Softmax in the router."},
)
fecl_weight: float = field(
I don't see this argument being used anywhere.
routing_weights = self.router(e_t)  # (B, num_experts)

count = 0
count is not being used.
self.num_experts = {}
self.kwargs = kwargs

self.current_routing_weights = None
As is, current_routing_weights is directly set on the module. I wonder if it should be added as a (non-persistent) buffer instead. Otherwise, calls like module.to(0) won't affect current_routing_weights.
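(A sketch of the suggested change: registering the attribute as a non-persistent buffer so device moves carry it along while it stays out of checkpoints. Class and method names here are assumed for illustration.)

```python
import torch
import torch.nn as nn

class FeRALayerSketch(nn.Module):
    """Sketch of the suggestion; names mirror the PR, details assumed."""

    def __init__(self) -> None:
        super().__init__()
        # Non-persistent buffer: moves with module.to(...)/module.cuda(),
        # but is excluded from the state_dict (it is transient routing state).
        self.register_buffer("current_routing_weights", None, persistent=False)

    def update_routing(self, weights: torch.Tensor) -> None:
        # Plain attribute assignment updates the registered buffer in place.
        self.current_routing_weights = weights

layer = FeRALayerSketch()
layer.update_routing(torch.rand(2, 4))
```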
There is a better way to test these things in PEFT. For this, add FeRA to the test matrix in our existing tests. Start here:
peft/tests/test_custom_models.py, line 75 at commit c5a905d:
TEST_CASES = [
Add some FeRA settings to the end of this list. Check the other PEFT methods in that list to see how to achieve that. This test file can be removed then, as those existing tests already cover what's needed.
Once you've done this, you can run the tests with pytest tests/test_custom_models.py -k fera. You will probably see a lot of failures which are caused by FeRA not supporting merging. For those tests, we will have to add skips. But we can discuss this later once the tests are there.
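(For orientation, entries in that list follow a `(name, model_id, config_cls, config_kwargs)` pattern. A FeRA entry might look like the fragment below; the `FeRAConfig` kwargs are assumptions based on this PR's config fields, not a confirmed API.)

```python
# Hypothetical additions to TEST_CASES in tests/test_custom_models.py;
# the FeRAConfig kwargs are assumptions based on this PR's config fields.
("Vanilla MLP 1 FeRA", "MLP", FeRAConfig, {"target_modules": "lin0"}),
("Vanilla MLP 2 FeRA", "MLP", FeRAConfig, {"target_modules": ["lin0", "lin1"]}),
```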
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

ping @YinBo0927
We are integrating FeRA (Frequency-Energy Constrained Routing for Effective Diffusion Adaptation Fine-Tuning), proposed in this paper (https://www.arxiv.org/abs/2511.17979). This method uses a frequency-energy indicator and a soft router to adaptively blend multiple LoRA experts based on the latent's frequency components for effective diffusion model adaptation.
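(The routing idea described above can be sketched roughly as follows: a soft router maps a frequency-energy indicator to temperature-softmax mixture weights over several LoRA experts, and the weighted expert deltas are added to the base layer's output. Shapes, the toy indicator, and the single-linear router here are assumptions for illustration, not the PR's exact implementation.)

```python
import torch
import torch.nn as nn

class SoftRoutedLoRA(nn.Module):
    """Hedged sketch of frequency-energy routed LoRA expert blending."""

    def __init__(self, d_in: int, d_out: int, r: int = 4,
                 num_experts: int = 3, temperature: float = 0.7) -> None:
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        # One low-rank (A, B) pair per expert; B starts at zero (LoRA style).
        self.lora_A = nn.ParameterList(
            [nn.Parameter(torch.randn(d_in, r) * 0.01) for _ in range(num_experts)]
        )
        self.lora_B = nn.ParameterList(
            [nn.Parameter(torch.zeros(r, d_out)) for _ in range(num_experts)]
        )
        self.router = nn.Linear(1, num_experts)  # scalar indicator -> expert logits
        self.temperature = temperature

    def forward(self, x: torch.Tensor, e_t: torch.Tensor) -> torch.Tensor:
        # Temperature softmax over expert logits: (B, num_experts).
        weights = torch.softmax(self.router(e_t) / self.temperature, dim=-1)
        out = self.base(x)
        # Blend each expert's low-rank delta with its routing weight.
        for i, (A, B) in enumerate(zip(self.lora_A, self.lora_B)):
            out = out + weights[:, i:i + 1] * (x @ A @ B)
        return out

layer = SoftRoutedLoRA(d_in=16, d_out=16)
x = torch.randn(8, 16)
e_t = x.pow(2).mean(dim=-1, keepdim=True)  # toy energy indicator, (B, 1)
y = layer(x, e_t)
```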