Integration of PVeRA #2952
Conversation
githubnemo left a comment:
Thanks for the PR, this already looks quite good!
Please check the copyright notices and make sure they are up to date (they often say 2024 but should say 2025).
I've done a quick review and left a few comments. General remarks:
- let's add PVeRA to `tests/test_custom_models.py` (adding the important configurations to `TEST_CASES`, similar to VeRA; see the sketch below)
- if these pass, we can extend the coverage by adding PVeRA to `tests/test_decoder_models.py` and `tests/test_encoder_decoder_models.py`
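For illustration, the new entries could look roughly like the existing VeRA ones in that file; the `PveraConfig` name and the exact kwargs are placeholders, not the final API:

```python
# Hypothetical additions to TEST_CASES in tests/test_custom_models.py, modeled on the
# VeRA entries; the config class name and kwargs are assumptions, not the final API.
TEST_CASES += [
    ("Vanilla MLP 1 PVeRA", "MLP", PveraConfig, {"target_modules": "lin0"}),
    ("Vanilla MLP 2 PVeRA", "MLP", PveraConfig, {"target_modules": ["lin0", "lin1"]}),
]
```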
Once these are implemented I'll do a more thorough review. After that it'd be nice to have a runnable example and to integrate it into the method comparison to benchmark it against the other methods (and to check our expectations).
Heads up: I'll be rather off than on in the coming days, so merging and review will most likely happen next year.
Hello @githubnemo, thank you for your review! I added two commits:
After all these changes, I re-ran
Hey, thanks for the update!
Implementation and tests look quite mature. I think that if we provide bitsandbytes support, we also need at least one test in `tests/test_gpu_examples.py` for quality control. I think `test_causal_lm_training_4bit_vera` could be used as a base.
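Roughly, something along these lines could work as a smoke test; the `PveraConfig` name, the model choice, and the hyper-parameters are all placeholders and not the final API:

```python
# Rough sketch of a GPU test for PVeRA + bitsandbytes, loosely following the shape of
# test_causal_lm_training_4bit_vera; PveraConfig and all hyper-parameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import get_peft_model, prepare_model_for_kbit_training
from peft import PveraConfig  # hypothetical export name from this PR


def test_causal_lm_training_4bit_pvera():
    model_id = "facebook/opt-125m"  # any small causal LM is enough for a smoke test
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)
    config = PveraConfig(r=64, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, config)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    batch = tokenizer(["PVeRA smoke test"], return_tensors="pt").to(model.device)
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()

    # the trainable PVeRA parameters should have received gradients
    assert any(p.grad is not None for p in model.parameters() if p.requires_grad)
```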
In general I think we should rename `PVeRA*` to `Pvera*` (e.g., `PVeRAModel` -> `PveraModel`) to be consistent with `VeraModel` and friends. It is quite hard to remember the spelling of the various abbreviations already :)
Rest of the review is in the comments.
After the comments are resolved and the CI is green I think it would be nice to integrate PVeRA into the MetaMathQA benchmark by adding an experiment file based on the VeRA experiment.
Edit: To fix the docs build, add the PVeRA entry to docs/source/_toctree.yml similar to the other entries.
```python
        super(nn.Linear, self).__init__()
        PVeRALayer.__init__(self, base_layer, **kwargs)
        self.fan_in_fan_out = fan_in_fan_out
        self.sample_at_inference = sample_at_inference
```
For consistency I think this should be a dict so that I can toggle this behavior for each adapter.
Here, are you talking about `sample_at_inference`?
Ah, I always forget that GitHub doesn't highlight the commented lines. Yes, I was talking about `sample_at_inference`.
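Something like the following pattern, purely as an illustration; the real layer holds more state, and the method and argument names here are only indicative:

```python
# Illustration only: keeping sample_at_inference per adapter instead of a single bool,
# following the per-adapter dict pattern used by the other PEFT tuner layers.
class PVeRALayer:
    def __init__(self, base_layer, **kwargs):
        self.base_layer = base_layer
        self.sample_at_inference: dict[str, bool] = {}

    def update_layer(self, adapter_name, *, sample_at_inference=False, **kwargs):
        # each adapter stores its own flag, so the behavior can be toggled independently
        self.sample_at_inference[adapter_name] = sample_at_inference
```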
Hello @githubnemo, thanks again for your review. I have updated the code to address most of your comments, but the following issues still remain:
githubnemo left a comment:
Thanks for the changes!
I've commented on hopefully all the outstanding issues and added a few nits.
CI seems to be passing except for the `.eval` issue, which is hopefully resolved by the proposed fix.
`tests/test_pvera.py` (outdated)
```python
def test_multiple_adapters_save_load_save_projection_true(self, mlp_same_prng, tmp_path):
    # check saving and loading works with multiple adapters and saved projection weights
    torch.manual_seed(0)
    input = torch.randn(5, 10)
    mlp_same_prng.set_adapter("default")
    mlp_same_prng.eval()
    output_default = mlp_same_prng(input)
    mlp_same_prng.set_adapter("other")
    output_other = mlp_same_prng(input)

    # sanity check
    assert not torch.allclose(output_default, output_other, atol=1e-3, rtol=1e-3)

    save_path = tmp_path / "pvera"
    mlp_same_prng.save_pretrained(save_path)
    assert os.path.exists(save_path / "adapter_config.json")
    assert os.path.exists(save_path / "other" / "adapter_config.json")

    torch.manual_seed(0)
    mlp = MLP()
    peft_model = PeftModel.from_pretrained(mlp, save_path)
    peft_model.load_adapter(save_path / "other", "other")
    peft_model.eval()

    peft_model.set_adapter("default")
    output_default_loaded = peft_model(input)
    peft_model.set_adapter("other")
    output_other_loaded = peft_model(input)

    assert torch.allclose(output_default, output_default_loaded, atol=1e-3, rtol=1e-3)
    assert torch.allclose(output_other, output_other_loaded, atol=1e-3, rtol=1e-3)
```
`test_multiple_adapters_save_load_save_projection_true` is probably already covered by the common tests, or am I missing something special?
To be honest, I integrated these (`test_multiple_adapters_save_load_save_projection_true` and `test_multiple_adapters_save_projection_true_contains_pvera_A_pvera_B`) because they were integrated for VeRA, but I agree that they do seem a bit redundant. I'm OK with removing them if you think they aren't useful.
Yeah, let's remove them.
`src/peft/tuners/pvera/config.py` (outdated)
```python
        },
    )
    pvera_dropout: float = field(default=0.0, metadata={"help": "PVeRA dropout"})
    d_initial: float = field(default=0.1, metadata={"help": "Initial init value for d vector."})
```
```diff
-    d_initial: float = field(default=0.1, metadata={"help": "Initial init value for d vector."})
+    d_initial: float = field(default=0.1, metadata={"help": "Initial value for d vector."})
```
But we can just use the docstring for this parameter from above verbatim.
`src/peft/tuners/pvera/config.py` (outdated)
| "List of module names or regex expression of the module names to replace with PVeRA." | ||
| "For example, ['q', 'v'] or '.*decoder.*(SelfAttention|EncDecAttention).*(q|v)$'. " | ||
| "Only linear layers are supported." |
Let's use the docstring values from above for all help values. In the end this makes it more maintainable and we're not losing much.
Hello @githubnemo,
githubnemo left a comment:
Thanks for the updates!
Correct me if I'm wrong, but I think there's currently no test that checks whether the output is non-deterministic during training, and during inference when requested. Let's make sure that this code path is tested, at least roughly.
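A rough sketch of what such a test could look like; the `PveraConfig` name, the `mlp` fixture, the `lin0` target, and the `sample_at_inference` flag are placeholders taken from this PR and may differ in the final version:

```python
# Sketch of a non-determinism check; config class, fixture, and flag names are placeholders.
import torch
from peft import get_peft_model
from peft import PveraConfig  # hypothetical export name from this PR


def test_forward_is_stochastic_when_sampling_at_inference(mlp):
    config = PveraConfig(target_modules=["lin0"], sample_at_inference=True)
    model = get_peft_model(mlp, config)
    model.eval()
    x = torch.randn(5, 10)
    out1 = model(x)
    out2 = model(x)
    # with sampling enabled at inference, two passes over the same input should differ
    assert not torch.allclose(out1, out2)
```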
I checked the old comments and resolved those that are done.
There are still some open comments:
Once that is done, let's add an experiment for the MetaMathQA method comparison. You can look at `method_comparison/MetaMathQA/experiments/vera/llama-3.2-3B-default/` for inspiration. You may need to tune some hyper-parameters since they are probably non-optimal. Note that we try to keep the number of parameters roughly equal between methods. There's also documentation there on how to run these experiments yourself for quick iteration.
It'd also be good to have an example, maybe showcasing how to leverage sampling to get confidence intervals? I'll let you be the judge of what is most effective in showcasing PVeRA.
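One possible shape for such an example, purely as a sketch: it assumes a transformers model whose PVeRA layers sample at inference (`sample_at_inference=True`), and the function name and interval construction are only illustrative:

```python
# Sketch: turning PVeRA's stochastic forward passes into a crude confidence interval.
# Assumes the adapter samples at inference and the model returns .logits; not final API.
import torch


@torch.no_grad()
def predict_with_uncertainty(model, inputs, n_samples=30):
    model.eval()
    samples = torch.stack([model(**inputs).logits for _ in range(n_samples)])
    mean = samples.mean(dim=0)
    std = samples.std(dim=0)
    # ~95% interval under a normal approximation of the sampled outputs
    return mean, mean - 1.96 * std, mean + 1.96 * std
```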
This PR is a continuation of issue #2948, in which we proposed the integration of the PVeRA adapter.
As recommended in the issue, we based our implementation on that of the VeRA adapter, since the two adapters are very similar. Here is a list of the contributions in this PR:
- Adapted the `config.py`, `layer.py`, and `model.py` files from the ones from VeRA.
- Adapted `tests/test_vera.py` to `tests/test_pvera.py` and made sure it runs properly.

Note: because I'm running on a Mac, I was not able to run `make test` (I had an error with MPS).

@BenjaminBossan could you please give me some feedback?