Fix error of PEFT with disable adapters and FSDP #3001
Conversation
@githubnemo Tagging for review
I tried this and got this error
I think that is a different error, not related to this one. I had a similar error in my training which no longer occurs after this change
The merged PR #2856 includes deepspeed/fsdp for multi-GPU; maybe that helps already?
@githubnemo I think that should have fixed the bugs, can you run the workflow/review when you get a chance? |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Hmm, I think the error in the check_code_quality workflow is not related to this PR
Thanks for the updates.
I ran the test on a multi-GPU machine and got the following error:
```
Testing disable_adapters()...
[rank1]: Traceback (most recent call last):
[rank1]:   File "/home/ubuntu/code/peft/githubnemo/tests/training/test_fsdp_adapters.py", line 256, in <module>
[rank1]:     main(test_name=args.test, model_id=args.model_id, quant=args.quant)
[rank1]:   File "/home/ubuntu/code/peft/githubnemo/tests/training/test_fsdp_adapters.py", line 237, in main
[rank1]:     test_disable_adapters(model_id, quant)
[rank1]:   File "/home/ubuntu/code/peft/githubnemo/tests/training/test_fsdp_adapters.py", line 115, in test_disable_adapters
[rank1]:     model.disable_adapters()
[rank1]:   File "/home/ubuntu/envs/peft-githubnemo/lib/python3.12/site-packages/transformers/integrations/peft.py", line 724, in disable_adapters
[rank1]:     raise ValueError("No adapter loaded. Please load an adapter first.")
[rank1]: ValueError: No adapter loaded. Please load an adapter first.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/ubuntu/code/peft/githubnemo/tests/training/test_fsdp_adapters.py", line 256, in <module>
[rank0]:     main(test_name=args.test, model_id=args.model_id, quant=args.quant)
[rank0]:   File "/home/ubuntu/code/peft/githubnemo/tests/training/test_fsdp_adapters.py", line 237, in main
[rank0]:     test_disable_adapters(model_id, quant)
[rank0]:   File "/home/ubuntu/code/peft/githubnemo/tests/training/test_fsdp_adapters.py", line 115, in test_disable_adapters
[rank0]:     model.disable_adapters()
[rank0]:   File "/home/ubuntu/envs/peft-githubnemo/lib/python3.12/site-packages/transformers/integrations/peft.py", line 724, in disable_adapters
[rank0]:     raise ValueError("No adapter loaded. Please load an adapter first.")
[rank0]: ValueError: No adapter loaded. Please load an adapter first.
```
I also left some comments.
> Hmm, I think the error in the check_code_quality workflow is not related to this PR

Yep, you can ignore that for now.
tests/training/test_fsdp_adapters.py
Outdated
```python
print_if_process_zero("Testing set_adapter(['adapter1', 'adapter2'])...")
model.set_adapter(["adapter1", "adapter2"])
print_if_process_zero("set_adapter(['adapter1', 'adapter2']) succeeded!")
```
Let's also test that the base weights and the adapter2 weights are untouched and only the adapter1 weights have changed.
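A minimal sketch of such a check: snapshot the parameters before and after a training step with adapter1 active, then assert that only adapter1 parameters moved. The parameter names and scalar values below are toy stand-ins; in the real test the snapshots would come from `model.named_parameters()`.

```python
# Hypothetical sketch: only adapter1 weights should change after a
# training step with adapter1 active. Names/values are illustrative.

def changed_names(before, after, tol=1e-8):
    """Names of parameters whose values differ between two snapshots."""
    return {name for name in before if abs(before[name] - after[name]) > tol}

before = {
    "base.weight": 1.00,
    "lora.adapter1.weight": 0.50,
    "lora.adapter2.weight": 0.50,
}
# Pretend one optimizer step on adapter1 happened in between.
after = dict(before, **{"lora.adapter1.weight": 0.70})

moved = changed_names(before, after)
assert moved == {"lora.adapter1.weight"}
```

The same comparison, run once per adapter, covers both the "adapter2 untouched" and "base untouched" conditions the comment asks for.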
src/peft/tuners/tuners_utils.py
Outdated
```python
for layer_name in self.adapter_layer_names:
    layer = getattr(self, layer_name)
    layer.requires_grad_(False)
    # Handle FSDP case where params may be non-leaf tensors
```
Maybe something like this to explain why they are non-leaf tensors?
```suggestion
# Handle FSDP case where params may be non-leaf tensors by being wrapped in DTensors
```
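For illustration, here is a stand-in (not real torch code) for why the non-leaf case needs special handling: `requires_grad_()` may only be called on leaf tensors, and under FSDP2 parameters can surface as DTensor-wrapped non-leaf tensors. `FakeParam` is a hypothetical stub that mimics that rule.

```python
# Hypothetical stub mimicking torch's "only leaf tensors can have their
# requires_grad flag changed" rule; not the actual PEFT fix.

class FakeParam:
    def __init__(self, is_leaf):
        self.is_leaf = is_leaf
        self.requires_grad = True

    def requires_grad_(self, flag):
        if not self.is_leaf:
            raise RuntimeError("you can only change requires_grad flags of leaf variables")
        self.requires_grad = flag

def freeze(params):
    """Disable gradients, skipping non-leaf (e.g. DTensor-wrapped) params."""
    for p in params:
        if p.is_leaf:
            p.requires_grad_(False)
        # Non-leaf tensors cannot have their flag toggled directly and are
        # skipped here; the real code must reach their underlying parameters.

params = [FakeParam(is_leaf=True), FakeParam(is_leaf=False)]
freeze(params)  # does not raise, unlike a blanket requires_grad_(False)
assert params[0].requires_grad is False
```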
```python
if (key in adapter_names) and (not inference_mode):
    # Note: It is possible that not a single layer is called with requires_grad_(True) here. This may
    # happen if a completely different adapter layer is being activated.
```
I think the note comment from before can stay as it doesn't seem to be invalidated.
tests/training/test_fsdp_adapters.py
Outdated
Let's rename this to adapters.py so it isn't gobbled up by pytest, and drop the fsdp prefix, since this is technically a test for whatever distributed training we use.
tests/training/test_fsdp_adapters.py
Outdated
```python
# Test set_adapter - should not raise
print_if_process_zero("Testing set_adapter('adapter2')...")
model.set_adapter("adapter2")
print_if_process_zero("set_adapter('adapter2') succeeded!")

# Test switching back
print_if_process_zero("Testing set_adapter('adapter1')...")
model.set_adapter("adapter1")
print_if_process_zero("set_adapter('adapter1') succeeded!")
```
I wonder if the test should mimic a more realistic training scenario, i.e. switching to adapter 1, training a bit, switching to adapter 2, training that.
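A hedged outline of the flow this suggests, where `run_scenario` and `train_step` are hypothetical stand-ins (not names from this PR) for the real PEFT model interaction and a single optimizer step:

```python
# Hypothetical outline of the suggested test flow: activate an adapter,
# train on it, then switch and train the other one.

def run_scenario(model, train_step, batches):
    model.set_adapter("adapter1")
    for batch in batches:
        train_step(model, batch)  # should only update adapter1 weights

    model.set_adapter("adapter2")
    for batch in batches:
        train_step(model, batch)  # should only update adapter2 weights

# Tiny mock demonstrating the call order the scenario produces.
class MockModel:
    def __init__(self):
        self.log = []

    def set_adapter(self, name):
        self.log.append(("set", name))

model = MockModel()
run_scenario(model, lambda m, b: m.log.append(("step", b)), [0, 1])
assert model.log[0] == ("set", "adapter1")
assert model.log[3] == ("set", "adapter2")
```

Combined with per-adapter weight snapshots, this would exercise switching under actual gradient updates rather than just checking that `set_adapter` does not raise.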
Thanks for running this. I'll take a look at the comments in the upcoming days and fix it, and probably get a multi-GPU machine to test this out properly
Updated the tests, resolved the comments, and tested on a multi-GPU device. Should be ready now
Hmm, I think the test errors are not related to this PR 🤔 @githubnemo wdyt?
Fixes #2977
For tests, I'm not sure whether the PEFT repo has multi-GPU nodes in CI for testing this. Searching through the tests, I couldn't find any distributed tests. I'd be glad to add one if you point me to where it should go or to an example test.