
Missing out_proj.weight and in_proj_weight in MultiheadAttention after merge_lora_param #7

Closed
yuyujunjun opened this issue Nov 1, 2024 · 7 comments

Comments

@yuyujunjun

Thank you for your brilliant work. It has been incredibly helpful!

After executing merge_lora_param, I noticed that the out_proj.weight and in_proj_weight parameters are missing from the MultiheadAttention class. This issue prevents me from saving and loading the model properly afterward.
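For context, a stock torch.nn.MultiheadAttention (plain PyTorch, not the LoRA-wrapped class in this repo) normally exposes exactly these keys, which is roughly what disappears after the merge:

    import torch.nn as nn

    # Plain PyTorch attention layer, used only to show the expected state_dict keys.
    mha = nn.MultiheadAttention(embed_dim=16, num_heads=2)
    print(sorted(mha.state_dict().keys()))
    # ['in_proj_bias', 'in_proj_weight', 'out_proj.bias', 'out_proj.weight']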

Is there any recommended workaround or fix for this?

Thanks for your help!

@yuyujunjun
Author

I found that after using delattr and setattr, attributes that were nn.Parameter instances end up as plain tensors, which causes the state_dict() function to exclude them from the model's state. To address this, I modified the code by adding p_new = nn.Parameter(p_new), as shown below:

    def merge_lora_param(self):
        r"""p_new = p + scaling * B @ A and keep differentiable to A and B"""
        for param_name, lora_name in self.params_with_lora.items():
            p = set_param(self, param_name, mode='get')
            # detach() is very important here
            p_new = p.detach() + self.merge_BA(param_name) * self.scaling
            # re-wrap as nn.Parameter so the merged weight shows up in state_dict()
            p_new = nn.Parameter(p_new)
            set_param(self, param_name, param=p_new, mode='update')

Let me know if you notice any issues with this approach.

@Baijiong-Lin
Owner

I suspect the LoRA parameters will not be updated if you add p_new = nn.Parameter(p_new), because wrapping the merged tensor in a new nn.Parameter creates a fresh leaf that is detached from A and B.
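A minimal, generic PyTorch sketch (no classes from this repo) of why the extra wrapper breaks gradient flow to the LoRA factors:

    import torch
    import torch.nn as nn

    a = nn.Parameter(torch.randn(4))   # stands in for a LoRA factor (A or B)
    merged = a * 2.0                   # non-leaf tensor that still depends on a
    wrapped = nn.Parameter(merged)     # re-wrapping creates a new leaf, detached from a
    wrapped.sum().backward()
    print(a.grad)                      # None: no gradient reaches the LoRA factor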

Actually, the base model weights are not updated during the LoRA training process, so only the LoRA weights need to be saved.
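As a rough illustration (assuming the LoRA parameters are the only trainable ones and that their names contain "lora", which is an assumption about this repo's naming rather than its documented API), saving only the LoRA weights could look like:

    import torch

    # model: a module built with this repo's LoRA-wrapped layers (construction elided).
    # The 'lora' substring filter is an assumption; adjust it to the actual parameter names.
    lora_only = {k: v for k, v in model.state_dict().items() if 'lora' in k}
    torch.save(lora_only, 'lora_weights.pt')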

@yuyujunjun
Author

Thank you for your quick response. I have confirmed that this approach indeed does not update the parameters correctly. While I understand there is no need to update in_proj_weight and out_proj.weight during LoRA training, their absence from the state_dict makes saving and loading the model unreliable, which is a concern in itself.

Is there an elegant way to address this issue?

@Baijiong-Lin
Owner

Since the model is not updated, why do you need to save it? It is already saved on your local machine from before the LoRA training.

Baijiong-Lin added a commit that referenced this issue Nov 3, 2024
@yuyujunjun
Author

Separating and saving the LoRA parameters independently could be one option, but my network contains additional trainable parameters, so separating everything out is not very elegant.
Loading with strict=False could be another option. However, for resuming training, loading the model strictly is safer than relying on strict=False.
I believe it is standard practice for a model to have the same state when it is saved as it has after preparation, which helps prevent misunderstandings; see huggingface/peft#761 (comment). Overall, if there is no way to fully meet this requirement, I think the best approach is still to separate them. Thank you for your efforts.
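A hedged sketch of that separation for resuming training (the trainable-parameter filter, file name, and optimizer are illustrative placeholders, not this repo's API):

    import torch

    # Save: LoRA weights plus any additional trainable parameters, together with the optimizer state.
    trainable_keys = {k for k, p in model.named_parameters() if p.requires_grad}
    checkpoint = {
        'model': {k: v for k, v in model.state_dict().items() if k in trainable_keys},
        'optimizer': optimizer.state_dict(),
    }
    torch.save(checkpoint, 'resume.pt')

    # Resume: rebuild the model and restore the base weights first, then overlay the trainable
    # subset; strict=False is needed because the frozen base keys are not in this file.
    checkpoint = torch.load('resume.pt')
    model.load_state_dict(checkpoint['model'], strict=False)
    optimizer.load_state_dict(checkpoint['optimizer'])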

@Baijiong-Lin
Owner

I have fixed this problem. You only need to add loratorch.register_model_param_after_backward(model) after every backward pass. See Step 3 of the Quick Start at https://github.com/Baijiong-Lin/LoRA-Torch?tab=readme-ov-file#quick-start for details.
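A hedged sketch of where that call might sit in a training loop (model, dataloader, and loss_fn are placeholders; the exact placement should follow Step 3 of the linked Quick Start):

    import torch
    import loratorch

    # model: a network built with loratorch's LoRA-wrapped layers (construction elided).
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for inputs, targets in dataloader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        # per the fix above: call this after every backward pass so the merged weights
        # are registered as parameters again (see the repo's Quick Start, Step 3)
        loratorch.register_model_param_after_backward(model)
        optimizer.step()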

@yuyujunjun
Author

Thank you for the update! Appreciate the guidance!
