Missing out_proj.weight and in_proj_weight in MultiheadAttention after merge_lora_param #7
Comments
I found that after using delattr and setattr, attributes that were nn.Parameter instances are replaced with plain tensors, which causes the state_dict() function to exclude them from the model's state. To address this, I modified the code by adding p_new = nn.Parameter(p_new), as shown below:
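A minimal sketch of that change, assuming a typical W + B·A merge; the repo's actual merge_lora_param computation may differ, and the function signature here is illustrative:

```python
import torch.nn as nn

def merge_lora_param(module, name, lora_A, lora_B):
    # Illustrative merge; the repo's actual computation may differ.
    p_old = getattr(module, name)
    p_new = p_old + lora_B @ lora_A  # merged weight: a plain tensor, not a Parameter

    delattr(module, name)            # removes the registered nn.Parameter
    p_new = nn.Parameter(p_new)      # the added line: re-register as a Parameter
    setattr(module, name, p_new)     # so state_dict() includes the weight again
```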
Let me know if you notice any issues with this approach.
I guess the LoRA parameters will not be updated if you add nn.Parameter(p_new): wrapping the merged weight in a new Parameter detaches it from the LoRA computation graph, so gradients no longer flow back to the LoRA weights. Actually, the model weights are not updated during the LoRA training process, so only the LoRA weights need to be saved.
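For instance, assuming model is the wrapped network and the LoRA parameters are identifiable by name (the "lora" substring is an assumption about the naming scheme):

```python
import torch

# Save only the LoRA weights; the frozen base weights are unchanged
# and already exist in the original checkpoint.
lora_state = {k: v for k, v in model.state_dict().items() if "lora" in k}
torch.save(lora_state, "lora_weights.pt")

# To restore, build the base model as usual, then overlay the LoRA weights;
# strict=False tolerates the keys absent from this partial state dict.
model.load_state_dict(torch.load("lora_weights.pt"), strict=False)
```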
Thank you for your quick response. I’ve confirmed that this approach indeed doesn’t correctly update the parameters. While I understand there’s no need to update in_proj_weight and out_proj.weight during LoRA training, their absence from the state_dict makes saving and loading the model awkward, which also raises some concerns. Is there an elegant way to address this issue?
Since the model is not updated, why do you need to save it? It is already saved on your local machine before the LoRA training.
While separating out and saving the LoRA parameters independently could be an option, my network contains additional parameters, which makes separating everything out less elegant.
I have fixed this problem. You only need to add
Thank you for the update! Appreciate the guidance!
Thank you for your brilliant work. It has been incredibly helpful!
After executing merge_lora_param, I noticed that the out_proj.weight and in_proj_weight parameters are missing from the MultiheadAttention module, which prevents me from saving and loading the model properly afterward; a minimal check of this behavior is sketched below.
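A self-contained sketch of the underlying behavior (not the repo's merge code): replacing a registered nn.Parameter with a plain tensor via delattr/setattr removes it from state_dict():

```python
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)
print("in_proj_weight" in mha.state_dict())  # True

# What a delattr/setattr-style merge does under the hood:
merged = mha.in_proj_weight.data.clone()     # a plain tensor, not a Parameter
delattr(mha, "in_proj_weight")
setattr(mha, "in_proj_weight", merged)

print("in_proj_weight" in mha.state_dict())  # False: plain tensors are not tracked
```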
Is there any recommended workaround or fix for this?
Thanks for your help!