
Missing out_proj.weight and in_proj_weight in MultiheadAttention after merge_lora_param #7

Closed
yuyujunjun opened this issue Nov 1, 2024 · 7 comments

Comments

@yuyujunjun

Thank you for your brilliant work. It has been incredibly helpful!

After executing merge_lora_param, I noticed that the out_proj.weight and in_proj_weight parameters are missing from the MultiheadAttention class. This issue prevents me from saving and loading the model properly afterward.
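For context, a stock torch.nn.MultiheadAttention (plain PyTorch, not the LoRA-wrapped class in this repo) normally exposes exactly these keys, which is roughly what disappears after the merge:

    import torch.nn as nn

    # Plain PyTorch attention layer, used only to show the expected state_dict keys.
    mha = nn.MultiheadAttention(embed_dim=16, num_heads=2)
    print(sorted(mha.state_dict().keys()))
    # ['in_proj_bias', 'in_proj_weight', 'out_proj.bias', 'out_proj.weight']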

Is there any recommended workaround or fix for this?

Thanks for your help!

@yuyujunjun
Author

I found that after using delattr and setattr, attributes that were nn.Parameter instances end up as plain tensors, which causes the state_dict() function to exclude them from the model's state. To address this, I modified the code by adding p_new = nn.Parameter(p_new), as shown below:

    def merge_lora_param(self):
        r"""p_new = p + scaling * B @ A and keep differentiable to A and B"""
        for param_name, lora_name in self.params_with_lora.items():
            p = set_param(self, param_name, mode='get')
            # detach() is very important here
            p_new = p.detach() + self.merge_BA(param_name) * self.scaling
            # re-wrap as nn.Parameter so the merged weight shows up in state_dict()
            p_new = nn.Parameter(p_new)
            set_param(self, param_name, param=p_new, mode='update')

Let me know if you notice any issues with this approach.

@Baijiong-Lin
Owner

I suspect the LoRA parameters will not be updated if you add p_new = nn.Parameter(p_new), because wrapping the merged tensor in a new nn.Parameter creates a fresh leaf that is detached from A and B.
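A minimal, generic PyTorch sketch (no classes from this repo) of why the extra wrapper breaks gradient flow to the LoRA factors:

    import torch
    import torch.nn as nn

    a = nn.Parameter(torch.randn(4))   # stands in for a LoRA factor (A or B)
    merged = a * 2.0                   # non-leaf tensor that still depends on a
    wrapped = nn.Parameter(merged)     # re-wrapping creates a new leaf, detached from a
    wrapped.sum().backward()
    print(a.grad)                      # None: no gradient reaches the LoRA factor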

Actually, the base model weights are not updated during the LoRA training process, so only the LoRA weights need to be saved.
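As a rough illustration (assuming the LoRA parameters are the only trainable ones and that their names contain "lora", which is an assumption about this repo's naming rather than its documented API), saving only the LoRA weights could look like:

    import torch

    # model: a module built with this repo's LoRA-wrapped layers (construction elided).
    # The 'lora' substring filter is an assumption; adjust it to the actual parameter names.
    lora_only = {k: v for k, v in model.state_dict().items() if 'lora' in k}
    torch.save(lora_only, 'lora_weights.pt')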

@yuyujunjun
Author

Thank you for your quick response. I have confirmed that this approach indeed does not update the parameters correctly. While I understand there is no need to update in_proj_weight and out_proj.weight during LoRA training, their absence from the state_dict makes saving and loading the model unreliable, which is a concern in itself.

Is there an elegant way to address this issue?

@Baijiong-Lin
Owner

Since the model is not updated, why do you need to save it? It is already saved on your local machine from before the LoRA training.

Baijiong-Lin added a commit that referenced this issue Nov 3, 2024
@yuyujunjun
Author

Separating and saving the LoRA parameters independently could be one option, but my network contains additional trainable parameters, so separating everything out is not very elegant.
Loading with strict=False could be another option. However, for resuming training, loading the model strictly is safer than relying on strict=False.
I believe it is standard practice for a model to have the same state when it is saved as it has after preparation, which helps prevent misunderstandings; see huggingface/peft#761 (comment). Overall, if there is no way to fully meet this requirement, I think the best approach is still to separate them. Thank you for your efforts.
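A hedged sketch of that separation for resuming training (the trainable-parameter filter, file name, and optimizer are illustrative placeholders, not this repo's API):

    import torch

    # Save: LoRA weights plus any additional trainable parameters, together with the optimizer state.
    trainable_keys = {k for k, p in model.named_parameters() if p.requires_grad}
    checkpoint = {
        'model': {k: v for k, v in model.state_dict().items() if k in trainable_keys},
        'optimizer': optimizer.state_dict(),
    }
    torch.save(checkpoint, 'resume.pt')

    # Resume: rebuild the model and restore the base weights first, then overlay the trainable
    # subset; strict=False is needed because the frozen base keys are not in this file.
    checkpoint = torch.load('resume.pt')
    model.load_state_dict(checkpoint['model'], strict=False)
    optimizer.load_state_dict(checkpoint['optimizer'])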

@Baijiong-Lin
Owner

I have fixed this problem. You only need to add loratorch.register_model_param_after_backward(model) after every backward pass. See Step 3 of the Quick Start at https://github.com/Baijiong-Lin/LoRA-Torch?tab=readme-ov-file#quick-start for details.
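A hedged sketch of where that call might sit in a training loop (model, dataloader, and loss_fn are placeholders; the exact placement should follow Step 3 of the linked Quick Start):

    import torch
    import loratorch

    # model: a network built with loratorch's LoRA-wrapped layers (construction elided).
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for inputs, targets in dataloader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        # per the fix above: call this after every backward pass so the merged weights
        # are registered as parameters again (see the repo's Quick Start, Step 3)
        loratorch.register_model_param_after_backward(model)
        optimizer.step()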

@yuyujunjun
Author

Thank you for the update! Appreciate the guidance!
