
Conversation

leo-yangli
Contributor

Hi,

I am one of the authors of VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks. You can find our paper here: https://arxiv.org/abs/2405.15179

This PR integrates our method, VB-LoRA. The implementation and tests follow VeRA and LoRA. I have run pytest locally, and all the tests have passed.


Member

@BenjaminBossan BenjaminBossan left a comment

Thanks a lot for this PR that adds VB-LoRA to PEFT. Your implementation already looks quite advanced, follows the existing examples, and includes the required testing, nice 🎉

In my first review, I focused on the actual VB-LoRA implementation, so I haven't checked the examples and docs yet. As you can see, I added a bunch of comments but those should not be big issues. Please check if they make sense.

Regarding the paper, I also found two typos, in case you want to fix them: "virtrual" and "backpropgatation".

            )
        },
    )
    save_topk_weights: bool = field(
Member

save_only_topk_weights would be a more fitting name, right?

Contributor Author

Right. I will modify it.

Contributor Author

Updated.

Comment on lines 97 to 157
    init_vector_bank_bound: float = field(
        default=0.02,
        metadata={
            "help": (
                "The vector bank is initialized with a uniform distribution between -init_vector_bank_bound and"
                " init_vector_bank_bound."
            ),
        },
    )
    init_logits_std: float = field(
        default=0.1,
        metadata={
            "help": (
                "The logits are initialized with a normal distribution with a standard deviation of init_logits_std."
            ),
        },
    )
Member

If you have any tips for how users should initialize the VB-LoRA specific parameters (num_vectors, vector_length, topk, init_vector_bank_bound, init_logits_std), that would be great, as they can't use their existing intuitions from LoRA. Even a reference to the paper could be useful here.

Contributor Author

Agree. I will add some comments here.

Contributor Author

Updated.
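For readers of this thread, a minimal configuration sketch along these lines may help (the target module names are placeholders and the values other than init_vector_bank_bound and init_logits_std are illustrative choices based on the paper, not necessarily the documented PEFT defaults):

from peft import VBLoRAConfig

# Illustrative starting point; init_vector_bank_bound and init_logits_std match the
# defaults quoted in the diff above, the remaining values follow the paper's setup.
config = VBLoRAConfig(
    target_modules=["query", "value"],  # placeholder module names
    r=4,                                # rank of the low-rank update
    num_vectors=256,                    # number of vectors in the shared bank
    vector_length=256,                  # the hidden dimensions must be divisible by this
    topk=2,                             # how many bank vectors are mixed per sub-vector
    init_vector_bank_bound=0.02,        # uniform init range for the vector bank
    init_logits_std=0.1,                # std of the normal init for the logits
)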

        self.target_modules = (
            set(self.target_modules) if isinstance(self.target_modules, list) else self.target_modules
        )
        if self.save_topk_weights:
Member

IMO, this warning is not necessary. If users changed this setting, they have certainly read the description. We want to avoid unnecessary warnings as much as possible.

Contributor Author

Agree. I will delete it.

Contributor Author

Deleted.

            will not produce the same output as the base model would have without adaptation.
        modules_to_save (`List[str]`):
            List of modules apart from VB-LoRA layers to be set as trainable and saved in the final checkpoint.
        init_weights (`bool`):
Member

AFAICT, the init_weights argument is not used. I think it would be great if it was supported the same way as in LoRA: if set to True (default), VB-LoRA is initialized such that it performs an identity transform, and vice versa. Not sure if this fits well with the paper, we can also agree on a different default, but having that option would be appreciated.

Contributor Author

In our approach, we shouldn't initialize the additive adaptor to zero because if the vector bank is initialized to zero, both matrices A and B will be zero, resulting in a zero gradient. As a result, we can't align with other methods regarding init_weights=True. Do you have any suggestions in this case?

Contributor Author

I followed your first suggestion in the comment:

  1. I removed init_weights.
  2. In tests, I set the vector bank to zero so that VB-LoRA acts as an identity operation, and re-initialized the vector bank whenever training is involved in the test (a minimal sketch of this pattern follows below).
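As a sketch of that test pattern (the vblora_vector_bank attribute path follows the name used later in this thread; the toy model and hyperparameter values are assumptions for illustration only):

import torch
from torch import nn
from peft import VBLoRAConfig, get_peft_model

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(16, 16)

    def forward(self, x):
        return self.lin1(x)

# Illustrative values; vector_length must evenly divide the layer width (16 here).
config = VBLoRAConfig(target_modules=["lin1"], num_vectors=8, vector_length=4, topk=2)
peft_model = get_peft_model(MLP(), config)

# With the shared vector bank zeroed, both A and B are zero, so the adapted model
# behaves exactly like the base model (an identity adaptation), but it cannot train.
torch.nn.init.zeros_(peft_model.base_model.vblora_vector_bank["default"])

# Re-initialize the bank before any training step, otherwise all gradients stay zero.
torch.nn.init.uniform_(peft_model.base_model.vblora_vector_bank["default"], -0.02, 0.02)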

f"Target module {target} is not supported. Currently, only the following modules are supported: "
"`torch.nn.Linear`, `transformers.pytorch_utils.Conv1D`."
)
r = kwargs.pop("r")
Member

Why is this necessary?

Contributor Author

I will delete this line.

Contributor Author

Updated.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for your replies.

Regarding the discussion about init_weights, as it came up multiple times, I'll reply here:

From my understanding, the problem is that the A and B matrices share the same vector bank. Therefore, if the vector bank is initialized to zeros, both A and B are zeros and there is no gradient. This is unlike LoRA, where only B is zeros.
Also, it's not possible to set the logits for B to zero, as a softmax operation is performed, so the weights will always sum up to 1.

I have three suggestions:

  1. For those cases where we want VB-LoRA to be a no-op, we just manually call nn.init.zeros_(model.base_model.vblora_vector_bank["default"]). The init_weights argument should be removed.
  2. We add the option to pass init_weights=False (with True being the default) and in that case initialize the vector bank to zeros. This model is untrainable, so we give a warning that this option only exists for testing purposes.
  3. Add a new parameter that scales the contribution from B, which is a learnable parameter that starts at 0. This way, the vector bank is randomly initialized but VB-LoRA still starts as a no-op. Of course, this would be a deviation from the paper.

I also have a general comment about the docs: Let's make sure to mention that there is an option for VB-LoRA to reduce the size of the file when saving, but that the loaded adapter can only be used for inference after that. Ideally, you could give a numerical example (e.g. for Llama3 8b with parameter ..., the size is reduced from X to Y).

@@ -0,0 +1,263 @@
# Copyright 2023-present the HuggingFace Inc. team.
Member

Let's update the dates in all new files to 2024.

@leo-yangli
Contributor Author

Thank you for your valuable feedback! I have updated the code accordingly.

I also updated the documentation in docs/source/package_reference/vblora.md. Please let me know if there is anything else I can improve.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan
Member

Thanks for the updates @leo-yangli. I haven't reviewed yet but saw that the CI fails because of the following test:

test_parameters_after_loading_model_110_Vanilla_MLP_4_VBLoRA

Could you please check?

@leo-yangli
Contributor Author

Thank you for your feedback. I just fixed the failing test.

@leo-yangli
Contributor Author

It's ready to be reviewed.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks a lot for the updates. This is almost ready to be merged; I only saw a few tiny issues left. Please take a look.

"id": "ddfc0610-55f6-4343-a950-125ccf0f45ac",
"metadata": {},
"source": [
"In this example, we fine-tune Roberta on a sequence classification task using VB-LoRA."
Member

Let's add a sentence noting that this notebook is based on examples/sequence_classification/VeRA.ipynb so that users can compare the results.

Contributor Author

Added.

" target_modules=['key', 'value', 'query', 'output.dense', 'intermediate.dense'],\n",
" num_vectors=num_vectors,\n",
" vector_length=vector_length,\n",
" save_only_topk_weights=True,\n",
Member

Let's add a comment here to explain this argument. Using this also means that we cannot resume training from checkpoints if training is interrupted for some reason, right? Let's also mention that.

Contributor Author

Added a comment.
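For context, the added comment might look something like this sketch (the surrounding config lines are taken from the notebook snippet above; the wording is illustrative, not the exact comment that was added):

config = VBLoRAConfig(
    target_modules=['key', 'value', 'query', 'output.dense', 'intermediate.dense'],
    num_vectors=num_vectors,
    vector_length=vector_length,
    # Store only the top-k indices and weights instead of the full logits. This greatly
    # reduces the checkpoint size, but the saved adapter can then only be used for merging
    # or inference; training cannot be resumed from such a checkpoint.
    save_only_topk_weights=True,
)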

        with pytest.raises(ValueError, match=msg):
            get_peft_model(model, config1)

        config2 = VBLoRAConfig(target_modules=["lin1"], vector_length=vector_length)
Member

Let's split this into 2 tests.

Contributor Author

Updated.

            The length of the vectors in the vector bank. The hidden dimensions of the model should be divisible
            by the vector length.
        topk (`int`):
            The K value for top-k selection.
Member

It would be great if you could explain a bit more: what is the default, and when should users decrease or increase this number? We can assume that users are familiar with LoRA and would like to try VB-LoRA; they should not have to read the paper to understand the hyper-parameters.

Contributor Author

Added some explanations.
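As an illustration, an expanded docstring entry could read roughly as follows (a sketch based on the paper's setup, not the exact text that was added):

        topk (`int`):
            The K value for top-k selection. For each sub-vector, only the K largest logits are kept and a
            softmax over them determines how K vectors from the bank are mixed. The paper uses K=2; a larger K
            adds expressiveness at the cost of more saved parameters when `save_only_topk_weights=True`, while
            K=1 disables the mixing entirely.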

            the target modules manually.
        save_only_topk_weights (`bool`):
            Whether to only save the topk weights. Models saved in this mode can be used for merging or inference
            only, not for resuming training.
Member

Let's mention here that the point is to reduce the file size.

Contributor Author

Added.

        # Check for infinity values when training. If found, training was likely resumed from a `save_only_topk_weights` model.
        if self.training and vblora_logits_A[0, 0].isinf().any():
            raise RuntimeError(
                "Found infinity values in logits. Ensure training was not resumed from a `save_only_topk_weights` model."
Member

Let's add: "VB-LoRA logits", otherwise this could be confusing.

Contributor Author

Added.

@BenjaminBossan
Member

Just something that came to my mind out of curiosity: Did you try experiments with multiple VB-LoRA adapters (for different tasks) that share the same vector bank? So e.g. train the logits and vector bank on task 1, then freeze the vector bank. Next, create a new VB-LoRA for task 2 that uses the frozen vector bank from task 1 and only trains logits? Probably this won't work well enough for task 2 but it would be really cool if it did to save even more parameters.

@leo-yangli
Contributor Author

Hi @BenjaminBossan ,

That's a great question! Training in an MTL setting is currently an area of ongoing research for us, and we have tried the exact same idea. Preliminary results indicate that when the two tasks are similar (for example, RTE and MRPC from the GLUE benchmark), the second task can still achieve decent performance with a frozen vector bank, although there is a noticeable drop in performance (e.g., 2%). This performance gap tends to decrease when we alternate training between tasks. However, a more robust training strategy for an MTL setting still needs to be explored. I hope this answered your question.

I updated the code based on your review. Moreover, I added a new feature that counts the VB-LoRA savable parameters, together with tests, and it is ready for review.

Thanks!
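For context, a usage sketch of this feature (the method name get_nb_savable_parameters is an assumption based on the description in this thread, so treat the exact API and values as illustrative):

from transformers import AutoModelForSequenceClassification
from peft import VBLoRAConfig, get_peft_model

base_model = AutoModelForSequenceClassification.from_pretrained("roberta-base")
config = VBLoRAConfig(
    task_type="SEQ_CLS",
    target_modules=["query", "value"],  # illustrative choice of modules
    num_vectors=256,
    vector_length=256,                  # divides roberta-base's hidden size of 768
    save_only_topk_weights=True,
)
peft_model = get_peft_model(base_model, config)

# Assumed helper: returns the number of parameters that will actually be written to the
# checkpoint, split into VB-LoRA parameters and other savable parameters.
vblora_params, other_params = peft_model.get_nb_savable_parameters()
print(f"VB-LoRA savable parameters: {vblora_params:,} || other savable parameters: {other_params:,}")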

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for the latest updates. Also, good idea to add the methods to count the number of savable parameters.

I found a small issue not being fully addressed yet in a few tests, the rest looks good.

Btw. if you just push on top of the PR without rebase, it makes it much easier to review for me. There is no need for a clean git history, as I'll squash before merging.

Regarding the sharing of the vector bank: Thanks for the info. If the new research is finished and you found a nice way to share the vector bank, it would be nice to add this feature here. I think it shouldn't require many adjustments to implement.


        assert os.path.exists(save_path / "adapter_config.json")

        mlp_vblora_loaded = PeftModel.from_pretrained(mlp, save_path)
Member

This is still not quite resolved. The mlp you used here is the same that you used above when calling mlp_vblora = get_peft_model(mlp, config). This means this is the mutated mlp, as get_peft_model will perform in-place operations on it. What I mean is that before calling PeftModel.from_pretrained(mlp, save_path), you should create a fresh instance of mlp. I typically also delete the old instance if it's no longer needed, just to make it clear.

Suggested change:
-    mlp_vblora_loaded = PeftModel.from_pretrained(mlp, save_path)
+    del mlp
+    mlp = self.get_mlp()
+    mlp_vblora_loaded = PeftModel.from_pretrained(mlp, save_path)

The same should be done for the tests below.

@leo-yangli
Contributor Author

Thanks for pointing out the mistake. I have fixed it.

I've also added a new test, test_save_with_topk_weights, to verify that the weights saved in save_only_topk_weights=True mode are as expected.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for this excellent addition to PEFT. Great PR, well implemented and tested right from the start.

I checked out the code and ran all tests on GPU, and they pass without trouble. Line coverage is also good, so this can be merged!

@BenjaminBossan BenjaminBossan merged commit c9f7240 into huggingface:main Sep 4, 2024
14 checks passed
@leo-yangli
Contributor Author

Thank you @BenjaminBossan! The code quality has improved a lot thanks to your insightful review.

Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
Implements "VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector
Banks"

https://arxiv.org/abs/2405.15179