Add VB-LoRA #2039
Conversation
Thanks a lot for this PR that adds VB-LoRA to PEFT. Your implementation already looks quite advanced, follows the existing examples, and includes the required testing, nice 🎉
In my first review, I focused on the actual VB-LoRA implementation, so I haven't checked the examples and docs yet. As you can see, I added a bunch of comments but those should not be big issues. Please check if they make sense.
Regarding the paper, I also found two typos, in case you want to fix them: "virtrual" and "backpropgatation".
src/peft/tuners/vblora/config.py
Outdated
        )
    },
)
save_topk_weights: bool = field(
save_only_topk_weights would be a more fitting name, right?
Right. I will modify it.
Updated.
init_vector_bank_bound: float = field(
    default=0.02,
    metadata={
        "help": (
            "The vector bank is initialized with a uniform distribution between -init_vector_bank_bound and"
            " init_vector_bank_bound."
        ),
    },
)
init_logits_std: float = field(
    default=0.1,
    metadata={
        "help": (
            "The logits are initialized with a normal distribution with a standard deviation of init_logits_std."
        ),
    },
)
If you have any tips for how users should initialize the VB-LoRA-specific parameters (num_vectors, vector_length, topk, init_vector_bank_bound, init_logits_std), that would be great, as users can't rely on their existing intuitions from LoRA. Even a reference to the paper could be useful here.
Agree. I will add some comments here.
Updated.
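For readers of this thread, here is a rough sketch of what the two initialization defaults above (init_vector_bank_bound and init_logits_std) amount to in plain torch terms. The shapes are made up for illustration; this is not the PR's actual code.

```python
import torch

# Illustrative only: shapes are placeholders, not taken from the PR.
num_vectors, vector_length = 90, 256
init_vector_bank_bound, init_logits_std = 0.02, 0.1

# The shared vector bank: uniform in [-init_vector_bank_bound, init_vector_bank_bound].
vector_bank = torch.empty(num_vectors, vector_length).uniform_(
    -init_vector_bank_bound, init_vector_bank_bound
)

# The selection logits: normal with standard deviation init_logits_std
# (one row of logits per sub-vector to be composed).
num_sub_vectors = 48  # placeholder
logits = torch.randn(num_sub_vectors, num_vectors) * init_logits_std
```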
src/peft/tuners/vblora/config.py
Outdated
self.target_modules = (
    set(self.target_modules) if isinstance(self.target_modules, list) else self.target_modules
)
if self.save_topk_weights:
IMO, this warning is not necessary. If users changed this setting, they have certainly read the description. We want to avoid unnecessary warnings as much as possible.
Agree. I will delete it.
Deleted.
src/peft/tuners/vblora/config.py
Outdated
    will not produce the same output as the base model would have without adaptation.
modules_to_save (`List[str]`):
    List of modules apart from Vera layers to be set as trainable and saved in the final checkpoint.
init_weights (`bool`):
AFAICT, the init_weights argument is not used. I think it would be great if it were supported the same way as in LoRA: if set to True (the default), VB-LoRA is initialized such that it performs an identity transform, and vice versa. Not sure if this fits well with the paper; we can also agree on a different default, but having that option would be appreciated.
In our approach, we shouldn't initialize the additive adapter to zero, because if the vector bank is initialized to zero, both matrices A and B will be zero, resulting in a zero gradient. As a result, we can't align with the other methods regarding init_weights=True. Do you have any suggestions in this case?
I followed your first suggestion in the comment:
- I removed init_weights.
- In tests, I set the vector bank to zero to make VB-LoRA an identity operation, and initialized the vector bank whenever training is involved in the test.
src/peft/tuners/vblora/model.py
Outdated
f"Target module {target} is not supported. Currently, only the following modules are supported: " | ||
"`torch.nn.Linear`, `transformers.pytorch_utils.Conv1D`." | ||
) | ||
r = kwargs.pop("r") |
Why is this necessary?
I will delete this line.
Updated.
Thanks for your replies.
Regarding the discussion about init_weights, as it came up multiple times, I'll reply here:
From my understanding, the problem is that the A and B matrices share the same vector bank. Therefore, if the vector bank is initialized to zeros, both A and B are zero and there is no gradient. This is unlike LoRA, where only B starts at zero.
Also, it's not possible to set the logits for B to zero, as a softmax operation is performed, so the weights will always sum up to 1.
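To make the point about the softmax concrete, here is an illustrative sketch of how a single sub-vector could be composed from the shared vector bank via top-k selection plus softmax. Names and shapes are assumptions for illustration, not the PR's actual code.

```python
import torch

num_vectors, vector_length, topk = 90, 256, 2
vector_bank = torch.empty(num_vectors, vector_length).uniform_(-0.02, 0.02)
logits = torch.randn(num_vectors) * 0.1  # logits for one sub-vector

top_vals, top_idx = logits.topk(topk)       # pick the top-k logits
weights = torch.softmax(top_vals, dim=-1)   # weights always sum to 1
sub_vector = (weights.unsqueeze(-1) * vector_bank[top_idx]).sum(dim=0)

# Even with all-zero logits, the softmax yields uniform non-zero weights, so the
# sub-vector only vanishes if the vector bank itself is all zeros.
```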
I have three suggestions:
- For those cases where we want VB-LoRA to be a no-op, we just manually call nn.init.zeros_(model.base_model.vblora_vector_bank["default"]) (see the sketch after this list). The init_weights argument should be removed.
- We add the option to pass init_weights=False (with True being the default) and in that case initialize the vector bank to zeros. This model is untrainable, so we give a warning that this option only exists for testing purposes.
- Add a new parameter that scales the contribution from B, which is a learnable parameter that starts at 0. This way, the vector bank is randomly initialized but VB-LoRA still starts as a no-op. Of course, this would be a deviation from the paper.
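A minimal sketch of the first option, using the attribute path from the suggestion above; the toy model and hyperparameter values are assumptions, and the exact names in the merged code may differ.

```python
import torch
from torch import nn
from peft import VBLoRAConfig, get_peft_model

# Toy setup; hyperparameter values are illustrative.
base = nn.Sequential(nn.Linear(16, 16))
ref_weight = base[0].weight.detach().clone()
ref_bias = base[0].bias.detach().clone()

config = VBLoRAConfig(target_modules=["0"], num_vectors=16, vector_length=8, topk=2)
model = get_peft_model(base, config)

# Zero the shared vector bank: both A and B become zero, so VB-LoRA is a no-op.
# Useful only in tests; re-initialize the bank before training, otherwise the
# gradients stay zero.
nn.init.zeros_(model.base_model.vblora_vector_bank["default"])

x = torch.randn(2, 16)
expected = x @ ref_weight.T + ref_bias  # what the un-adapted layer would compute
assert torch.allclose(model(x), expected, atol=1e-6)
```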
I also have a general comment about the docs: Let's make sure to mention that there is an option for VB-LoRA to reduce the size of the file when saving, but that the loaded adapter can only be used for inference after that. Ideally, you could give a numerical example (e.g. for Llama3 8b with parameter ..., the size is reduced from X to Y).
src/peft/tuners/vblora/layer.py
Outdated
@@ -0,0 +1,263 @@
# Copyright 2023-present the HuggingFace Inc. team.
Let's update the dates in all new files to 2024.
Thank you for your valuable feedback! I have updated the code accordingly, and I also updated the documentation.
Thanks for the updates @leo-yangli. I haven't reviewed yet, but I saw that the CI fails because of the following test:
Could you please check?
Thank you for your feedback. I just fixed the failing test.
It's ready to be reviewed.
Thanks a lot for the updates. This is almost good to be merged, I only saw a few tiny issues left. Please take a look.
"id": "ddfc0610-55f6-4343-a950-125ccf0f45ac", | ||
"metadata": {}, | ||
"source": [ | ||
"In this example, we fine-tune Roberta on a sequence classification task using VB-LoRA." |
Let's add a sentence that this notebook is based on examples/sequence_classification/VeRA.ipynb so that users can compare the results.
Added.
" target_modules=['key', 'value', 'query', 'output.dense', 'intermediate.dense'],\n", | ||
" num_vectors=num_vectors,\n", | ||
" vector_length=vector_length,\n", | ||
" save_only_topk_weights=True,\n", |
Let's add a comment here to explain this argument. Using it also means that we cannot resume training from checkpoints if training is interrupted for some reason, right? Let's also mention that.
Added a comment.
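For context, a hedged sketch of such a configuration; the values are placeholders, and the trade-off in the comments is the one described in this thread.

```python
from peft import VBLoRAConfig

# Placeholder values for illustration.
config = VBLoRAConfig(
    target_modules=["query", "value"],
    num_vectors=90,
    vector_length=256,
    topk=2,
    # Saves only the top-k weights instead of the full logits, which makes the
    # stored adapter much smaller. The resulting checkpoint can be used for
    # inference or merging, but training cannot be resumed from it.
    save_only_topk_weights=True,
)
```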
tests/test_initialization.py
Outdated
with pytest.raises(ValueError, match=msg):
    get_peft_model(model, config1)

config2 = VBLoRAConfig(target_modules=["lin1"], vector_length=vector_length)
Let's split this into 2 tests.
Updated.
src/peft/tuners/vblora/config.py
Outdated
    The length of the vectors in the vector bank. The length of the vectors should be divisible by the hidden
    dimension of the model.
topk (`int`):
    K value for topk selection.
It would be great if you could explain a bit more: What is the default, and when should users decrease or increase this number? We can make the assumption that users are familiar with LoRA and would like to try VB-LoRA; they should not have to read the paper to understand the hyper-parameters.
Added some explanations.
src/peft/tuners/vblora/config.py
Outdated
    the target modules manually.
save_only_topk_weights (`bool`):
    Whether to only save the topk weights. Models saved in this mode can be used for merging or inference only,
    not for resuming training.
Let's mention here that the point is to reduce the file size.
Added.
src/peft/tuners/vblora/layer.py
Outdated
# Check for infinity values when training. If found, training was likely resumed from a `save_only_topk_weights` model.
if self.training and vblora_logits_A[0, 0].isinf().any():
    raise RuntimeError(
        "Found infinity values in logits. Ensure training was not resumed from a `save_only_topk_weights` model."
Let's add: "VB-LoRA logits", otherwise this could be confusing.
Added.
Just something that came to my mind out of curiosity: Did you try experiments with multiple VB-LoRA adapters (for different tasks) that share the same vector bank? For example, train the logits and the vector bank on task 1, then freeze the vector bank. Next, create a new VB-LoRA for task 2 that uses the frozen vector bank from task 1 and only trains the logits? Probably this won't work well enough for task 2, but it would be really cool if it did, to save even more parameters.
Hi @BenjaminBossan, that's a great question! Training in an MTL setting is currently an area of ongoing research for us, and we have tried exactly this idea. Preliminary results indicate that when the two tasks are similar (for example, RTE and MRPC from the GLUE benchmark), the second task can still achieve decent performance with a frozen vector bank, although there is a noticeable drop in performance (e.g., 2%). This performance gap tends to decrease when we alternate training between the tasks. However, a more robust training strategy for an MTL setting still needs to be explored. I hope this answers your question. I have updated the code based on your review. Moreover, I added a new feature (counting VB-LoRA savable parameters) with tests, and the PR is ready for review. Thanks!
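A hypothetical sketch of that setup, assuming the vblora_vector_bank / vblora_logits parameter naming used in this PR; sharing a bank across adapters is not a feature of this PR, and `model` is assumed to be a VB-LoRA PeftModel already trained on task 1.

```python
# Freeze the shared vector bank learned on task 1 and train only the logits
# for task 2 (illustrative only; parameter name patterns are assumptions).
for name, param in model.named_parameters():
    if "vblora_vector_bank" in name:
        param.requires_grad_(False)  # freeze the shared vector bank
    elif "vblora_logits" in name:
        param.requires_grad_(True)   # train only the logits for task 2
```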
Thanks for the latest updates. Also, good idea to add the methods to count the number of savable parameters.
I found a small issue not being fully addressed yet in a few tests, the rest looks good.
Btw, if you just push on top of the PR without rebasing, it makes it much easier for me to review. There is no need for a clean git history, as I'll squash before merging.
Regarding the sharing of the vector bank: Thanks for the info. Once the new research is finished and you have found a nice way to share the vector bank, it would be nice to add that feature here. I think it shouldn't require many adjustments to implement.
assert os.path.exists(save_path / "adapter_config.json")

mlp_vblora_loaded = PeftModel.from_pretrained(mlp, save_path)
This is still not quite resolved. The mlp you used here is the same one you used above when calling mlp_vblora = get_peft_model(mlp, config). This means it is the mutated mlp, as get_peft_model performs in-place operations on it. What I mean is that before calling PeftModel.from_pretrained(mlp, save_path), you should create a fresh instance of mlp. I typically also delete the old instance if it's no longer needed, just to make it clear.
Suggested change:
- mlp_vblora_loaded = PeftModel.from_pretrained(mlp, save_path)
+ del mlp
+ mlp = self.get_mlp()
+ mlp_vblora_loaded = PeftModel.from_pretrained(mlp, save_path)
The same should be done for the tests below.
Thanks for pointing out the mistake. I have fixed it. I've also added a new test.
Thanks for this excellent addition to PEFT. Great PR, well implemented and tested right from the start.
I checked out the code and ran all tests on GPU, and they pass without trouble. Line coverage is also good, so this can be merged!
Thank you @BenjaminBossan! The code quality has improved a lot thanks to your insightful review.
Implements "VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks" https://arxiv.org/abs/2405.15179
Hi,
I am one of the authors of "VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks". You can find our paper at https://arxiv.org/abs/2405.15179.
This PR integrates our method, VB-LoRA. The implementation and tests follow VeRA and LoRA. I have run pytest locally, and all the tests have passed.
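For readers arriving at this PR, here is a hedged end-to-end sketch of how the tuner is meant to be used. The toy model and hyperparameter values are illustrative; consult the PEFT documentation for the exact API that was merged.

```python
import torch
from torch import nn
from peft import VBLoRAConfig, get_peft_model

# A toy base model; in practice this would typically be a transformers model.
base = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))

config = VBLoRAConfig(
    target_modules=["0", "2"],  # the two Linear layers above
    num_vectors=60,             # size of the shared vector bank
    vector_length=32,           # must evenly divide the layer dimensions (128)
    topk=2,                     # number of bank vectors mixed per sub-vector
)
model = get_peft_model(base, config)
model.print_trainable_parameters()

# Only the shared vector bank and the per-layer logits are trainable.
out = model(torch.randn(4, 128))
```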