
Commit 781bdaf

FIX Small regression in BNB LoRA output
Our regression tests reveal that the 8bit BNB LoRA regression test is failing. To reproduce, run:

pytest tests/regression/test_regression.py -s --regression -k test_lora_8bit

The regression was introduced in huggingface#2122. We didn't notice this earlier because of other failing tests in the nightly CI.

The cause of the error is subtle. In the original code, we would calculate the LoRA output, convert its dtype if necessary, and then add it to the base output. After the mentioned PR, we calculate the LoRA output, add it to the base output, and then convert the dtype if necessary. The difference is very small on a per-layer basis, but it can accumulate over the layers, leading to a significant difference in the outputs, as witnessed by the regression test.

This PR rolls back this specific part of the PR (both for 8bit and 4bit) while leaving the main change of that PR intact.
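To make the ordering issue concrete, here is a minimal sketch (not taken from the PEFT code or its tests) comparing the two orders for a single layer. It assumes a float16 base output and a float32 LoRA output; the names result and output mirror the diff below, everything else is illustrative.

import torch

torch.manual_seed(0)
result = torch.randn(1024, dtype=torch.float16)          # stand-in for the base layer output (expected_dtype)
output = 1e-3 * torch.randn(1024, dtype=torch.float32)   # stand-in for the LoRA output (compute dtype)

# Original (and restored) order: convert the LoRA output first, then add in float16.
restored = result + output.to(torch.float16)

# Order introduced by the regressing PR: add (promoted to float32), then convert.
regressed = (result + output).to(torch.float16)

# The per-layer difference is tiny, on the order of one float16 ulp ...
print((restored - regressed).abs().max())
# ... but repeated across many layers it can add up to a visible change in the model output.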
1 parent ca1b3b1 commit 781bdaf

1 file changed (+8, -6)

src/peft/tuners/lora/bnb.py

Lines changed: 8 additions & 6 deletions
@@ -235,15 +235,15 @@ def forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor:
                         x = x.to(compute_dtype)

                 if not self.use_dora[active_adapter]:
-                    result = result + lora_B(lora_A(dropout(x))) * scaling
+                    output = lora_B(lora_A(dropout(x))) * scaling
                 else:
                     if isinstance(dropout, torch.nn.Identity) or not self.training:
                         base_result = result
                     else:
                         x = dropout(x)
                         base_result = None

-                    result = result + self.lora_magnitude_vector[active_adapter](
+                    output = self.lora_magnitude_vector[active_adapter](
                         x,
                         lora_A=lora_A,
                         lora_B=lora_B,
@@ -252,7 +252,8 @@ def forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor:
                         base_result=base_result,
                     )
                 if requires_conversion:
-                    result = result.to(expected_dtype)
+                    output = output.to(expected_dtype)
+                result = result + output

         return result

@@ -490,15 +491,15 @@ def forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor:
                     x = x.to(lora_A.weight.dtype)

                 if not self.use_dora[active_adapter]:
-                    result = result + lora_B(lora_A(dropout(x))) * scaling
+                    output = lora_B(lora_A(dropout(x))) * scaling
                 else:
                     if isinstance(dropout, torch.nn.Identity) or not self.training:
                         base_result = result
                     else:
                         x = dropout(x)
                         base_result = None

-                    result = result + self.lora_magnitude_vector[active_adapter](
+                    output = self.lora_magnitude_vector[active_adapter](
                         x,
                         lora_A=lora_A,
                         lora_B=lora_B,
@@ -507,7 +508,8 @@ def forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor:
                         base_result=base_result,
                     )
                 if requires_conversion:
-                    result = result.to(expected_dtype)
+                    output = output.to(expected_dtype)
+                result = result + output

         return result

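As a rough illustration of the accumulation effect described in the commit message, the following sketch (again not from the repository; the depth of 48 dummy layers and the magnitude of the deltas are arbitrary assumptions) repeats the two orderings across a stack of additions.

import torch

torch.manual_seed(0)
n_layers = 48  # hypothetical depth, chosen only for illustration
restored = torch.randn(1024, dtype=torch.float16)
regressed = restored.clone()

for _ in range(n_layers):
    delta = 1e-3 * torch.randn(1024, dtype=torch.float32)  # stand-in for one layer's LoRA output
    restored = restored + delta.to(torch.float16)           # convert, then add (restored behaviour)
    regressed = (regressed + delta).to(torch.float16)       # add, then convert (regressed behaviour)

# The gap between the two variants tends to grow with depth, which is the kind of
# drift the regression test detected on real models.
print((restored - regressed).abs().max())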