Skip to content

Conversation

@wingding12
Copy link

Summary

Fixes #3004

When using add_weighted_adapter with combination_type in ['linear', 'ties', 'dare_linear', 'dare_ties', 'magnitude_prune'], the previous implementation combined A matrices and B matrices separately before multiplying. This introduced cross-terms that corrupted the result when combining multiple adapters with different weights.

The Bug

For example, with two adapters (A1, B1) and (A2, B2) and weights [1, -1]:

  • Expected: A1@B1 - A2@B2
  • Old (buggy): (A1-A2)@(B1+B2) = A1@B1 + A1@B2 - A2@B1 - A2@B2

The cross-terms A1@B2 and A2@B1 were incorrectly added, causing significant errors (in my tests, ~1.55 error norm vs ~1.55 expected norm - essentially completely wrong).

The Fix

This PR computes the full delta weight (A@B) for each adapter first, then combines them, and finally decomposes the result back to A and B using SVD. This ensures mathematically correct results.

Note on Approximation

The decomposition uses truncated SVD with the same rank as the input adapters, which introduces some approximation error. This is the same behavior as the existing SVD combination types (svd, ties_svd, etc.) and is expected.

In my tests:

  • Old buggy error: ~1.55 (completely wrong)
  • New fixed error (rank=4): ~0.32 (expected rank truncation, same as combination_type='svd')
  • New fixed error (rank=8): ~0.00001 (near exact with sufficient rank)

Users who need higher accuracy can use combination_type='svd' with a higher svd_rank parameter.

Test Plan

I tested manually with:

  1. Single adapter with negative weight (-1.0) - works correctly
  2. Two identical adapters with [1.0, -1.0] - cancels to zero correctly
  3. Two different adapters with [1.0, -1.0] - now matches expected result (within SVD truncation error)
  4. Three adapters with mixed weights [2.0, -1.0, 0.5] - works correctly
  5. All combination types (linear, ties, dare_linear, dare_ties, magnitude_prune) - all work

Also includes Conv2d layer handling (following the same pattern as the existing SVD implementation).

… weights

When using `add_weighted_adapter` with combination_type in ['linear', 'ties',
'dare_linear', 'dare_ties', 'magnitude_prune'], the previous implementation
combined A matrices and B matrices separately before multiplying. This
introduced cross-terms that corrupted the result when combining multiple
adapters with different weights.

For example, with two adapters (A1, B1) and (A2, B2) and weights [1, -1]:
- Expected: A1@B1 - A2@B2
- Old (buggy): (A1-A2)@(B1+B2) = A1@B1 + A1@B2 - A2@B1 - A2@B2

The cross-terms A1@B2 and A2@B1 were incorrectly added.

This fix computes the full delta weight (A@B) for each adapter first, then
combines them, and finally decomposes the result back to A and B using SVD.
This ensures mathematically correct results.

Note: The decomposition uses truncated SVD with the same rank as the input
adapters, which introduces some approximation error. This is the same behavior
as the existing SVD combination types and is expected. Users who need higher
accuracy can use combination_type='svd' with a higher svd_rank.

Fixes huggingface#3004
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Combining adapters linearly with negative weights is broken

1 participant