Releases: huggingface/peft
0.17.1
This patch release contains a few fixes (via #2710) for the newly introduced `target_parameters` feature, which allows LoRA to target `nn.Parameter`s directly (useful for mixture-of-experts layers). Most notably:
- PEFT no longer removes possibly existing parametrizations from the parameter.
- Adding multiple adapters (via `model.add_adapter` or `model.load_adapter`) did not work correctly. Since a solution is not trivial, PEFT now raises an error to prevent this situation.
0.17.0: SHiRA, MiSS, LoRA for MoE, and more
Highlights

New Methods
SHiRA
@kkb-code contributed Sparse High Rank Adapters (SHiRA, paper), which promise a potential performance gain over LoRA, in particular reduced concept loss when using multiple adapters. Since the adapters train only 1-2% of the weights and are inherently sparse, switching between adapters may be cheaper than with LoRA. (#2584)
MiSS
@JL-er added a new PEFT method, MiSS (Matrix Shard Sharing) in #2604. This method is an evolution of Bone, which, according to our PEFT method comparison benchmark, gives excellent results when it comes to performance and memory efficiency. If you haven't tried it, you should do so now.
At the same time, Bone will be deprecated in favor of MiSS and will be removed in PEFT v0.19.0. If you already have a Bone checkpoint, you can use `scripts/convert-bone-to-miss.py` to convert it into a MiSS checkpoint and proceed with training using MiSS.
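For a fresh MiSS training run, a minimal sketch could look as follows; note that the config class name `MissConfig` and its parameters are assumptions based on this release (MiSS follows the usual PEFT config / `get_peft_model` pattern), so check the MiSS docs for the exact API:

```python
from transformers import AutoModelForCausalLM
from peft import MissConfig, get_peft_model  # assumption: MissConfig is the MiSS config class

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# assumption: MiSS exposes a rank-like hyperparameter `r` and `target_modules`, analogous to Bone
config = MissConfig(r=64, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```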
Enhancements
LoRA for `nn.Parameter`
LoRA is now able to target `nn.Parameter` directly (#2638, #2665)! Ever had a complicated `nn.Module` with promising parameters inside, but it was too custom to be supported by your favorite fine-tuning library? No worries, now you can target `nn.Parameter`s directly using the `target_parameters` config attribute, which works similarly to `target_modules`.
This option can be especially useful for models with Mixture of Experts (MoE) layers, as those often use `nn.Parameter`s directly and cannot be targeted with `target_modules`. For example, for the Llama4 family of models, use the following config to target the MoE weights:
```python
config = LoraConfig(
    ...,
    target_modules=[],  # <= prevent targeting any modules
    target_parameters=["feed_forward.experts.down_proj", "feed_forward.experts.gate_up_proj"],
)
```
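To actually add the adapter, pass this config to `get_peft_model` as usual. A minimal sketch (the checkpoint name is only an example of a Llama4 MoE model):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# any model whose MoE experts store their weights as nn.Parameter works here
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-4-Scout-17B-16E-Instruct")
config = LoraConfig(
    target_modules=[],  # <= prevent targeting any modules
    target_parameters=["feed_forward.experts.down_proj", "feed_forward.experts.gate_up_proj"],
)
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```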
Note that this feature is still experimental, as it comes with a few caveats and therefore might change in the future. Also, MoE weights with many experts can be quite huge, so expect higher memory usage compared to targeting normal `nn.Linear` layers.
Injecting adapters based on a state_dict
Sometimes, a PEFT adapter checkpoint exists but the corresponding PEFT config is not known for whatever reason. To inject the PEFT layers for this checkpoint, you would usually have to reverse-engineer the corresponding PEFT config, most notably the `target_modules` argument, based on the `state_dict` from the checkpoint. This can be cumbersome and error-prone. To avoid this, it is also possible to call `inject_adapter_in_model` and pass the loaded `state_dict` as an argument:
```python
from safetensors.torch import load_file

from peft import LoraConfig, inject_adapter_in_model

model = ...
state_dict = load_file(<path-to-safetensors-file>)
lora_config = LoraConfig()  # <= no need to specify further
model = inject_adapter_in_model(lora_config, model, state_dict=state_dict)
```
Find more on `state_dict`-based injection in the docs.
Changes
Compatibility
A bug in prompt learning methods caused `modules_to_save` to be ignored. Classification tasks are especially affected, since they usually add the classification/score layer to `modules_to_save`. As a consequence, these layers were neither trained nor stored after training. This has now been corrected. (#2646)
All Changes
- Bump version to 0.16.1.dev0 after release by @BenjaminBossan in #2632
- FEAT: Add GH action to deploy method comparison app by @BenjaminBossan in #2625
- enable FSDP example for model `hugging-quants/Meta-Llama-3.1-8B-Instr… by @kaixuanliu in #2626
- FIX: Create mask function signature change in transformers 4.53.1 by @BenjaminBossan in #2633
- FIX: Correctly skip AWQ test based on torch version by @BenjaminBossan in #2631
- FIX: Faulty OFT parameter device test by @BenjaminBossan in #2630
- Fix #2634: Allow peft_type to be a string by @githubnemo in #2635
- SHiRA Adapters by @kkb-code in #2584
- FIX: Prompt learning methods modules_to_save issue by @BenjaminBossan in #2646
- FIX: Error in workflow file to deploy method comparison app by @BenjaminBossan in #2645
- FEAT Allow LoRA to target nn.Parameter by @BenjaminBossan in #2638
- Update BibTeX entry by @cx-alberto-simoes in #2659
- FIX Prefix tuning after transformers PR 38635 by @BenjaminBossan in #2662
- make method comparison device agnostic, so it can expand to more accelerators like XPU by @yao-matrix in #2610
- Update tokenizer parameter in sfttrainer across multiple examples by @gapsong in #2664
- Update lora.md by @qgallouedec in #2666
- GPT2 compatible version of LLama-Adapters by @efraimdahl in #2643
- Method Comparison: Improve formatting/layout of table by @githubnemo in #2670
- ENH: Targeting multiple parameters on the same module by @BenjaminBossan in #2665
- Update extending vocab docs by @githubnemo in #2669
- FIX Failing target_parameters param usage count by @BenjaminBossan in #2676
- Fix trainable tokens with fsdp by @BenjaminBossan in #2681
- FIX: Small fixes to target_parameters by @BenjaminBossan in #2677
- TST: Add more HF Hub model caching by @BenjaminBossan in #2682
- FIX: Missing device map for facebook/opt-125m by @BenjaminBossan in #2675
- Fix not detecting regex-targeted embedding layer by @githubnemo in #2649
- Add MiSS as a replacement for Bone. by @JL-er in #2604
- [WIP] ENH: Adapter injection based on state_dict by @BenjaminBossan in #2637
- Release 0.17.0 by @BenjaminBossan in #2691
New Contributors
- @kaixuanliu made their first contribution in #2626
- @kkb-code made their first contribution in #2584
- @cx-alberto-simoes made their first contribution in #2659
- @efraimdahl made their first contribution in #2643
Full Changelog: v0.16.0...v0.17.0
0.16.0: LoRA-FA, RandLoRA, C³A, and much more
Highlights
New Methods
LoRA-FA
In #2468, @AaronZLT added the LoRA-FA optimizer to PEFT. This optimizer is based on `AdamW` and increases the memory efficiency of LoRA training. This means that you can train LoRA with less memory, or, with the same memory budget, use higher LoRA ranks and potentially get better results.
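A minimal sketch of how this can be used; the exact helper name and signature (`create_lorafa_optimizer` with `r`, `lora_alpha`, `lr`) are assumptions based on the PR, so please check the optimizer docs:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from peft.optimizers import create_lorafa_optimizer  # assumption: exported like the other PEFT optimizers

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
peft_model = get_peft_model(base_model, LoraConfig(r=16, lora_alpha=32))
# assumption: the helper mirrors the LoRA rank/alpha and returns a regular torch optimizer
optimizer = create_lorafa_optimizer(model=peft_model, r=16, lora_alpha=32, lr=7e-5)
# pass `optimizer` to your Trainer / training loop as usual
```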
RandLoRA
Thanks to @PaulAlbert31, a new PEFT method called RandLoRA was added to PEFT (#2464). Similarly to VeRA, it uses non-learnable random low-rank matrices that are combined through learnable matrices. This way, RandLoRA can approximate full-rank updates of the weights. Training models quantized with bitsandbytes is supported.
C³A
@Phoveran added Circular Convolution Adaptation, C3A, in #2577. This new PEFT method can overcome the limit of low rank adaptations as seen e.g. in LoRA while still promising to be fast and memory efficient.
Enhancements
Thanks to @gslama12 and @SP1029, LoRA now supports `Conv2d` layers with `groups != 1`. This requires the rank `r` to be divisible by `groups`. See #2403 and #2567 for context.
@dsocek added support for Intel Neural Compressor (INC) quantization to LoRA in #2499.
DoRA now supports `Conv1d` layers thanks to @EskildAndersen (#2531).
Passing `init_lora_weights="orthogonal"` now enables orthogonal weight initialization for LoRA (#2498).
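For example, a minimal sketch:

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],
    init_lora_weights="orthogonal",  # orthogonal initialization instead of the default
)
```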
@gapsong brought us Quantization-Aware LoRA training in #2571. This can make QLoRA training more efficient, please check the included example. Right now, only GPTQ is supported.
There has been a big refactor of Orthogonal Finetuning (OFT) thanks to @zqiu24 (#2575). This makes the PEFT method run more quickly and require less memory. It is, however, incompatible with old OFT checkpoints. If you have old OFT checkpoints, either pin the PEFT version to `<0.16.0` or retrain them with the new PEFT version.
Thanks to @keepdying, LoRA hotswapping with compiled models no longer leads to CUDA graph re-records (#2611).
Changes
Compatibility
- #2481: The value of `required_grads_` of `modules_to_save` is now set to `True` when used directly with `inject_adapter`. This is relevant for PEFT integrations, e.g. Transformers or Diffusers.
- Due to a big refactor of vision language models (VLMs) in Transformers, the model architecture has been slightly adjusted. One consequence of this is that if you use a PEFT prompt learning method that is applied to `vlm.language_model`, it will no longer work; please apply it to `vlm` directly (see #2554 for context). Moreover, the refactor results in different checkpoints. We managed to ensure backwards compatibility in PEFT, i.e. old checkpoints can be loaded successfully. There is, however, no forward compatibility, i.e. loading checkpoints trained after the refactor is not possible with package versions from before the refactor. In this case, you need to upgrade PEFT and transformers. More context in #2574.
- #2579: There have been bigger refactors in Transformers concerning attention masks. This required some changes on the PEFT side which can affect prompt learning methods. For prefix tuning specifically, this can result in numerical differences, but overall performance should be the same. For other prompt learning methods, numerical values should be the same, except if the base model uses 4d attention masks, like Gemma. If you load old prompt learning checkpoints, please double-check that they still perform as expected, especially if they're trained on Gemma or similar models. If not, please re-train them or pin PEFT and transformers to previous versions (`<0.16.0` and `<4.52.0`, respectively).
All Changes
- Bump version and minor instruction fix by @githubnemo in #2439
- FIX for ConvNd layers using the groups argument. by @gslama12 in #2403
- DOC: Tip on how to merge with DeepSpeed by @BenjaminBossan in #2446
- Fix incorrect link in docs by @kenning in #2444
- Fix typos by @omahs in #2447
- Refactor to better support LoRA variants by @BenjaminBossan in #2443
- enable 5 test cases on XPU by @yao-matrix in #2442
- FIX: Faulty test that results in nan weights by @BenjaminBossan in #2448
- Fix sft example script trl and env var by @BenjaminBossan in #2454
- LoRA variant init now also receives kwargs by @BenjaminBossan in #2455
- Fix #2450: Revamp adapter_state_dict_* methods by @githubnemo in #2456
- Method comparison evaluation suite by @githubnemo in #2395
- Bump version to reflect patch release by @githubnemo in #2461
- The paper on the Bone structure has been updated by @JL-er in #2312
- CI: More caching in tests by @BenjaminBossan in #2472
- fix gpu tests by @jiqing-feng in #2471
- Fix compare results by @jiqing-feng in #2473
- fix error_factor for xpu by @jiqing-feng in #2475
- Fix: Multiple PEFT methods have issues with models loaded in float16 or bfloat16 by @BenjaminBossan in #2433
- TST Refactor tests to make them simpler by @BenjaminBossan in #2462
- Use Python 3.9 as RUFF target version and apply fixes by @cyyever in #2483
- FIX Deleting adapters on auxiliary modules by @BenjaminBossan in #2466
- fix args by @real-zhangzhe in #2474
- ENH Add default target_modules for Llama4 by @BenjaminBossan in #2480
- [Feature Request] Add LoRA-FA to PEFT by @AaronZLT in #2468
- TST Refactor (continued) of encoder tests by @BenjaminBossan in #2478
- FIX: Error when merging LoRA bias with scale != 1 by @BenjaminBossan in #2489
- FIX: X-LoRA error when targeting different modules by @BenjaminBossan in #2488
- Fix: the evaluation_strategy is deprecated by @yuanwu2017 in #2487
- Testing common uses situational HF_HUB_OFFLINE by @githubnemo in #2490
- MNT: Update HF Hub download kwargs by @BenjaminBossan in #2492
- FIX Multi GPU tests: explicit device map by @BenjaminBossan in #2484
- Fix #2477: Regression accessing `modules_to_save` by @githubnemo in #2481
- make test_lora_use_dora_linear pass on XPU by @yao-matrix in #2493
- TST: AQLM test no longer x-fails by @BenjaminBossan in #2506
- TST make 3 flaky test cases always pass on XPU by @yao-matrix in #2503
- FIX: CPT should not be tested with sequence classification by @BenjaminBossan in #2507
- Update Docker image builds for torch 2.7+cu126 by @matthewdouglas in #2514
- Feature: RandLora integration into peft by @PaulAlbert31 in #2464
- LORA/MODEL: Use max rank of pattern for `add_weighted_adapter` by @Beinsezii in #2512
- fix typo for skipping test by @jiqing-feng in #2519
- docs typo: fix links by @imba-tjd in #2517
- Add INC dispatcher by @dsocek in #2499
- ENH: Add default Qwen3 target modules by @BenjaminBossan in #2522
- MNT: Pin GitHub action hashes for security by @BenjaminBossan in #2521
- TST: Refactor remaining common tests to use pytest by @BenjaminBossan in #2491
- ENH: Add tests, docs, types for scaling methods by @BenjaminBossan in #2526
- TST Mark AutoAWQ as xfail for now by @BenjaminBossan in #2529
- FIX Prompt learning issue with 4d attention mask by @BenjaminBossan in #2458
- FIX: Use correct argument name in MultiheadAttention forward by @BenjaminBossan in #2510
- Method comparison: Support more options for the optimizer by @BenjaminBossan in #2479
- Randlora documentation and some example usage by @PaulAlbert31 in #2524
- added support for Conv1d for DoRA by @EskildAndersen in #2531
- Fix #2535: Prev...
v0.15.2
v0.15.1
This patch includes a fix for #2450. In this bug, `modules_to_save` was not handled correctly when used in conjunction with DeepSpeed ZeRO stage 3, which resulted in those modules being placeholder values in the saved checkpoints.
Full Changelog: v0.15.0...v0.15.1
v0.15.0
Highlights
New Methods
CorDA: Context-Oriented Decomposition Adaptation
@iboing and @5eqn contributed CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning. This task-driven initialization method has two modes, knowledge-preservation and instruction-preservation, both using external data to select ranks intelligently. The former can be used to select those ranks that correspond to weights not affiliated with knowledge from, say, a QA dataset. The latter can be used to select those ranks that correspond most to the task at hand (e.g., a classification task). (#2231)
Trainable Tokens: Selective token update
The new Trainable Tokens tuner allows for selective training of tokens without re-training the full embedding matrix, e.g. when adding support for reasoning / thinking tokens. This is a lot more memory efficient and the saved checkpoint is much smaller. It can be used standalone or in conjunction with LoRA adapters by passing `trainable_token_indices` to `LoraConfig`. (#2376)
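A minimal sketch of combining it with LoRA; the embedding module name and token indices below are hypothetical placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(
    target_modules=["q_proj", "v_proj"],
    # assumption: map the embedding module name to the token ids that should remain trainable
    trainable_token_indices={"embed_tokens": [50265, 50266]},
)
peft_model = get_peft_model(base_model, config)
```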
Enhancements
LoRA now supports targeting multihead attention modules (but for now only those with `_qkv_same_embed_dim=True`). These modules were tricky, as they may expose linear submodules but won't use their forward methods, therefore needing explicit support. (#1324)
Hotswapping now allows different alpha scalings and ranks without recompilation of the model, provided the model is prepared by calling `prepare_model_for_compiled_hotswap()` before compiling it. (#2177)
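A rough sketch of the intended flow (the `target_rank` argument and the paths are assumptions/placeholders; see the hotswapping docs for details):

```python
import torch
from peft import PeftModel
from peft.utils.hotswap import hotswap_adapter, prepare_model_for_compiled_hotswap

base_model = ...
inputs = ...
model = PeftModel.from_pretrained(base_model, "path/to/adapter-1")
# assumption: pad all LoRA ranks up to a common maximum so later swaps need no recompilation
prepare_model_for_compiled_hotswap(model, target_rank=32)
model = torch.compile(model)
model(**inputs)  # first call triggers compilation

hotswap_adapter(model, "path/to/adapter-2", adapter_name="default")
model(**inputs)  # different rank/alpha, but no recompilation
```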
GPTQModel support was added in #2247 as a replacement for AutoGPTQ which is not maintained anymore.
Changes
- It's now possible to use `all-linear` as `target_modules` for custom (non-transformers) models (#2267). With this change comes a bugfix where it was possible that non-linear layers were selected when they shared the same name with a linear layer (e.g., `bar.foo` and `baz.foo`).
- The internal tuner API was refactored to make method registration easier. With this change, the number of changes to numerous files is reduced to a single `register_peft_method()` call. (#2282)
- `PEFT_TYPE_TO_MODEL_MAPPING` is now deprecated and should not be relied upon. Use `PEFT_TYPE_TO_TUNER_MAPPING` instead. (#2282)
- Mixed adapter batches can now be used in conjunction with beam search. (#2287)
- It was possible that `modules_to_save` keys wrongly matched parts of the state dict if the key was a substring of another key (e.g., `classifier` and `classifier2`). (#2334)
- Auto-casting of the input dtype to the LoRA adapter dtype can now be disabled via `disable_input_dtype_casting=True`. (#2353)
- The config parameters `rank_pattern` and `alpha_pattern` used by many adapters now support matching full paths as well by specifying the pattern with a caret in front, for example: `^foo` to target `model.foo` but not `model.bar.foo` (see the sketch after this list). (#2419)
- AutoPeftModels no longer reduce the embedding size if the tokenizer size differs from the embedding size. The embedding matrix is only resized if the tokenizer contains more tokens than the embedding matrix. This is to prevent resizing of embedding matrices in models that have 'spare' tokens built-in. (#2427)
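As an illustration of the new caret syntax (the module names are hypothetical):

```python
from peft import LoraConfig

config = LoraConfig(
    r=8,
    target_modules=["foo"],
    # "^foo" only matches the top-level module model.foo,
    # whereas a plain "foo" pattern would also match model.bar.foo
    rank_pattern={"^foo": 32},
)
```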
What's Changed
- FIX: Ensure Device Compatibility for BOFT Forward/Merging by @d-kleine in #2242
- MNT: Bump version to 0.14.1.dev0 by @BenjaminBossan in #2263
- ENH: fix library interface by @bluenote10 in #2265
- FIX: Add warning for `adapter_name` conflict with tuner by @pzdkn in #2254
- ENH: FIX: Allow `"all-linear"` to target custom models by @BenjaminBossan in #2267
- MNT: apply sorting of exported symbols in `__all__` by @bluenote10 in #2280
- MNT: apply sorting of imports by @bluenote10 in #2279
- FIX: Adoption prompt: New way to obtain position embeddings by @BenjaminBossan in #2276
- FIX: Int8 check for torchao v0.7.0 by @BenjaminBossan in #2284
- FEAT: Adding CorDA as an optional initialization method of LoRA by @iboing in #2231
- FIX: typo in lora `config.py` by @innerlee in #2297
- DOC: Added information regarding freezing the base model in `prepare_model_for_kbit_training` docstring by @NilBiescas in #2305
- DOC: add `resize_token_embeddings` to docs by @bingwork in #2290
- FIX: Make CorDA example work by @5eqn in #2300
- FIX: #2295: Warn when user reloads modified model by @githubnemo in #2306
- ENH: Extend usage for OLoRA finetune script by @jiqing-feng in #2308
- CI: Add zizmor for CI (security) linting by @githubnemo in #2288
- FEAT: Add LoRA multihead attention module by @BenjaminBossan in #1324
- DOC: Updated documentation for `get_peft_model()` for in-place base model modification by @d-kleine in #2313
- FIX: Prefix tuning test w/ rotary embedding on multi GPU by @BenjaminBossan in #2311
- FIX: Adaption prompt errors after changes from transformers #35235 by @BenjaminBossan in #2314
- FIX: Package checks for torchao, EETQ by @BenjaminBossan in #2320
- Refactor: PEFT method registration function by @BenjaminBossan in #2282
- FIX: `low_cpu_mem_usage=True` with 8bit bitsandbytes by @BenjaminBossan in #2325
- FIX: Reinstate `PEFT_TYPE_TO_MODEL_MAPPING` variable with deprecation by @BenjaminBossan in #2328
- FIX: reduce CorDA memory consumption + docs by @5eqn in #2324
- MNT: React on new zizmor version findings by @githubnemo in #2331
- TST: make cuda-only tests device-agnostic by @faaany in #2323
- FIX: Generating with mixed adapter batches and with beam search enabled by @BenjaminBossan in #2287
- FIX: Bug with `modules_to_save` loading if substring by @BenjaminBossan in #2334
- FIX: Add missing attributes to MultiheadAttention by @BenjaminBossan in #2335
- FIX: for zizmor permission warnings by @githubnemo in #2338
- CI: Attempt at adding a cache for models by @githubnemo in #2327
- FIX: Avoid needless copy from `modules_to_save` by @BenjaminBossan in #2220
- DOC: Add entry to solve unknown config argument by @BenjaminBossan in #2340
- FEAT: add gptqmodel support by @jiqing-feng in #2247
- MNT: Update ruff to v0.9.2 by @BenjaminBossan in #2343
- TST: Update `torch.compile` tests and docs by @BenjaminBossan in #2332
- FIX: Documentation & error checking for AdaLoRA timing by @githubnemo in #2341
- DOC: Better document init_lora_weights=False option by @BenjaminBossan in #2347
- ENH: Adding Lora implementation for `nn.Conv1d` by @CCLDArjun in #2333
- FIX: Failing AdaLoRA GPU test by @BenjaminBossan in #2349
- ENH: Improve invalid peft config error message by @thedebugger in #2346
- TST: Use different diffusion model for testing by @BenjaminBossan in #2345
- CI: Use locked install for zizmor by @githubnemo in #2350
- DOC: fix links to PEFT guides by @makelinux in #2357
- DOC: rename link to PEFT Quicktour by @makelinux in #2358
- ENH: Allow disabling input dtype casting for LoRA by @BenjaminBossan in #2353
- ENH: Hotswap allow different alpha scalings and ranks by @BenjaminBossan in #2177
- DOC: Fix links to boft by @makelinux in #2365
- DOC: Explain uninitialized weights warning by @BenjaminBossan in #2369
- ENH: Optimization for ConvNd if dropout=0. by @gslama12 in #2371
- FIX: Small fixes to hotswapping by @BenjaminBossan in #2366
- ENH: `prepare_model_for_compiled_hotswap` raises when no adapter was found by @BenjaminBossan in https://github.com/hugging...
Version 0.14.0: EVA, Context-aware Prompt Tuning, Bone, and more
Highlights
New Methods
Context-aware Prompt Tuning
@tsachiblau added a new soft prompt method called Context-aware Prompt Tuning (CPT), which is a combination of In-Context Learning and Prompt Tuning in the sense that, for each training sample, it builds a learnable context from training examples in addition to the single training sample. It allows for sample- and parameter-efficient few-shot classification and addresses recency bias.
Explained Variance Adaptation
@sirluk contributed a new LoRA initialization method called Explained Variance Adaptation (EVA). Instead of randomly initializing LoRA weights, this method uses SVD on minibatches of finetuning data to initialize the LoRA weights and is also able to re-allocate the ranks of the adapter based on the explained variance ratio (derived from SVD). Thus, this initialization method can yield better initial values and better rank distribution.
Bone
@JL-er added an implementation for Block Affine (Bone) Adaptation which utilizes presumed sparsity in the base layer weights to divide them into multiple sub-spaces that share a single low-rank matrix for updates. Compared to LoRA, Bone has the potential to significantly reduce memory usage and achieve faster computation.
Enhancements
PEFT now supports LoRA for `int8` torchao-quantized models (check this and this notebook). In addition, VeRA can now be used with 4 and 8 bit bitsandbytes quantization thanks to @ZiadHelal.
Hot-swapping of LoRA adapters is now possible using the `hotswap_adapter` function. You can load one LoRA and replace its weights in-place with the LoRA weights of another adapter, which, in general, should be faster than deleting one adapter and loading the other in its place. The feature is built so that no re-compilation of the model is necessary if `torch.compile` was called on the model (right now, this requires ranks and alphas to be the same for the adapters).
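A minimal sketch of the workflow (the base model and adapter paths are placeholders):

```python
from peft import PeftModel
from peft.utils.hotswap import hotswap_adapter

base_model = ...
inputs = ...
model = PeftModel.from_pretrained(base_model, "path/to/adapter-1")
out_1 = model(**inputs)

# replace the LoRA weights in-place with those of another adapter
hotswap_adapter(model, "path/to/adapter-2", adapter_name="default")
out_2 = model(**inputs)  # now computed with adapter-2's weights
```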
LoRA and IA³ now support `Conv3d` layers thanks to @jsilter, and @JINO-ROHIT added a notebook showcasing PEFT model evaluation using the lm-eval-harness toolkit.
With the `target_modules` argument, you can specify which layers to target with the adapter (e.g. LoRA). Now you can also specify which modules not to target by using the `exclude_modules` parameter (thanks @JINO-ROHIT).
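For example, a small sketch (the module names are hypothetical; the matching semantics mirror `target_modules`):

```python
from peft import LoraConfig

config = LoraConfig(
    target_modules=["q_proj", "v_proj"],
    # skip otherwise-matching modules, e.g. those of the last decoder layer
    exclude_modules=["layers.31.self_attn.q_proj", "layers.31.self_attn.v_proj"],
)
```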
Changes
- Several fixes have been made to the OFT implementation, among other things to fix merging, which makes adapter weights trained with PEFT versions prior to this release incompatible (see #1996 for details).
- Adapter configs are now forward-compatible by accepting unknown keys.
- Prefix tuning was fitted to the `DynamicCache` caching infrastructure of transformers (see #2096). If you are using this PEFT version and a recent version of transformers with an old prefix tuning checkpoint, you should double-check that it still works correctly and retrain it if it doesn't.
- Added a `lora_bias` parameter to LoRA layers to enable bias on the LoRA B matrix. This is useful when extracting LoRA weights from fully fine-tuned parameters with bias vectors so that these can be taken into account.
- #2180 provided a couple of bug fixes to LoKr (thanks @yaswanth19). If you're using LoKr, your old checkpoints should still work, but it's recommended to retrain your adapter.
- `from_pretrained` now warns the user if PEFT keys are missing.
- Attribute access to modules in `modules_to_save` is now properly and transparently handled.
- PEFT supports the changes to bitsandbytes 8bit quantization from the recent v0.45.0 release. To benefit from these improvements, we recommend upgrading bitsandbytes if you're using QLoRA. Expect slight numerical differences in model outputs if you're using QLoRA with 8bit bitsandbytes quantization.
What's Changed
- Bump version to 0.13.1.dev0 by @BenjaminBossan in #2094
- Support Conv3d layer in LoRA and IA3 by @jsilter in #2082
- Fix Inconsistent Missing Keys Warning for Adapter Weights in PEFT by @yaswanth19 in #2084
- FIX: Change check if past_key_values is empty by @BenjaminBossan in #2106
- Update install.md by @Salehbigdeli in #2110
- Update OFT to fix merge bugs by @Zeju1997 in #1996
- ENH: Improved attribute access for modules_to_save by @BenjaminBossan in #2117
- FIX low_cpu_mem_usage consolidates devices by @BenjaminBossan in #2113
- TST Mark flaky X-LoRA test as xfail by @BenjaminBossan in #2114
- ENH: Warn when from_pretrained misses PEFT keys by @BenjaminBossan in #2118
- FEAT: Adding exclude modules param(#2044) by @JINO-ROHIT in #2102
- fix merging bug / update boft conv2d scaling variable by @Zeju1997 in #2127
- FEAT: Support quantization for VeRA using bitsandbytes (#2070) by @ZiadHelal in #2076
- Bump version to 0.13.2.dev0 by @BenjaminBossan in #2137
- FEAT: Support torchao by @BenjaminBossan in #2062
- FIX: Transpose weight matrix based on fan_in_fan_out condition in PiSSA initialization (#2103) by @suyang160 in #2104
- FIX Type annoations in vera/bnb.py by @BenjaminBossan in #2139
- ENH Make PEFT configs forward compatible by @BenjaminBossan in #2038
- FIX Raise an error when performing mixed adapter inference and passing non-existing adapter names by @BenjaminBossan in #2090
- FIX Prompt learning with latest transformers error by @BenjaminBossan in #2140
- adding peft lora example notebook for ner by @JINO-ROHIT in #2126
- FIX TST: NaN issue with HQQ GPU test by @BenjaminBossan in #2143
- FIX: Bug in target module optimization if child module name is suffix of parent module name by @BenjaminBossan in #2144
- Bump version to 0.13.2.dev0 by @BenjaminBossan in #2145
- FIX Don't assume past_key_valus for encoder models by @BenjaminBossan in #2149
- Use `SFTConfig` instead of `SFTTrainer` keyword args by @qgallouedec in #2150
- FIX: Sft train script FSDP QLoRA embedding mean resizing error by @BenjaminBossan in #2151
- Optimize DoRA in `eval` and `no dropout` by @ariG23498 in #2122
- FIX Missing low_cpu_mem_usage argument by @BenjaminBossan in #2156
- MNT: Remove version pin of diffusers by @BenjaminBossan in #2162
- DOC: Improve docs for layers_pattern argument by @BenjaminBossan in #2157
- Update HRA by @DaShenZi721 in #2160
- fix fsdp_auto_wrap_policy by @eljandoubi in #2167
- MNT Remove Python 3.8 since it's end of life by @BenjaminBossan in #2135
- Improving error message when users pass layers_to_transform and layers_pattern by @JINO-ROHIT in #2169
- FEAT Add hotswapping functionality by @BenjaminBossan in #2120
- Fix to prefix tuning to fit transformers by @BenjaminBossan in #2096
- MNT: Enable Python 3.12 on CI by @BenjaminBossan in #2173
- MNT: Update docker nvidia base image to 12.4.1 by @BenjaminBossan in #2176
- DOC: Extend modules_to_save doc with pooler example by @BenjaminBossan in #2175
- FIX VeRA failure on multiple GPUs by @BenjaminBossan in #2163
- FIX: Import location of HF hub errors by @BenjaminBossan in #2178
- DOC: fix broken link in the README of loftq by @dennis2030 in #2183
- added checks for layers to transforms and layer pattern in lora by @JINO-ROHIT in #2159
- ENH: Warn when loading PiSSA/OLoRA together with other adapters by @BenjaminBossan in #2186
- TST: Skip AQLM test that is incompatible with torch 2.5 by @BenjaminBossan in #2187
- FIX: Prefix...
v0.13.2: Small patch release
This patch release contains a small bug fix for an issue that prevented some LoRA checkpoints from being loaded correctly (mostly concerning stable diffusion checkpoints not trained with PEFT when loaded in diffusers, #2144).
Full Changelog: v0.13.1...v0.13.2
v0.13.1: Small patch release
This patch release contains a small bug fix for the `low_cpu_mem_usage=True` option (#2113).
Full Changelog: v0.13.0...v0.13.1
v0.13.0: LoRA+, VB-LoRA, and more
Highlights
New methods
LoRA+
@kallewoof added LoRA+ to PEFT (#1915). This is a function that initializes an optimizer with settings that are better suited for training a LoRA adapter.
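A minimal sketch of using it (the model and hyperparameters are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from peft.optimizers import create_loraplus_optimizer

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
peft_model = get_peft_model(base_model, LoraConfig())
optimizer = create_loraplus_optimizer(
    model=peft_model,
    optimizer_cls=torch.optim.AdamW,
    lr=5e-5,
    loraplus_lr_ratio=16,  # LoRA B matrices get a 16x higher learning rate than the A matrices
)
# pass `optimizer` to your Trainer / training loop as usual
```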
VB-LoRA
@leo-yangli added a new method to PEFT called VB-LoRA (#2039). The idea is to have LoRA layers be composed from a single vector bank (hence "VB") that is shared among all layers. This makes VB-LoRA extremely parameter efficient and the checkpoints especially small (comparable to the VeRA method), while still promising good fine-tuning performance. Check the VB-LoRA docs and example.
Enhancements
New Hugging Face team member @ariG23498 added the helper function `rescale_adapter_scale` to PEFT (#1951). Use this context manager to temporarily increase or decrease the scaling of the LoRA adapter of a model. It also works for PEFT adapters loaded directly into a transformers or diffusers model.
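A minimal sketch (the model and inputs are placeholders, the multiplier is illustrative):

```python
from peft.helpers import rescale_adapter_scale

model = ...   # a model with a LoRA adapter loaded
inputs = ...
# temporarily halve the LoRA scaling for this forward pass; it is restored on exit
with rescale_adapter_scale(model, multiplier=0.5):
    outputs = model(**inputs)
```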
@ariG23498 also added DoRA support for embedding layers (#2006). So if you're using the `use_dora=True` option in the `LoraConfig`, you can now also target embedding layers.
For some time now, we have supported inference with batches that use different adapters for different samples, e.g. samples 1-5 use "adapter1" and samples 6-10 use "adapter2". However, this only worked for LoRA layers so far. @saeid93 extended this to also work with layers targeted by `modules_to_save` (#1990).
When loading a PEFT adapter, you now have the option to pass `low_cpu_mem_usage=True` (#1961). This will initialize the adapter with empty weights ("meta" device) before loading the weights, instead of initializing on CPU or GPU. This can speed up loading PEFT adapters, so use this option especially if you have a lot of adapters to load at the same time or if these adapters are very big. Please let us know if you encounter issues with this option, as we may make this the default in the future.
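For example (the base model and paths are placeholders):

```python
from peft import PeftModel

base_model = ...
model = PeftModel.from_pretrained(base_model, "path/to/adapter-1", low_cpu_mem_usage=True)
# loading additional adapters benefits as well
model.load_adapter("path/to/adapter-2", adapter_name="adapter-2", low_cpu_mem_usage=True)
```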
Changes
Safe loading of PyTorch weights
Unless indicated otherwise, PEFT adapters are saved and loaded using the secure `safetensors` format. However, we also support the PyTorch format for checkpoints, which relies on the inherently insecure pickle protocol from Python. In the future, PyTorch will be more strict when loading these files to improve security by making the option `weights_only=True` the default. This is generally recommended and should not cause any trouble with PEFT checkpoints, which is why, with this release, PEFT will enable this by default. Please open an issue if this causes trouble.
What's Changed
- Bump version to 0.12.1.dev0 by @BenjaminBossan in #1950
- CI Fix Windows permission error on merge test by @BenjaminBossan in #1952
- Check if past_key_values is provided when using prefix_tuning in peft_model by @Nidhogg-lyz in #1942
- Add lora+ implementation by @kallewoof in #1915
- FIX: New bloom changes breaking prompt learning by @BenjaminBossan in #1969
- ENH Update VeRA preconfigured models by @BenjaminBossan in #1941
- fix: lora+: include lr in optimizer kwargs by @kallewoof in #1973
- FIX active_adapters for transformers models by @BenjaminBossan in #1975
- FIX Loading adapter honors offline mode by @BenjaminBossan in #1976
- chore: Update CI configuration for workflows by @XciD in #1985
- Cast to fp32 if using bf16 weights on cpu during `merge_and_unload` by @snarayan21 in #1978
- AdaLora: Trigger warning when user uses 'r' inplace of 'init_r' by @bhargavyagnik in #1981
- [Add] scaling LoRA adapter weights with a context manager by @ariG23498 in #1951
- DOC Small fixes for HQQ and section title by @BenjaminBossan in #1986
- Add docs and examples for X-LoRA by @EricLBuehler in #1970
- fix: fix docker build gpus by @XciD in #1987
- FIX: Adjust transformers version check for bloom by @BenjaminBossan in #1992
- [Hotfix] Fix BOFT mixed precision by @Edenzzzz in #1925
- [Suggestions] Updates suggested for `helper.rescale_adapter_scale` by @ariG23498 in #1989
- MAINT: Default to loading weights only for torch.load by @BenjaminBossan in #1993
- BOFT bug fix when saving by @Zeju1997 in #1994
- FIX Import error in BOFT half precision test by @BenjaminBossan in #1995
- Update lora.md (typos) by @nir-sh-automat-it in #2003
- TST Add LNTuningConfig and LoKrConfig to tests by @BenjaminBossan in #2005
- ENH: Warn when a user provided model name in the config renamed by @BenjaminBossan in #2004
- FIX CI Correctly report outcome of bnb import test by @BenjaminBossan in #2007
- Update docs for X-LoRA and some bugfixes by @EricLBuehler in #2002
- TST: Potentially Skip 8bit bnb regression test if compute capability is too low by @BenjaminBossan in #1998
- CI Activate single core multi backend bnb tests by @BenjaminBossan in #2008
- Fix usage of deprecated parameters/functions in X-LoRA by @EricLBuehler in #2010
- [tests] enable `test_vera_dtypes` on XPU by @faaany in #2017
- CI Remove regression tests from BNB CI by @BenjaminBossan in #2024
- [tests] enable regression tests on XPU by @faaany in #2019
- ENH: Better error msg for replace_lora_weights_loftq when using a local model. by @BenjaminBossan in #2022
- [tests] make cuda-only cases in `TestModelAndLayerStatus` device-agnostic by @faaany in #2026
- [tests] enable `test_mixed_adapter_batches_lora_opt_timing` on XPU by @faaany in #2021
- MAINT: Update ruff version to ~0.6.1 by @BenjaminBossan in #1965
- ENH Raise error when applying modules_to_save on tuner layer by @BenjaminBossan in #2028
- FIX: Don't target the classification head when using target_modules="all-linear" by @BenjaminBossan in #2033
- [tests] enable cuda-only tests in `test_common_gpu.py` to work on XPU by @faaany in #2031
- [Add] DoRA Embedding by @ariG23498 in #2006
- [tests] enable `test_gpu_examples.py` on XPU by @faaany in #2036
- Bug: set correct pre-commit-hooks version by @ltoniazzi in #2034
- Warn if using tied target module with `tie_word_embeddings` by @ltoniazzi in #2025
- ENH: Faster adapter loading if there are a lot of target modules by @BenjaminBossan in #2045
- FIX: Error with OLoRA init when using bnb by @BenjaminBossan in #2011
- FIX: Small numerical discrepancy for p-tuning after loading the model by @BenjaminBossan in #2047
- Add VB-LoRA by @leo-yangli in #2039
- Fixing scalings logging test by @EricLBuehler in #2042
- TST: Fewer inference steps for stable diffusion tests by @BenjaminBossan in #2051
- TST Speed up vision model tests by @BenjaminBossan in #2058
- TST: Make X-LoRA tests faster by @BenjaminBossan in #2059
- Update permissions for githubtoken stale.yml by @glegendre01 in #2061
- MAINT: Give stale bot permissions for PRs too by @BenjaminBossan in #2064
- avoid saving boft_P in adapter model by @sywangyi in #2050
- fix arguments for PiSSA preprocess by @keakon in #2053
- Apply deprecated `evaluation_strategy` by @muellerzr in #1664
- fixing multiple LoRA in the same batch or vit by @saeid93 in https://gi...