Commit daf8707

Merge branch 'main' into feature_fourierft_conv2d
2 parents 28e7ad1 + 6030f91 commit daf8707

54 files changed (+2632 −4269 lines)

.github/workflows/tests-main.yml

Lines changed: 3 additions & 3 deletions
@@ -6,9 +6,6 @@ on:
     paths-ignore:
       - 'docs/**'
 
-env:
-  TRANSFORMERS_IS_CI: 1
-
 permissions: {}
 
 jobs:
@@ -31,6 +28,9 @@ jobs:
           pip install -U git+https://github.com/huggingface/transformers.git
           pip install -e .[test]
       - name: Test with pytest
+        env:
+          TRANSFORMERS_IS_CI: 1
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
         run: |
           make test
       - name: Post to Slack

.github/workflows/tests.yml

Lines changed: 3 additions & 1 deletion
@@ -11,7 +11,6 @@ on:
 
 env:
   HF_HOME: .cache/huggingface
-  TRANSFORMERS_IS_CI: 1
 
 permissions: {}
 
@@ -90,6 +89,9 @@ jobs:
         # they fail, but add a notice so that the failure is not completely silent
         continue-on-error: ${{ matrix.os == 'macos-13' }}
         shell: bash
+        env:
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+          TRANSFORMERS_IS_CI: 1
         run: |
           set +e
           make test

docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -143,5 +143,7 @@
       title: Helpers
     - local: package_reference/hotswap
       title: Hotswapping adapters
+    - local: package_reference/functional
+      title: Functions for PEFT integration
     title: Utilities
   title: API reference

docs/source/developer_guides/troubleshooting.md

Lines changed: 64 additions & 0 deletions
@@ -401,3 +401,67 @@ If it is not possible for you to upgrade PEFT, there is a workaround you can try

Assume the error message says that the unknown keyword argument is named `foobar`. Search inside the `adapter_config.json` of this PEFT adapter for the `foobar` entry and delete it from the file. Then save the file and try loading the model again.

This solution works most of the time. As long as it is the default value for `foobar`, it can be ignored. However, when it is set to some other value, you will get incorrect results. Upgrading PEFT is the recommended solution.

## Adapter handling

### Using multiple adapters at the same time

PEFT allows you to create more than one adapter on the same model. This can be useful in many situations. For example, for inference, you may want to serve two fine-tuned models from the same base model instead of loading the base model once for each fine-tuned model, which would cost more memory. Moreover, multiple adapters can be activated at the same time, so that the model leverages what all of those adapters have learned. As an example, if you have a diffusion model, you may want to use one LoRA adapter to change the style and a different one to change the subject.

Activating multiple adapters at the same time is generally possible with all PEFT methods (LoRA, LoHa, IA³, etc.) except for prompt learning methods (p-tuning, prefix tuning, etc.). The following example illustrates how to achieve this:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

model_id = ...
base_model = AutoModelForCausalLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(base_model, lora_path_0)  # default adapter_name is 'default'
model.load_adapter(lora_path_1, adapter_name="other")
# the 'other' adapter was loaded but is not active yet, so activate both adapters:
model.base_model.set_adapter(["default", "other"])
```

> [!TIP]
> In the example above, you can see that we need to call `model.base_model.set_adapter(["default", "other"])`. Why can't we call `model.set_adapter(["default", "other"])` instead? This is unfortunately not possible because, as explained earlier, some PEFT methods don't support activating more than one adapter at a time.

It is also possible to train two adapters at the same time, but you should be careful to ensure that the weights of both adapters are known to the optimizer. Otherwise, only one adapter will receive updates.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_id = ...
base_model = AutoModelForCausalLM.from_pretrained(model_id)
lora_config_0 = LoraConfig(...)
lora_config_1 = LoraConfig(...)
model = get_peft_model(base_model, lora_config_0)
model.add_adapter(adapter_name="other", peft_config=lora_config_1)
```

If we were now to call:

```python
from transformers import Trainer

trainer = Trainer(model=model, ...)
trainer.train()
```

or

```python
import torch

optimizer = torch.optim.AdamW([param for param in model.parameters() if param.requires_grad], ...)
```

then the second LoRA adapter (`"other"`) would not be trained. This is because it is inactive at that point, which means the `requires_grad` attribute on its parameters is set to `False` and the optimizer will ignore it. Therefore, make sure to activate all adapters that should be trained _before_ initializing the optimizer:

```python
# activate all adapters
model.base_model.set_adapter(["default", "other"])
trainer = Trainer(model=model, ...)
trainer.train()
```

> [!TIP]
> This section deals with using multiple adapters _of the same type_ on the same model, for example, using multiple LoRA adapters at the same time. It does not apply to using _different types_ of adapters on the same model, for example one LoRA adapter and one LoHa adapter. For this, please check [`PeftMixedModel`](https://huggingface.co/docs/peft/developer_guides/mixed_models).
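
Coming back to the training example: before the optimizer (or `Trainer`) is created, it can be worth double-checking that every adapter you intend to train is actually trainable. Below is a minimal sketch of such a check, continuing from the `model` above; the `.other.` substring test is an assumption about PEFT's usual internal parameter naming, not something the docs guarantee.

```python
# Sketch: after set_adapter(["default", "other"]), list the parameters the
# optimizer will actually see and confirm that the "other" adapter is among them.
trainable = [name for name, param in model.named_parameters() if param.requires_grad]
print(f"{len(trainable)} trainable parameters")
assert any(".other." in name for name in trainable), "adapter 'other' appears to be frozen"
```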

docs/source/package_reference/functional.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Functions for PEFT integration

A collection of functions that can be useful for non-PeftModel models, e.g. for transformers or diffusers integrations.

The functions provided here can be considered the "public API" of PEFT and are therefore safe to be used by packages that provide PEFT integrations.

## Cast the adapter weight dtypes
[[autodoc]] functional.cast_adapter_dtype
    - all

## Delete the PEFT adapter from the model
[[autodoc]] functional.delete_adapter
    - all

## Get the state dict of the PEFT adapter
[[autodoc]] functional.get_peft_model_state_dict
    - all

## Inject a PEFT adapter into the model based on a PEFT config
[[autodoc]] functional.inject_adapter_in_model
    - all

## Set the active PEFT adapter(s) of the model
[[autodoc]] functional.set_adapter
    - all

## Load the weights of the PEFT state dict into the model
[[autodoc]] functional.set_peft_model_state_dict
    - all
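
To make the purpose of this new reference page concrete, here is a minimal sketch of how an integration might combine some of these helpers on a plain transformers model, without wrapping it in a `PeftModel`. The `peft.functional` import path follows the page above; the model id and LoRA settings are illustrative assumptions.

```python
# Minimal sketch of using PEFT's functional helpers on a bare transformers
# model (no PeftModel wrapper). Model id and LoRA settings are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig
from peft.functional import (
    get_peft_model_state_dict,
    inject_adapter_in_model,
    set_peft_model_state_dict,
)

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])

# Add LoRA layers to the model in place, based on the PEFT config.
model = inject_adapter_in_model(config, model, adapter_name="default")

# Extract only the adapter weights, e.g. to store them separately ...
adapter_state_dict = get_peft_model_state_dict(model, adapter_name="default")

# ... and load them back into a model that was prepared the same way.
set_peft_model_state_dict(model, adapter_state_dict, adapter_name="default")
```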

examples/dora_finetuning/QDoRA_finetuning.ipynb

Lines changed: 3 additions & 2 deletions
@@ -6,7 +6,7 @@
    "id": "CV_gQs58bsvM"
   },
   "source": [
-    "# Fine-tuning [Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco) Dataset using QDora (quantized Lora w/ use_dora=True) on T4 Free Colab GPU."
+    "# Fine-tuning [Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco) Dataset using QDora (quantized Lora w/ use_dora=True)."
   ]
  },
  {
@@ -1010,6 +1010,7 @@
     "top_p = 0.9\n",
     "temperature = 0.7\n",
     "user_question = \"What is the purpose of quantization in LLMs?\"\n",
+    "device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
     "\n",
     "\n",
     "prompt = (\n",
@@ -1021,7 +1022,7 @@
     "\n",
     "\n",
     "def generate(model, user_question, max_new_tokens=max_new_tokens, top_p=top_p, temperature=temperature):\n",
-    "    inputs = tokenizer(prompt.format(user_question=user_question), return_tensors=\"pt\").to(\"cuda\")\n",
+    "    inputs = tokenizer(prompt.format(user_question=user_question), return_tensors=\"pt\").to(device)\n",
     "\n",
     "    outputs = model.generate(\n",
     "        **inputs,\n",

examples/dora_finetuning/README.md

Lines changed: 1 addition & 2 deletions
@@ -13,7 +13,7 @@ from peft import LoraConfig, get_peft_model
 from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer
 from datasets import load_dataset
 
-model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b", device_map="cuda")
+model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b", device_map="auto")
 tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
 dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
 lora_config = LoraConfig(
@@ -70,7 +70,6 @@ python dora_finetuning.py \
   --quantize \
   --eval_step 10 \
   --save_step 100 \
-  --device "cuda:0" \
   --lora_r 16 \
   --lora_alpha 32 \
   --lora_dropout 0.05 \

examples/dora_finetuning/dora_finetuning.py

Lines changed: 15 additions & 7 deletions
@@ -39,22 +39,27 @@ def train_model(
     hf_token = os.getenv("HF_TOKEN")
 
     # Setup device
-    device = torch.device(device)
+    if device == "auto":
+        device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
+    else:
+        device = torch.device(device)
     print(f"Using device: {device}")
 
     # load tokenizer
     tokenizer = AutoTokenizer.from_pretrained(base_model, token=hf_token)
 
     # QDoRA (quantized dora): IF YOU WANNA QUANTIZE THE MODEL
     if quantize:
+        if (torch.cuda.is_available() and torch.cuda.is_bf16_supported()) or torch.xpu.is_available():
+            bnb_4bit_compute_dtype = torch.bfloat16
+        else:
+            bnb_4bit_compute_dtype = torch.float16
         model = AutoModelForCausalLM.from_pretrained(
             base_model,
             token=hf_token,
             quantization_config=BitsAndBytesConfig(
                 load_in_4bit=True,
-                bnb_4bit_compute_dtype=(
-                    torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16
-                ),
+                bnb_4bit_compute_dtype=bnb_4bit_compute_dtype,
                 bnb_4bit_use_double_quant=True,
                 bnb_4bit_quant_type="nf4",
             ),
@@ -117,8 +122,11 @@ def tokenize_function(examples):
         hub_token=hf_token,
     )
 
-    # Clear CUDA cache to free memory
-    torch.cuda.empty_cache()
+    # Clear device cache to free memory
+    if torch.cuda.is_available():
+        torch.cuda.empty_cache()
+    elif torch.xpu.is_available():
+        torch.xpu.empty_cache()
 
     # Initialize the Trainer
     trainer = Trainer(
@@ -162,7 +170,7 @@ def tokenize_function(examples):
     parser.add_argument("--quantize", action="store_true", help="Use quantization")
     parser.add_argument("--eval_step", type=int, default=10, help="Evaluation step interval")
     parser.add_argument("--save_step", type=int, default=100, help="Save step interval")
-    parser.add_argument("--device", type=str, default="cuda:0", help="Device to use for training")
+    parser.add_argument("--device", type=str, default="auto", help="Device to use for training")
     parser.add_argument("--lora_r", type=int, default=8, help="LoRA rank")
     parser.add_argument("--lora_alpha", type=int, default=16, help="LoRA alpha")
     parser.add_argument("--lora_dropout", type=float, default=0.05, help="LoRA dropout rate")
