Commit daf8707

Merge branch 'main' into feature_fourierft_conv2d
2 parents 28e7ad1 + 6030f91 commit daf8707

54 files changed (+2632 −4269 lines)

.github/workflows/tests-main.yml

Lines changed: 3 additions & 3 deletions
@@ -6,9 +6,6 @@ on:
     paths-ignore:
       - 'docs/**'
 
-env:
-  TRANSFORMERS_IS_CI: 1
-
 permissions: {}
 
 jobs:
@@ -31,6 +28,9 @@ jobs:
           pip install -U git+https://github.com/huggingface/transformers.git
           pip install -e .[test]
       - name: Test with pytest
+        env:
+          TRANSFORMERS_IS_CI: 1
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
         run: |
           make test
       - name: Post to Slack

.github/workflows/tests.yml

Lines changed: 3 additions & 1 deletion
@@ -11,7 +11,6 @@ on:
 
 env:
   HF_HOME: .cache/huggingface
-  TRANSFORMERS_IS_CI: 1
 
 permissions: {}
 
@@ -90,6 +89,9 @@ jobs:
         # they fail, but add a notice so that the failure is not completely silent
         continue-on-error: ${{ matrix.os == 'macos-13' }}
         shell: bash
+        env:
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+          TRANSFORMERS_IS_CI: 1
         run: |
           set +e
           make test

docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -143,5 +143,7 @@
       title: Helpers
     - local: package_reference/hotswap
       title: Hotswapping adapters
+    - local: package_reference/functional
+      title: Functions for PEFT integration
     title: Utilities
   title: API reference

docs/source/developer_guides/troubleshooting.md

Lines changed: 64 additions & 0 deletions
@@ -401,3 +401,67 @@ If it is not possible for you to upgrade PEFT, there is a workaround you can try

Assume the error message says that the unknown keyword argument is named `foobar`. Search inside the `adapter_config.json` of this PEFT adapter for the `foobar` entry and delete it from the file. Then save the file and try loading the model again.

This solution works most of the time. As long as it is the default value for `foobar`, it can be ignored. However, when it is set to some other value, you will get incorrect results. Upgrading PEFT is the recommended solution.

## Adapter handling

### Using multiple adapters at the same time

PEFT allows you to create more than one adapter on the same model. This can be useful in many situations. For example, for inference, you may want to serve two fine-tuned models from the same base model instead of loading the base model once for each fine-tuned model, which would cost more memory. Moreover, multiple adapters can be activated at the same time, so that the model leverages what all of those adapters have learned. As an example, if you have a diffusion model, you may want to use one LoRA adapter to change the style and a different one to change the subject.

Activating multiple adapters at the same time is generally possible with all PEFT methods (LoRA, LoHa, IA³, etc.) except for prompt learning methods (p-tuning, prefix tuning, etc.). The following example illustrates how to achieve this:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

model_id = ...
base_model = AutoModelForCausalLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(base_model, lora_path_0)  # default adapter_name is 'default'
model.load_adapter(lora_path_1, adapter_name="other")
# the 'other' adapter was loaded but is not active yet, so activate both adapters:
model.base_model.set_adapter(["default", "other"])
```

> [!TIP]
> In the example above, you can see that we need to call `model.base_model.set_adapter(["default", "other"])`. Why can't we call `model.set_adapter(["default", "other"])` instead? This is unfortunately not possible because, as explained earlier, some PEFT methods don't support activating more than one adapter at a time.

It is also possible to train two adapters at the same time, but you should be careful to ensure that the weights of both adapters are known to the optimizer. Otherwise, only one adapter will receive updates.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_id = ...
base_model = AutoModelForCausalLM.from_pretrained(model_id)
lora_config_0 = LoraConfig(...)
lora_config_1 = LoraConfig(...)
model = get_peft_model(base_model, lora_config_0)
model.add_adapter(adapter_name="other", peft_config=lora_config_1)
```

If we were now to call:

```python
from transformers import Trainer

trainer = Trainer(model=model, ...)
trainer.train()
```

or

```python
import torch

optimizer = torch.optim.AdamW([param for param in model.parameters() if param.requires_grad], ...)
```

then the second LoRA adapter (`"other"`) would not be trained. This is because it is inactive at that point, which means the `requires_grad` attribute on its parameters is set to `False` and the optimizer will ignore it. Therefore, make sure to activate all adapters that should be trained _before_ initializing the optimizer:

```python
# activate all adapters
model.base_model.set_adapter(["default", "other"])
trainer = Trainer(model=model, ...)
trainer.train()
```

> [!TIP]
> This section deals with using multiple adapters _of the same type_ on the same model, for example, using multiple LoRA adapters at the same time. It does not apply to using _different types_ of adapters on the same model, for example one LoRA adapter and one LoHa adapter. For this, please check [`PeftMixedModel`](https://huggingface.co/docs/peft/developer_guides/mixed_models).
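
Coming back to the training example: before the optimizer (or `Trainer`) is created, it can be worth double-checking that every adapter you intend to train is actually trainable. Below is a minimal sketch of such a check, continuing from the `model` above; the `.other.` substring test is an assumption about PEFT's usual internal parameter naming, not something the docs guarantee.

```python
# Sketch: after set_adapter(["default", "other"]), list the parameters the
# optimizer will actually see and confirm that the "other" adapter is among them.
trainable = [name for name, param in model.named_parameters() if param.requires_grad]
print(f"{len(trainable)} trainable parameters")
assert any(".other." in name for name in trainable), "adapter 'other' appears to be frozen"
```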

docs/source/package_reference/functional.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Functions for PEFT integration

A collection of functions that can be useful for non-PeftModel models, e.g. for transformers or diffusers integrations.

The functions provided here can be considered the "public API" of PEFT and are therefore safe to be used by packages that provide PEFT integrations.

## Cast the adapter weight dtypes
[[autodoc]] functional.cast_adapter_dtype
    - all

## Delete the PEFT adapter from the model
[[autodoc]] functional.delete_adapter
    - all

## Get the state dict of the PEFT adapter
[[autodoc]] functional.get_peft_model_state_dict
    - all

## Inject a PEFT adapter into the model based on a PEFT config
[[autodoc]] functional.inject_adapter_in_model
    - all

## Set the active PEFT adapter(s) of the model
[[autodoc]] functional.set_adapter
    - all

## Load the weights of the PEFT state dict into the model
[[autodoc]] functional.set_peft_model_state_dict
    - all
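
To make the purpose of this new reference page concrete, here is a minimal sketch of how an integration might combine some of these helpers on a plain transformers model, without wrapping it in a `PeftModel`. The `peft.functional` import path follows the page above; the model id and LoRA settings are illustrative assumptions.

```python
# Minimal sketch of using PEFT's functional helpers on a bare transformers
# model (no PeftModel wrapper). Model id and LoRA settings are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig
from peft.functional import (
    get_peft_model_state_dict,
    inject_adapter_in_model,
    set_peft_model_state_dict,
)

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])

# Add LoRA layers to the model in place, based on the PEFT config.
model = inject_adapter_in_model(config, model, adapter_name="default")

# Extract only the adapter weights, e.g. to store them separately ...
adapter_state_dict = get_peft_model_state_dict(model, adapter_name="default")

# ... and load them back into a model that was prepared the same way.
set_peft_model_state_dict(model, adapter_state_dict, adapter_name="default")
```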

examples/dora_finetuning/QDoRA_finetuning.ipynb

Lines changed: 3 additions & 2 deletions
@@ -6,7 +6,7 @@
    "id": "CV_gQs58bsvM"
   },
   "source": [
-    "# Fine-tuning [Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco) Dataset using QDora (quantized Lora w/ use_dora=True) on T4 Free Colab GPU."
+    "# Fine-tuning [Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco) Dataset using QDora (quantized Lora w/ use_dora=True)."
   ]
  },
  {
@@ -1010,6 +1010,7 @@
     "top_p = 0.9\n",
     "temperature = 0.7\n",
     "user_question = \"What is the purpose of quantization in LLMs?\"\n",
+    "device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
     "\n",
     "\n",
     "prompt = (\n",
@@ -1021,7 +1022,7 @@
     "\n",
     "\n",
     "def generate(model, user_question, max_new_tokens=max_new_tokens, top_p=top_p, temperature=temperature):\n",
-    "    inputs = tokenizer(prompt.format(user_question=user_question), return_tensors=\"pt\").to(\"cuda\")\n",
+    "    inputs = tokenizer(prompt.format(user_question=user_question), return_tensors=\"pt\").to(device)\n",
     "\n",
     "    outputs = model.generate(\n",
     "        **inputs,\n",

examples/dora_finetuning/README.md

Lines changed: 1 addition & 2 deletions
@@ -13,7 +13,7 @@ from peft import LoraConfig, get_peft_model
 from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer
 from datasets import load_dataset
 
-model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b", device_map="cuda")
+model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b", device_map="auto")
 tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
 dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
 lora_config = LoraConfig(
@@ -70,7 +70,6 @@ python dora_finetuning.py \
   --quantize \
   --eval_step 10 \
   --save_step 100 \
-  --device "cuda:0" \
   --lora_r 16 \
   --lora_alpha 32 \
   --lora_dropout 0.05 \

examples/dora_finetuning/dora_finetuning.py

Lines changed: 15 additions & 7 deletions
@@ -39,22 +39,27 @@ def train_model(
     hf_token = os.getenv("HF_TOKEN")
 
     # Setup device
-    device = torch.device(device)
+    if device == "auto":
+        device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
+    else:
+        device = torch.device(device)
     print(f"Using device: {device}")
 
     # load tokenizer
     tokenizer = AutoTokenizer.from_pretrained(base_model, token=hf_token)
 
     # QDoRA (quantized dora): IF YOU WANNA QUANTIZE THE MODEL
     if quantize:
+        if (torch.cuda.is_available() and torch.cuda.is_bf16_supported()) or torch.xpu.is_available():
+            bnb_4bit_compute_dtype = torch.bfloat16
+        else:
+            bnb_4bit_compute_dtype = torch.float16
         model = AutoModelForCausalLM.from_pretrained(
             base_model,
             token=hf_token,
             quantization_config=BitsAndBytesConfig(
                 load_in_4bit=True,
-                bnb_4bit_compute_dtype=(
-                    torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16
-                ),
+                bnb_4bit_compute_dtype=bnb_4bit_compute_dtype,
                 bnb_4bit_use_double_quant=True,
                 bnb_4bit_quant_type="nf4",
             ),
@@ -117,8 +122,11 @@ def tokenize_function(examples):
         hub_token=hf_token,
     )
 
-    # Clear CUDA cache to free memory
-    torch.cuda.empty_cache()
+    # Clear device cache to free memory
+    if torch.cuda.is_available():
+        torch.cuda.empty_cache()
+    elif torch.xpu.is_available():
+        torch.xpu.empty_cache()
 
     # Initialize the Trainer
     trainer = Trainer(
@@ -162,7 +170,7 @@ def tokenize_function(examples):
     parser.add_argument("--quantize", action="store_true", help="Use quantization")
     parser.add_argument("--eval_step", type=int, default=10, help="Evaluation step interval")
     parser.add_argument("--save_step", type=int, default=100, help="Save step interval")
-    parser.add_argument("--device", type=str, default="cuda:0", help="Device to use for training")
+    parser.add_argument("--device", type=str, default="auto", help="Device to use for training")
     parser.add_argument("--lora_r", type=int, default=8, help="LoRA rank")
     parser.add_argument("--lora_alpha", type=int, default=16, help="LoRA alpha")
     parser.add_argument("--lora_dropout", type=float, default=0.05, help="LoRA dropout rate")
