Merge branch 'main' into raw-pytorch
cg123 authored Oct 26, 2024
2 parents e2628f1 + 93ace70 commit e69677a
Showing 35 changed files with 1,982 additions and 220 deletions.
47 changes: 43 additions & 4 deletions README.md
@@ -11,10 +11,10 @@ Features:
- Interpolated gradients for parameter values (inspired by Gryphe's [BlockMerge_Gradient](https://github.com/Gryphe/BlockMerge_Gradient) script)
- Piecewise assembly of language models from layers ("Frankenmerging")
- [Mixture of Experts merging](#mixture-of-experts-merging)
- [LoRA extraction](#lora-extraction)
- [Evolutionary merge methods](#evolutionary-merge-methods)

🔊 Call to Evolve - to solve evolutionary merge methods as a community - please see <https://github.com/arcee-ai/mergekit/issues/207>.

🌐 GUI Launch Alert 🤗 - We are excited to announce the launch of a graphical user interface for mergekit in Hugging Face Spaces! This GUI simplifies the merging process, making it more accessible to a broader audience. Check it out and contribute at [Hugging Face Spaces - mergekit-community](https://huggingface.co/mergekit-community).
🌐 GUI Launch Alert 🤗 - We are excited to announce the launch of a mega-GPU-backed graphical user interface for mergekit in Arcee! This GUI simplifies the merging process, making it more accessible to a broader audience. Check it out and contribute at the [Arcee App](https://app.arcee.ai). There is also a [Hugging Face Space](https://huggingface.co/mergekit-community) with limited GPU capacity.

## Installation

@@ -128,7 +128,8 @@ A quick overview of the currently supported merge methods:
| [Model Breadcrumbs](https://arxiv.org/abs/2312.06795) | `breadcrumbs` |||
| [Model Breadcrumbs](https://arxiv.org/abs/2312.06795) + [TIES](https://arxiv.org/abs/2306.01708) | `breadcrumbs_ties` |||
| [Model Stock](https://arxiv.org/abs/2403.19522) | `model_stock` |||
| [DELLA](https://arxiv.org/abs/2406.11617) | `della` |||
| [DELLA](https://arxiv.org/abs/2406.11617) + [Task Arithmetic](https://arxiv.org/abs/2212.04089) | `della_linear` |||

### Linear

The classic merge method - a simple weighted average.
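
For instance, a minimal `linear` merge config might look like the following sketch (model names and weights are placeholders, not recommendations):

```yaml
# Hypothetical linear merge; model names and weights are illustrative only.
models:
  - model: org/finetune-a
    parameters:
      weight: 0.6
  - model: org/finetune-b
    parameters:
      weight: 0.4
merge_method: linear
dtype: float16
```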
@@ -189,6 +190,15 @@ Parameters:

- `filter_wise`: if true, weight calculation will be per-row rather than per-tensor. Not recommended.
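
For orientation, the `filter_wise` bullet above appears to come from the README's Model Stock section, which is collapsed in this diff view. A minimal sketch of a `model_stock` configuration using it, with placeholder model names, might look like:

```yaml
# Hypothetical Model Stock merge; model names are placeholders.
models:
  - model: org/finetune-a
  - model: org/finetune-b
merge_method: model_stock
base_model: org/base-model
parameters:
  filter_wise: false   # default: compute weights per-tensor rather than per-row
dtype: float16
```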

### [DELLA](https://arxiv.org/abs/2406.11617)

Building upon DARE, DELLA uses adaptive pruning based on parameter magnitudes. DELLA first ranks parameters in each row of delta parameters and assigns drop probabilities inversely proportional to their magnitudes. This allows it to retain more important changes while reducing interference. After pruning, it rescales the remaining parameters similarly to [DARE](#dare). DELLA can be used with (`della`) or without (`della_linear`) the sign-election step of TIES. An example configuration is sketched after the parameter list below.

Parameters: same as [Linear](#linear), plus:
- `density` - fraction of weights in differences from the base model to retain
- `epsilon` - maximum change in drop probability based on magnitude. Drop probabilities assigned will range from `density - epsilon` to `density + epsilon`. (When selecting values for `density` and `epsilon`, ensure that the range of probabilities falls within 0 to 1)
- `lambda` - scaling factor for the final merged delta parameters before merging with the base parameters.
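
As a rough illustration only, a `della` merge config might look like the sketch below; model names and values are placeholders, and it assumes per-model `density`/`epsilon` with a top-level `lambda`, following the shape of the YAML examples elsewhere in this repository:

```yaml
# Hypothetical DELLA merge; models and values are illustrative, not recommendations.
models:
  - model: org/finetune-a
    parameters:
      weight: 0.5
      density: 0.6     # keep roughly 60% of each delta
      epsilon: 0.15    # density ± epsilon stays within [0, 1]
  - model: org/finetune-b
    parameters:
      weight: 0.5
      density: 0.6
      epsilon: 0.15
merge_method: della        # use della_linear to skip the TIES sign-election step
base_model: org/base-model
parameters:
  lambda: 1.0              # scale the merged deltas before adding them to the base
dtype: float16
```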

## LoRA extraction

Mergekit allows extracting PEFT-compatible low-rank approximations of finetuned models.
@@ -203,6 +213,35 @@ mergekit-extract-lora finetuned_model_id_or_path base_model_id_or_path output_pa

The `mergekit-moe` script supports merging multiple dense models into a mixture of experts, either for direct use or for further training. For more details see the [`mergekit-moe` documentation](docs/moe.md).
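
As a sketch of what such a config can look like, based on the schema described in `docs/moe.md` (all model names and prompts below are placeholders):

```yaml
# Hypothetical mergekit-moe config; models and prompts are placeholders.
base_model: org/base-instruct-model
gate_mode: hidden          # route tokens using hidden-state representations of the prompts
dtype: bfloat16
experts:
  - source_model: org/code-finetune
    positive_prompts:
      - "Write a Python function that"
  - source_model: org/biomedical-finetune
    positive_prompts:
      - "Summarize the clinical findings"
```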

## Evolutionary merge methods

See `docs/evolve.md` for details.

## ✨ Merge in the Cloud ✨

We host merging on Arcee's cloud GPUs - you can launch a cloud merge in the [Arcee App](https://app.arcee.ai), or through Python after grabbing an `ARCEE_API_KEY`:

`export ARCEE_API_KEY=<your-api-key>`
`pip install -q arcee-py`

```python
import arcee

# Submit the example config as a cloud merge job named "bio-merge"
arcee.merge_yaml("bio-merge", "./examples/bio-merge.yml")
```

Check your merge status in the [Arcee App](https://app.arcee.ai).

When complete, either deploy your merge:

```python
# Deploy the finished merge for inference
arcee.start_deployment("bio-merge", merging="bio-merge")
```

Or download your merge:

`!arcee merging download bio-merge`


## Citation

We now have a [paper](https://arxiv.org/abs/2403.13257) you can cite for the MergeKit library:
15 changes: 15 additions & 0 deletions examples/bio-merge.yml
@@ -0,0 +1,15 @@
models:
  - model: mistralai/Mistral-7B-Instruct-v0.2
    parameters:
      density: 0.5
      weight: 0.5
  - model: BioMistral/BioMistral-7B
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: false
  int8_mask: true
dtype: float16
4 changes: 1 addition & 3 deletions mergekit/_data/architectures/cohere.json
@@ -16,9 +16,7 @@
        {
            "name": "lm_head.weight",
            "is_embed": true,
            "aliases": [
                "model.embed_tokens.weight"
            ]
            "optional": true
        }
    ],
    "num_layers_config_key": "num_hidden_layers",
78 changes: 78 additions & 0 deletions mergekit/_data/architectures/exaone.json
@@ -0,0 +1,78 @@
{
    "model_type": "exaone",
    "architectures": [
        "ExaoneForCausalLM"
    ],
    "pre_weights": [
        {
            "name": "transformer.wte.weight",
            "is_embed": true,
            "output_space": "running_residual"
        }
    ],
    "num_layers_config_key": "num_hidden_layers",
    "layer_templates": {
        "weights": [
            {
                "name": "transformer.h.${layer_index}.ln_1.weight",
                "input_space": "running_residual"
            },
            {
                "name": "transformer.h.${layer_index}.attn.attention.q_proj.weight",
                "input_space": "running_residual",
                "output_space": "attn_qk_${layer_index}",
                "head_split": "output",
                "is_kq": true
            },
            {
                "name": "transformer.h.${layer_index}.attn.attention.k_proj.weight",
                "input_space": "running_residual",
                "output_space": "attn_qk_${layer_index}",
                "head_split": "output",
                "is_kq": true
            },
            {
                "name": "transformer.h.${layer_index}.attn.attention.v_proj.weight",
                "input_space": "running_residual",
                "output_space": "attn_v_${layer_index}",
                "head_split": "output"
            },
            {
                "name": "transformer.h.${layer_index}.attn.attention.out_proj.weight",
                "input_space": "attn_v_${layer_index}",
                "output_space": "running_residual",
                "head_split": "input"
            },
            {
                "name": "transformer.h.${layer_index}.ln_2.weight",
                "input_space": "running_residual"
            },
            {
                "name": "transformer.h.${layer_index}.mlp.c_fc_0.weight",
                "input_space": "running_residual",
                "output_space": "up_${layer_index}"
            },
            {
                "name": "transformer.h.${layer_index}.mlp.c_fc_1.weight",
                "input_space": "running_residual",
                "output_space": "up_${layer_index}"
            },
            {
                "name": "transformer.h.${layer_index}.mlp.c_proj.weight",
                "input_space": "up_${layer_index}",
                "output_space": "running_residual"
            }
        ]
    },
    "post_weights": [
        {
            "name": "transformer.ln_f.weight",
            "input_space": "running_residual"
        },
        {
            "name": "lm_head.weight",
            "input_space": "running_residual",
            "is_embed": true
        }
    ]
}
60 changes: 60 additions & 0 deletions mergekit/_data/architectures/gemma2.json
@@ -0,0 +1,60 @@
{
    "model_type": "gemma2",
    "architectures": [
        "Gemma2ForCausalLM"
    ],
    "pre_weights": [
        {
            "name": "model.embed_tokens.weight",
            "is_embed": true
        }
    ],
    "num_layers_config_key": "num_hidden_layers",
    "layer_templates": {
        "weights": [
            {
                "name": "model.layers.${layer_index}.input_layernorm.weight"
            },
            {
                "name": "model.layers.${layer_index}.self_attn.q_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.self_attn.k_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.self_attn.v_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.self_attn.o_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.post_attention_layernorm.weight"
            },
            {
                "name": "model.layers.${layer_index}.pre_feedforward_layernorm.weight"
            },
            {
                "name": "model.layers.${layer_index}.mlp.up_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.mlp.gate_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.mlp.down_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.post_feedforward_layernorm.weight"
            }
        ]
    },
    "post_weights": [
        {
            "name": "model.norm.weight"
        },
        {
            "name": "lm_head.weight",
            "is_embed": true,
            "optional": true
        }
    ]
}
50 changes: 50 additions & 0 deletions mergekit/_data/architectures/internlm2.json
@@ -0,0 +1,50 @@
{
    "model_type": "internlm2",
    "architectures": [
        "InternLM2ForCausalLM"
    ],
    "pre_weights": [
        {
            "name": "model.tok_embeddings.weight",
            "is_embed": true
        }
    ],
    "post_weights": [
        {
            "name": "model.norm.weight"
        },
        {
            "name": "output.weight",
            "is_embed": true,
            "aliases": [
                "model.tok_embeddings.weight"
            ]
        }
    ],
    "num_layers_config_key": "num_hidden_layers",
    "layer_templates": {
        "weights": [
            {
                "name": "model.layers.${layer_index}.attention_norm.weight"
            },
            {
                "name": "model.layers.${layer_index}.ffn_norm.weight"
            },
            {
                "name": "model.layers.${layer_index}.attention.wqkv.weight"
            },
            {
                "name": "model.layers.${layer_index}.attention.wo.weight"
            },
            {
                "name": "model.layers.${layer_index}.feed_forward.w1.weight"
            },
            {
                "name": "model.layers.${layer_index}.feed_forward.w2.weight"
            },
            {
                "name": "model.layers.${layer_index}.feed_forward.w3.weight"
            }
        ]
    }
}