Merge branch 'main' into raw-pytorch
cg123 authored Oct 26, 2024
2 parents e2628f1 + 93ace70 commit e69677a
Showing 35 changed files with 1,982 additions and 220 deletions.
47 changes: 43 additions & 4 deletions README.md
@@ -11,10 +11,10 @@ Features:
- Interpolated gradients for parameter values (inspired by Gryphe's [BlockMerge_Gradient](https://github.com/Gryphe/BlockMerge_Gradient) script)
- Piecewise assembly of language models from layers ("Frankenmerging")
- [Mixture of Experts merging](#mixture-of-experts-merging)
- [LoRA extraction](#lora-extraction)
- [Evolutionary merge methods](#evolutionary-merge-methods)

🔊 Call to Evolve - to solve evolutionary merge methods as a community - please see <https://github.com/arcee-ai/mergekit/issues/207>.

🌐 GUI Launch Alert 🤗 - We are excited to announce the launch of a graphical user interface for mergekit in Hugging Face Spaces! This GUI simplifies the merging process, making it more accessible to a broader audience. Check it out and contribute at [Hugging Face Spaces - mergekit-community](https://huggingface.co/mergekit-community).
🌐 GUI Launch Alert 🤗 - We are excited to announce the launch of a mega-GPU-backed graphical user interface for mergekit in Arcee! This GUI simplifies the merging process, making it more accessible to a broader audience. Check it out and contribute at the [Arcee App](https://app.arcee.ai). There is also a [Hugging Face Space](https://huggingface.co/mergekit-community) with limited GPU capacity.

## Installation

@@ -128,7 +128,8 @@ A quick overview of the currently supported merge methods:
| [Model Breadcrumbs](https://arxiv.org/abs/2312.06795) | `breadcrumbs` |||
| [Model Breadcrumbs](https://arxiv.org/abs/2312.06795) + [TIES](https://arxiv.org/abs/2306.01708) | `breadcrumbs_ties` |||
| [Model Stock](https://arxiv.org/abs/2403.19522) | `model_stock` |||
| [DELLA](https://arxiv.org/abs/2406.11617) | `della` |||
| [DELLA](https://arxiv.org/abs/2406.11617) + [Task Arithmetic](https://arxiv.org/abs/2212.04089) | `della_linear` |||

### Linear

The classic merge method - a simple weighted average.
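
For instance, a minimal `linear` merge config might look like the following sketch (model names and weights are placeholders, not recommendations):

```yaml
# Hypothetical linear merge; model names and weights are illustrative only.
models:
  - model: org/finetune-a
    parameters:
      weight: 0.6
  - model: org/finetune-b
    parameters:
      weight: 0.4
merge_method: linear
dtype: float16
```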
@@ -189,6 +190,15 @@ Parameters:

- `filter_wise`: if true, weight calculation will be per-row rather than per-tensor. Not recommended.
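
For orientation, the `filter_wise` bullet above appears to come from the README's Model Stock section, which is collapsed in this diff view. A minimal sketch of a `model_stock` configuration using it, with placeholder model names, might look like:

```yaml
# Hypothetical Model Stock merge; model names are placeholders.
models:
  - model: org/finetune-a
  - model: org/finetune-b
merge_method: model_stock
base_model: org/base-model
parameters:
  filter_wise: false   # default: compute weights per-tensor rather than per-row
dtype: float16
```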

### [DELLA](https://arxiv.org/abs/2406.11617)

Building upon DARE, DELLA uses adaptive pruning based on parameter magnitudes. DELLA first ranks parameters in each row of delta parameters and assigns drop probabilities inversely proportional to their magnitudes. This allows it to retain more important changes while reducing interference. After pruning, it rescales the remaining parameters similarly to [DARE](#dare). DELLA can be used with (`della`) or without (`della_linear`) the sign-election step of TIES. An example configuration is sketched after the parameter list below.

Parameters: same as [Linear](#linear), plus:
- `density` - fraction of weights in differences from the base model to retain
- `epsilon` - maximum change in drop probability based on magnitude. Drop probabilities assigned will range from `density - epsilon` to `density + epsilon`. (When selecting values for `density` and `epsilon`, ensure that the range of probabilities falls within 0 to 1)
- `lambda` - scaling factor for the final merged delta parameters before merging with the base parameters.
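
As a rough illustration only, a `della` merge config might look like the sketch below; model names and values are placeholders, and it assumes per-model `density`/`epsilon` with a top-level `lambda`, following the shape of the YAML examples elsewhere in this repository:

```yaml
# Hypothetical DELLA merge; models and values are illustrative, not recommendations.
models:
  - model: org/finetune-a
    parameters:
      weight: 0.5
      density: 0.6     # keep roughly 60% of each delta
      epsilon: 0.15    # density ± epsilon stays within [0, 1]
  - model: org/finetune-b
    parameters:
      weight: 0.5
      density: 0.6
      epsilon: 0.15
merge_method: della        # use della_linear to skip the TIES sign-election step
base_model: org/base-model
parameters:
  lambda: 1.0              # scale the merged deltas before adding them to the base
dtype: float16
```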

## LoRA extraction

Mergekit allows extracting PEFT-compatible low-rank approximations of finetuned models.
@@ -203,6 +213,35 @@ mergekit-extract-lora finetuned_model_id_or_path base_model_id_or_path output_pa

The `mergekit-moe` script supports merging multiple dense models into a mixture of experts, either for direct use or for further training. For more details see the [`mergekit-moe` documentation](docs/moe.md).
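
As a sketch of what such a config can look like, based on the schema described in `docs/moe.md` (all model names and prompts below are placeholders):

```yaml
# Hypothetical mergekit-moe config; models and prompts are placeholders.
base_model: org/base-instruct-model
gate_mode: hidden          # route tokens using hidden-state representations of the prompts
dtype: bfloat16
experts:
  - source_model: org/code-finetune
    positive_prompts:
      - "Write a Python function that"
  - source_model: org/biomedical-finetune
    positive_prompts:
      - "Summarize the clinical findings"
```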

## Evolutionary merge methods

See `docs/evolve.md` for details.

## ✨ Merge in the Cloud ✨

We host merging on Arcee's cloud GPUs - you can launch a cloud merge in the [Arcee App](https://app.arcee.ai), or through Python after grabbing an `ARCEE_API_KEY`:

`export ARCEE_API_KEY=<your-api-key>`
`pip install -q arcee-py`

```python
import arcee

# Submit the example config as a cloud merge job named "bio-merge"
arcee.merge_yaml("bio-merge", "./examples/bio-merge.yml")
```

Check your merge status in the [Arcee App](https://app.arcee.ai).

When complete, either deploy your merge:

```python
# Deploy the finished merge for inference
arcee.start_deployment("bio-merge", merging="bio-merge")
```

Or download your merge:

`!arcee merging download bio-merge`


## Citation

We now have a [paper](https://arxiv.org/abs/2403.13257) you can cite for the MergeKit library:
15 changes: 15 additions & 0 deletions examples/bio-merge.yml
@@ -0,0 +1,15 @@
models:
  - model: mistralai/Mistral-7B-Instruct-v0.2
    parameters:
      density: 0.5
      weight: 0.5
  - model: BioMistral/BioMistral-7B
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: false
  int8_mask: true
dtype: float16
4 changes: 1 addition & 3 deletions mergekit/_data/architectures/cohere.json
@@ -16,9 +16,7 @@
        {
            "name": "lm_head.weight",
            "is_embed": true,
            "aliases": [
                "model.embed_tokens.weight"
            ]
            "optional": true
        }
    ],
    "num_layers_config_key": "num_hidden_layers",
78 changes: 78 additions & 0 deletions mergekit/_data/architectures/exaone.json
@@ -0,0 +1,78 @@
{
    "model_type": "exaone",
    "architectures": [
        "ExaoneForCausalLM"
    ],
    "pre_weights": [
        {
            "name": "transformer.wte.weight",
            "is_embed": true,
            "output_space": "running_residual"
        }
    ],
    "num_layers_config_key": "num_hidden_layers",
    "layer_templates": {
        "weights": [
            {
                "name": "transformer.h.${layer_index}.ln_1.weight",
                "input_space": "running_residual"
            },
            {
                "name": "transformer.h.${layer_index}.attn.attention.q_proj.weight",
                "input_space": "running_residual",
                "output_space": "attn_qk_${layer_index}",
                "head_split": "output",
                "is_kq": true
            },
            {
                "name": "transformer.h.${layer_index}.attn.attention.k_proj.weight",
                "input_space": "running_residual",
                "output_space": "attn_qk_${layer_index}",
                "head_split": "output",
                "is_kq": true
            },
            {
                "name": "transformer.h.${layer_index}.attn.attention.v_proj.weight",
                "input_space": "running_residual",
                "output_space": "attn_v_${layer_index}",
                "head_split": "output"
            },
            {
                "name": "transformer.h.${layer_index}.attn.attention.out_proj.weight",
                "input_space": "attn_v_${layer_index}",
                "output_space": "running_residual",
                "head_split": "input"
            },
            {
                "name": "transformer.h.${layer_index}.ln_2.weight",
                "input_space": "running_residual"
            },
            {
                "name": "transformer.h.${layer_index}.mlp.c_fc_0.weight",
                "input_space": "running_residual",
                "output_space": "up_${layer_index}"
            },
            {
                "name": "transformer.h.${layer_index}.mlp.c_fc_1.weight",
                "input_space": "running_residual",
                "output_space": "up_${layer_index}"
            },
            {
                "name": "transformer.h.${layer_index}.mlp.c_proj.weight",
                "input_space": "up_${layer_index}",
                "output_space": "running_residual"
            }
        ]
    },
    "post_weights": [
        {
            "name": "transformer.ln_f.weight",
            "input_space": "running_residual"
        },
        {
            "name": "lm_head.weight",
            "input_space": "running_residual",
            "is_embed": true
        }
    ]
}
60 changes: 60 additions & 0 deletions mergekit/_data/architectures/gemma2.json
@@ -0,0 +1,60 @@
{
    "model_type": "gemma2",
    "architectures": [
        "Gemma2ForCausalLM"
    ],
    "pre_weights": [
        {
            "name": "model.embed_tokens.weight",
            "is_embed": true
        }
    ],
    "num_layers_config_key": "num_hidden_layers",
    "layer_templates": {
        "weights": [
            {
                "name": "model.layers.${layer_index}.input_layernorm.weight"
            },
            {
                "name": "model.layers.${layer_index}.self_attn.q_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.self_attn.k_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.self_attn.v_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.self_attn.o_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.post_attention_layernorm.weight"
            },
            {
                "name": "model.layers.${layer_index}.pre_feedforward_layernorm.weight"
            },
            {
                "name": "model.layers.${layer_index}.mlp.up_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.mlp.gate_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.mlp.down_proj.weight"
            },
            {
                "name": "model.layers.${layer_index}.post_feedforward_layernorm.weight"
            }
        ]
    },
    "post_weights": [
        {
            "name": "model.norm.weight"
        },
        {
            "name": "lm_head.weight",
            "is_embed": true,
            "optional": true
        }
    ]
}
50 changes: 50 additions & 0 deletions mergekit/_data/architectures/internlm2.json
@@ -0,0 +1,50 @@
{
    "model_type": "internlm2",
    "architectures": [
        "InternLM2ForCausalLM"
    ],
    "pre_weights": [
        {
            "name": "model.tok_embeddings.weight",
            "is_embed": true
        }
    ],
    "post_weights": [
        {
            "name": "model.norm.weight"
        },
        {
            "name": "output.weight",
            "is_embed": true,
            "aliases": [
                "model.tok_embeddings.weight"
            ]
        }
    ],
    "num_layers_config_key": "num_hidden_layers",
    "layer_templates": {
        "weights": [
            {
                "name": "model.layers.${layer_index}.attention_norm.weight"
            },
            {
                "name": "model.layers.${layer_index}.ffn_norm.weight"
            },
            {
                "name": "model.layers.${layer_index}.attention.wqkv.weight"
            },
            {
                "name": "model.layers.${layer_index}.attention.wo.weight"
            },
            {
                "name": "model.layers.${layer_index}.feed_forward.w1.weight"
            },
            {
                "name": "model.layers.${layer_index}.feed_forward.w2.weight"
            },
            {
                "name": "model.layers.${layer_index}.feed_forward.w3.weight"
            }
        ]
    }
}