# 🐟 Evolutionary Optimization of Model Merging Recipes

🤗 [Models](https://huggingface.co/SakanaAI) | 👀 [Demo](TODO) | 📚 [Paper](TODO) | 📝 [Blog](TODO) | 🐦 [Twitter](https://twitter.com/SakanaAILabs)

This repository serves as a central hub for SakanaAI's [Evolutionary Model Merge](TODO) series, showcasing its releases and resources. It includes models and code for reproducing the evaluation presented in our paper. More updates and additions are coming soon.

## Models

### Our Models

| Model | Size | License | Source |
| :-- | --: | :-- | :-- |
| [EvoLLM-JP-v1-7B](https://huggingface.co/SakanaAI/EvoLLM-JP-v1-7B) | 7B | Microsoft Research License | [shisa-gamma-7b-v1](https://huggingface.co/augmxnt/shisa-gamma-7b-v1), [WizardMath-7B-V1.1](https://huggingface.co/WizardLM/WizardMath-7B-V1.1), [Abel-7B-002](https://huggingface.co/GAIR/Abel-7B-002) |
| [EvoLLM-JP-v1-10B](https://huggingface.co/SakanaAI/EvoLLM-JP-v1-10B) | 10B | Microsoft Research License | EvoLLM-JP-v1-7B, [shisa-gamma-7b-v1](https://huggingface.co/augmxnt/shisa-gamma-7b-v1) |
| [EvoLLM-JP-A-v1-7B](https://huggingface.co/SakanaAI/EvoLLM-JP-A-v1-7B) | 7B | Apache 2.0 | [shisa-gamma-7b-v1](https://huggingface.co/augmxnt/shisa-gamma-7b-v1), [Arithmo2-Mistral-7B](https://huggingface.co/upaya07/Arithmo2-Mistral-7B), [Abel-7B-002](https://huggingface.co/GAIR/Abel-7B-002) |
| [EvoVLM-JP-v1-7B](https://huggingface.co/SakanaAI/EvoVLM-JP-v1-7B) | 7B | Apache 2.0 | [LLaVA-1.6-Mistral-7B](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b), [shisa-gamma-7b-v1](https://huggingface.co/augmxnt/shisa-gamma-7b-v1) |
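
The merged models are available on Hugging Face via the links above. As a quick-start reference, here is a minimal sketch of loading EvoLLM-JP-v1-7B with 🤗 Transformers; the prompt and generation settings are illustrative, so please check the model card for the recommended usage:

```python
# A minimal sketch of loading EvoLLM-JP-v1-7B with Hugging Face transformers.
# The prompt below is an illustrative Japanese math question; see the model
# card for the recommended prompt format and generation settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SakanaAI/EvoLLM-JP-v1-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "関数 f(x) = 2x + 3 について、f(5) の値を求めなさい。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
answer = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```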

### Comparing EvoLLM-JP w/ Source LLMs

For details on the evaluation, please refer to Section 4.1 of the paper.

| Model | MGSM-JA (acc ↑) | [lm-eval-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable) (avg ↑) |
| :-- | --: | --: |
| [Shisa Gamma 7B v1](https://huggingface.co/augmxnt/shisa-gamma-7b-v1) | 9.6 | 66.1 |
| [WizardMath 7B V1.1](https://huggingface.co/WizardLM/WizardMath-7B-V1.1) | 18.4 | 60.1 |
| [Abel 7B 002](https://huggingface.co/GAIR/Abel-7B-002) | 30.0 | 56.5 |
| [Arithmo2 Mistral 7B](https://huggingface.co/upaya07/Arithmo2-Mistral-7B) | 24.0 | 56.4 |
| [(Ours) EvoLLM-JP-A-v1-7B](https://huggingface.co/SakanaAI/EvoLLM-JP-A-v1-7B) | **52.4** | **69.0** |
| [(Ours) EvoLLM-JP-v1-7B](https://huggingface.co/SakanaAI/EvoLLM-JP-v1-7B) | **52.0** | **70.5** |
| [(Ours) EvoLLM-JP-v1-10B](https://huggingface.co/SakanaAI/EvoLLM-JP-v1-10B) | **55.6** | **68.2** |

### Comparing EvoVLM-JP w/ Existing VLMs

For details on the evaluation, please refer to Section 4.2 of the paper.

| Model | JA-VG-VQA-500 (ROUGE-L ↑) | JA-VLM-Bench-In-the-Wild (ROUGE-L ↑) |
| :-- | --: | --: |
| [LLaVA-1.6-Mistral-7B](https://llava-vl.github.io/blog/2024-01-30-llava-next/) | 14.32 | 41.10 |
| [Japanese Stable VLM](https://huggingface.co/stabilityai/japanese-stable-vlm) | -<sup>*1</sup> | 40.50 |
| [Heron BLIP Japanese StableLM Base 7B llava-620k](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1-llava-620k) | 8.73<sup>*2</sup> | 27.37<sup>*2</sup> |
| [(Ours) EvoVLM-JP-v1-7B](https://huggingface.co/SakanaAI/EvoVLM-JP-v1-7B) | **19.70** | **51.25** |

* \*1: Japanese Stable VLM cannot be evaluated on the JA-VG-VQA-500 dataset because the model was trained on it.
* \*2: We are checking with the authors whether these results are valid.
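
EvoVLM-JP-v1-7B can be queried about images in a similar way. The sketch below assumes a LLaVA-style `AutoProcessor`/`AutoModelForVision2Seq` interface; the image path and prompt are placeholders, and the model card documents the exact prompt template:

```python
# A minimal sketch of asking EvoVLM-JP-v1-7B about a local image. The
# Vision2Seq interface and "<image>" prompt token are assumptions based on
# LLaVA-style models; consult the model card for the supported usage.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "SakanaAI/EvoVLM-JP-v1-7B"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # placeholder: any local image
prompt = "<image>\nこの画像には何が写っていますか?"  # "What is shown in this image?"
inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = inputs.to(model.device, torch.float16)  # cast pixel values to fp16
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```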

## Reproducing the Evaluation

### 1. Clone the Repo

```bash
git clone https://github.com/SakanaAI/evolving-merged-models.git
cd evolving-merged-models
```

### 2. Download the fastText Model

We use fastText to detect language during evaluation. Please download `lid.176.ftz` from [this link](https://fasttext.cc/docs/en/language-identification.html) and place it in your current directory. If you keep the file elsewhere, point the `LID176FTZ_PATH` environment variable at it:

```bash
export LID176FTZ_PATH="path-to-lid.176.ftz"
```
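
For reference, here is a minimal sketch of how such a fastText language check works (illustrative only, not necessarily the exact logic in `evaluate.py`):

```python
# Illustrative sketch of the fastText language check: load lid.176.ftz and
# predict a language label for a piece of model output.
import os

import fasttext

# Fall back to the current directory when LID176FTZ_PATH is not set.
model_path = os.environ.get("LID176FTZ_PATH", "lid.176.ftz")
model = fasttext.load_model(model_path)

labels, scores = model.predict("これは日本語の文章です。")
print(labels[0], scores[0])  # e.g. "__label__ja" with a confidence score
```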

### 3. Install Libraries

```bash
pip install -e .
```

We tested with Python 3.10.12 and CUDA 12.3; we cannot guarantee that the code works in other environments.

### 4. Run

To launch an evaluation, run the following script with one of the configs. All configs used for the paper are in `configs`.

```bash
python evaluate.py --config_path {path-to-config}
```

## Acknowledgement

We would like to thank the developers of the source models for their contributions and for making their work available.