This project provides a unified framework to test generative language models on a large number of different evaluation tasks.

- Evaluation with publicly available prompts ensures reproducibility and comparability between papers.
- Easy support for custom prompts and evaluation metrics.

The Language Model Evaluation Harness is the backend for 🤗 Hugging Face's popular [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), has been used in [hundreds of papers](https://scholar.google.com/scholar?oi=bibs&hl=en&authuser=2&cites=15052937328817631261,4097184744846514103,17476825572045927382,18443729326628441434,12854182577605049984), and is used internally by dozens of companies including NVIDIA, Cohere, Nous Research, Booz Allen Hamilton, and Mosaic ML.

## Install
We also provide a number of optional dependencies for extended functionality. Extras can be installed alongside the base package.
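A sketch of what installing an extra might look like, assuming a from-source checkout of the repository (the `vllm` extra name is an assumption; the available extras vary by version):

```bash
# Install the harness from a local checkout together with an optional extra
# (the extra name here is illustrative)
pip install -e ".[vllm]"
```
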
To evaluate a model hosted on the [HuggingFace Hub](https://huggingface.co/models) (e.g. GPT-J-6B) on `hellaswag` you can use the following command:
```bash
lm_eval --model hf \
    --model_args pretrained=EleutherAI/gpt-j-6B \
    --tasks hellaswag \
    --device cuda:0
```

Additional arguments can be provided to the model constructor using the `--model_args` flag. Most notably, this supports the common practice of using the `revisions` feature on the Hub to store partially trained checkpoints, or to specify the datatype for running a model:
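For example, a minimal sketch (the Pythia checkpoint, `step100000` revision, and task are illustrative placeholders):

```bash
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m,revision=step100000,dtype=float \
    --tasks lambada_openai \
    --device cuda:0
```
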
Models that are loaded via both `transformers.AutoModelForCausalLM` (autoregressive, decoder-only GPT-style models) and `transformers.AutoModelForSeq2SeqLM` (encoder-decoder models such as T5) are supported.

Batch size selection can be automated by setting the ```--batch_size``` flag to ```auto```. This will perform automatic detection of the largest batch size that will fit on your device. On tasks where there is a large difference between the longest and shortest example, it can be helpful to periodically recompute the largest batch size, to gain a further speedup. To do this, append ```:N``` to above flag to automatically recompute the largest batch size ```N``` times. For example, to recompute the batch size 4 times, the command would be:
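A sketch of what that command could look like (the model and task are the same placeholders used above):

```bash
lm_eval --model hf \
    --model_args pretrained=EleutherAI/gpt-j-6B \
    --tasks hellaswag \
    --batch_size auto:4
```
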
Alternatively, you can use `lm-eval` instead of `lm_eval`.

> [!Note]
> Just like you can provide a local path to `transformers.AutoModel`, you can also provide a local path to `lm_eval` via `--model_args pretrained=/path/to/model`
#### Multi-GPU Evaluation with Hugging Face `accelerate`
To parallelize evaluation of HuggingFace models across multiple GPUs, we leverage the [accelerate 🚀](https://github.com/huggingface/accelerate) library as follows:
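A minimal sketch of the invocation (the task list and batch size are illustrative, and this assumes `accelerate` has already been configured, e.g. via `accelerate config`):

```bash
accelerate launch -m lm_eval --model hf \
    --tasks lambada_openai,arc_easy \
    --batch_size 16
```
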
We also support vLLM for faster inference on [supported model types](https://docs.vllm.ai/en/latest/models/supported_models.html).
```bash
lm_eval --model vllm \
    --model_args pretrained={model_name},tensor_parallel_size={number of GPUs to use},dtype=auto,gpu_memory_utilization=0.8 \
    --tasks lambada_openai \
    --batch_size auto
```

If you have a Metal compatible Mac, you can run the eval harness using the MPS back-end.
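A sketch of what that might look like (the model and task are placeholders, and this assumes a PyTorch build with MPS support):

```bash
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m \
    --tasks hellaswag \
    --device mps
```
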
To verify the data integrity of the tasks you're performing in addition to running the tasks themselves, you can use the `--check_integrity` flag:
```bash
lm_eval --model openai \
    --model_args engine=davinci \
    --tasks lambada_openai,hellaswag \
    --check_integrity
```

For models loaded with the HuggingFace `transformers` library, any arguments provided via `--model_args` get passed to the relevant constructor directly. This means that anything you can do with `AutoModel` can be done with our library. For example, you can pass a local path via `pretrained=` or use models finetuned with [PEFT](https://github.com/huggingface/peft) by taking the call you would run to evaluate the base model and add `,peft=PATH` to the `model_args` argument:
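For example, a sketch in which the base model and adapter path are placeholders:

```bash
lm_eval --model hf \
    --model_args pretrained=EleutherAI/gpt-j-6B,peft=/path/to/peft/adapter \
    --tasks hellaswag \
    --device cuda:0
```
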
[GPTQ](https://github.com/PanQiWei/AutoGPTQ) quantized models can be loaded by specifying their file names in `,gptq=NAME` (or `,gptq=True` for default names) in the `model_args` argument:
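For example, a sketch using the default file names (the model path is a placeholder, and this assumes the GPTQ dependency is installed):

```bash
lm_eval --model hf \
    --model_args pretrained=/path/to/gptq/model,gptq=True \
    --tasks hellaswag \
    --device cuda:0
```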