Commit bc4b922
Merge branch 'main' into llama
# Conflicts:
#	lm_eval/tasks/llama3/README.md
baberabb committed Jan 21, 2025
2 parents 748eb47 + b2c090c commit bc4b922
Showing 3,406 changed files with 24,482 additions and 887 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
11 changes: 6 additions & 5 deletions .github/workflows/unit_tests.yml
@@ -22,10 +22,10 @@ jobs:
     steps:
       - name: Checkout Code
         uses: actions/checkout@v4
-      - name: Set up Python 3.8
+      - name: Set up Python 3.9
         uses: actions/setup-python@v5
         with:
-          python-version: 3.8
+          python-version: 3.9
           cache: pip
           cache-dependency-path: pyproject.toml
       - name: Pre-Commit
@@ -42,7 +42,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [ "3.8", "3.9", "3.10", "3.11" ]
+        python-version: ["3.9", "3.10", "3.11", "3.12" ]
     timeout-minutes: 30
     steps:
       - name: Checkout Code
@@ -75,15 +75,16 @@ jobs:
     steps:
       - name: Checkout Code
         uses: actions/checkout@v4
-      - name: Set up Python 3.8
+      - name: Set up Python 3.9
         uses: actions/setup-python@v5
         with:
-          python-version: 3.8
+          python-version: 3.9
           cache: pip
           cache-dependency-path: pyproject.toml
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
           pip install -e '.[dev,optimum,deepsparse,sparseml,api]' --extra-index-url https://download.pytorch.org/whl/cpu
+          pip install -U transformers peft
       - name: Test with pytest
         run: python -m pytest tests/models --showlocals -s -vv
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -29,7 +29,7 @@ repos:
       - id: mixed-line-ending
         args: [--fix=lf]
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.7.4
+    rev: v0.9.2
     hooks:
       # Run the linter.
       - id: ruff
2 changes: 2 additions & 0 deletions README.md
@@ -270,6 +270,7 @@ Note that for externally hosted models, configs such as `--device` which relate
 | vLLM | :heavy_check_mark: | `vllm` | [Most HF Causal Language Models](https://docs.vllm.ai/en/latest/models/supported_models.html) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
 | Mamba | :heavy_check_mark: | `mamba_ssm` | [Mamba architecture Language Models via the `mamba_ssm` package](https://huggingface.co/state-spaces) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
 | Huggingface Optimum (Causal LMs) | ✔️ | `openvino` | Any decoder-only AutoModelForCausalLM converted with Huggingface Optimum into OpenVINO™ Intermediate Representation (IR) format | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | ... |
+| Huggingface Optimum-intel IPEX (Causal LMs) | ✔️ | `ipex` | Any decoder-only AutoModelForCausalLM | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | ... |
 | Neuron via AWS Inf2 (Causal LMs) | ✔️ | `neuronx` | Any decoder-only AutoModelForCausalLM supported to run on [huggingface-ami image for inferentia2](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | ... |
 | [Neural Magic DeepSparse](https://github.com/neuralmagic/deepsparse) | ✔️ | `deepsparse` | Any LM from [SparseZoo](https://sparsezoo.neuralmagic.com/) or on [HF Hub with the "deepsparse" tag](https://huggingface.co/models?other=deepsparse) | `generate_until`, `loglikelihood` | ... |
 | [Neural Magic SparseML](https://github.com/neuralmagic/sparseml) | ✔️ | `sparseml` | Any decoder-only AutoModelForCausalLM from [SparseZoo](https://sparsezoo.neuralmagic.com/) or on [HF Hub](https://huggingface.co/neuralmagic). Especially useful for models with quantization like [`zoo:llama2-7b-gsm8k_llama2_pretrain-pruned60_quantized`](https://sparsezoo.neuralmagic.com/models/llama2-7b-gsm8k_llama2_pretrain-pruned60_quantized) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | ... |
@@ -492,6 +493,7 @@ Extras dependencies can be installed via `pip install -e ".[NAME]"`
 | hf_transfer | For speeding up HF Hub file downloads |
 | ifeval | For running the IFEval task |
 | ibm_watsonx_ai | For using IBM watsonx.ai model apis |
+| ipex | For running on optimum-intel ipex backend |
 | neuronx | For running on AWS inf2 instances |
 | mamba | For loading Mamba SSM models |
 | math | For running math task answer checking |
2 changes: 1 addition & 1 deletion docs/interface.md
@@ -58,7 +58,7 @@ This mode supports a number of command-line arguments, the details of which can
 
 * `--seed`: Set seed for python's random, numpy and torch. Accepts a comma-separated list of 3 values for python's random, numpy, and torch seeds, respectively, or a single integer to set the same seed for all three. The values are either an integer or 'None' to not set the seed. Default is `0,1234,1234` (for backward compatibility). E.g. `--seed 0,None,8` sets `random.seed(0)` and `torch.manual_seed(8)`. Here numpy's seed is not set since the second value is `None`. E.g, `--seed 42` sets all three seeds to 42.
 
-* `--wandb_args`: Tracks logging to Weights and Biases for evaluation runs and includes args passed to `wandb.init`, such as `project` and `job_type`. Full list [here](https://docs.wandb.ai/ref/python/init). e.g., ```--wandb_args project=test-project,name=test-run```
+* `--wandb_args`: Tracks logging to Weights and Biases for evaluation runs and includes args passed to `wandb.init`, such as `project` and `job_type`. Full list [here](https://docs.wandb.ai/ref/python/init). e.g., ```--wandb_args project=test-project,name=test-run```. Also allows for the passing of the step to log things at (passed to `wandb.run.log`), e.g., `--wandb_args step=123`.
 
 * `--hf_hub_log_args` : Logs evaluation results to Hugging Face Hub. Accepts a string with the arguments separated by commas. Available arguments:
   * `hub_results_org` - organization name on Hugging Face Hub, e.g., `EleutherAI`. If not provided, the results will be pushed to the owner of the Hugging Face token,
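The new `step` handling in `--wandb_args` deserves a quick illustration. Below is a minimal sketch of how such an argument string could be split between `wandb.init` and `wandb.run.log`; the parsing is hypothetical, not the harness's actual code:

```python
import wandb

# Hypothetical sketch: split "--wandb_args project=test-project,name=test-run,step=123"
# into wandb.init kwargs and a logging step destined for wandb.run.log.
raw = "project=test-project,name=test-run,step=123"
kwargs = dict(pair.split("=", 1) for pair in raw.split(","))

step = int(kwargs.pop("step")) if "step" in kwargs else None
run = wandb.init(**kwargs)          # project/name go to init
run.log({"acc": 0.5}, step=step)    # step goes to run.log, not init
```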
5 changes: 3 additions & 2 deletions docs/new_task_guide.md
@@ -190,7 +190,8 @@ doc_to_target: "{{answer}}"
 ```
-**Important**: we now add `target_delimiter` between input and target which defaults to " ", such that the full input-output string is `doc_to_target(doc) + target_delimiter + doc_to_text(doc)`. `doc_to_text` and `doc_to_target` should not contain trailing right or left whitespace, respectively.
+> [!WARNING]
+> We add `target_delimiter` between input and target which defaults to " ", such that the full input-output string is `doc_to_text(doc) + target_delimiter + doc_to_target(doc)`. `doc_to_text` and `doc_to_target` should not contain trailing right or left whitespace, respectively. For multiple choice the target will be each choice index concatenated with the delimiter.
 
 
 #### Multiple choice format
@@ -206,7 +207,7 @@ doc_to_choice: "{{[distractor1, distractor2, distractor3, correct_answer]}}"
 ```
 Task implementers are thus able to decide what the answer choices should be for a document, and what prompt format to use.
 
-The label index can also be sourced from a feature directly. For example in `superglue/boolq`, the label index if defined in the feature `label`. We can set `doc_to_target` as simply `label`. The options or verbalizers can be written in a the form of a list `["no", "yes"]` that will correspond to the label index.
+The label index can also be sourced from a feature directly. For example in `superglue/boolq`, the label index if defined in the feature `label`. We can set `doc_to_target` as simply `label`. The options or verbalizers can be written in the form of a list `["no", "yes"]` that will correspond to the label index.
 
 ```yaml
 doc_to_text: "{{passage}}\nQuestion: {{question}}?\nAnswer:"
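To make the corrected concatenation order in that warning concrete, here is a minimal sketch; the `doc` and templates are illustrative, not taken from any real task:

```python
# Sketch of the input-target assembly described in the warning above.
doc = {"question": "What is the capital of France?", "answer": "Paris"}

def doc_to_text(doc: dict) -> str:
    return f"Question: {doc['question']}\nAnswer:"  # no trailing whitespace

def doc_to_target(doc: dict) -> str:
    return doc["answer"]  # no leading whitespace

target_delimiter = " "  # the default
full_string = doc_to_text(doc) + target_delimiter + doc_to_target(doc)
# -> "Question: What is the capital of France?\nAnswer: Paris"
```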
1 change: 1 addition & 0 deletions docs/task_guide.md
@@ -37,6 +37,7 @@ Prompting / in-context formatting options:
 - **doc_to_choice** (`Union[Callable, str]`, *optional*) — Jinja2 template, string, or function to process a sample into a list of possible string choices for `multiple_choice` tasks. Left undefined for `generate_until` tasks.
 - **fewshot_delimiter** (`str`, *optional*, defaults to "\n\n") — String to insert between few-shot examples.
 - **target_delimiter** (`str`, *optional*, defaults to `" "`) — String to insert between input and target output for the datapoint being tested.
+- **assistant_prefill** (`str`, *optional*) — String to append after the <|assistant|> token. For example, if the task is to generate a question, the assistant_prefill could be "The answer is: " to prompt the model to generate an answer to the question. If not using a chat template then this string will be appended to the end of the prompt.
 
 Runtime configuration options:
 - **num_fewshot** (`int`, *optional*, defaults to 0) — Number of few-shot examples before the input.
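For intuition, here is a hedged sketch of what `assistant_prefill` does to a chat-formatted prompt; the token strings are placeholders, since real chat templates vary by model:

```python
# Illustrative only: how an assistant prefill shapes the final prompt.
# The <|user|>/<|assistant|> tokens are assumptions, not a real template.
user_turn = "<|user|>\nWhat is 2 + 2?\n"
assistant_prefill = "The answer is: "

# The prefill lands after the assistant token, so the model must
# continue from it instead of opening its reply from scratch.
prompt = user_turn + "<|assistant|>\n" + assistant_prefill
# -> "<|user|>\nWhat is 2 + 2?\n<|assistant|>\nThe answer is: "
```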
6 changes: 6 additions & 0 deletions lm_eval/__main__.py
@@ -257,6 +257,11 @@ def setup_parser() -> argparse.ArgumentParser:
         action="store_true",
         help="Sets trust_remote_code to True to execute code to create HF Datasets from the Hub",
     )
+    parser.add_argument(
+        "--confirm_run_unsafe_code",
+        action="store_true",
+        help="Confirm that you understand the risks of running unsafe code for tasks that require it",
+    )
     return parser
 
 
@@ -404,6 +409,7 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None:
         numpy_random_seed=args.seed[1],
         torch_random_seed=args.seed[2],
         fewshot_random_seed=args.seed[3],
+        confirm_run_unsafe_code=args.confirm_run_unsafe_code,
         **request_caching_args,
     )
 
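The flag is then threaded through to the evaluation entry point as shown above. How the harness enforces it is not visible in this diff; a hypothetical guard might look like this (names and message are illustrative, not the harness's actual code):

```python
# Hypothetical guard for tasks that execute model-generated code;
# a sketch of the pattern, not the harness's implementation.
def check_unsafe_code_ok(task_name: str, requires_unsafe_code: bool,
                         confirm_run_unsafe_code: bool) -> None:
    if requires_unsafe_code and not confirm_run_unsafe_code:
        raise ValueError(
            f"Task '{task_name}' executes model-generated code. "
            "Pass --confirm_run_unsafe_code to acknowledge the risk."
        )
```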
4 changes: 1 addition & 3 deletions lm_eval/api/group.py
@@ -112,6 +112,4 @@ def group_name(self) -> Any:
         return self._config.group
 
     def __repr__(self):
-        return (
-            f"ConfigurableGroup(group={self.group}," f"group_alias={self.group_alias})"
-        )
+        return f"ConfigurableGroup(group={self.group},group_alias={self.group_alias})"
6 changes: 3 additions & 3 deletions lm_eval/api/metrics.py
@@ -527,9 +527,9 @@ def pooled_sample_stderr(stderrs: List[float], sizes: List[int]):
 
 
 def combined_sample_stderr(stderrs: List[float], sizes: List[int], metrics=None):
-    assert (
-        metrics is not None
-    ), "Need to pass a list of each subtask's metric for this stderr aggregation"
+    assert metrics is not None, (
+        "Need to pass a list of each subtask's metric for this stderr aggregation"
+    )
     assert len(stderrs) == len(sizes) and len(sizes) == len(metrics)
 
     # See https://github.com/EleutherAI/lm-evaluation-harness/pull/1390 for more documentation.
6 changes: 5 additions & 1 deletion lm_eval/api/model.py
@@ -113,13 +113,17 @@ def generate_until(self, requests) -> List[str]:
         """
         pass
 
-    def apply_chat_template(self, chat_history: List[Dict[str, str]]) -> str:
+    def apply_chat_template(
+        self, chat_history: List[Dict[str, str]], add_generation_prompt=True
+    ) -> str:
         """
         Defines how to transform few-shot examples provided as chat history into a format that can be used as input to the LM.
 
         :param chat_history: list[dict[str, str]]
             A list of dictionaries with keys 'role' and 'content'.
             Values are strings representing the role name and the content of the message, respectively.
+        :param add_generation_prompt: bool
+            Whether to append an assistant gen prefix (for e.g. <|assistant|>) to the assistant messages in the chat history. False if prefilling an assistant message.
 
         :return: str
             A string representing the chat history in a format that can be used as input to the LM.
         """
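To see what the new `add_generation_prompt` parameter enables, here is a hedged sketch using Hugging Face's chat-template API; the model name is illustrative, and `LM` subclasses may implement this method differently:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
chat = [{"role": "user", "content": "Name a prime number."}]

# Normal generation: close with the assistant prefix so the model
# starts a fresh assistant turn.
prompt = tok.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Prefilling: the final message is a partial assistant turn, so no
# extra assistant prefix should be appended.
chat += [{"role": "assistant", "content": "The answer is"}]
prompt = tok.apply_chat_template(
    chat,
    tokenize=False,
    add_generation_prompt=False,
    continue_final_message=True,  # available in recent transformers releases
)
```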
30 changes: 15 additions & 15 deletions lm_eval/api/registry.py
@@ -17,13 +17,13 @@ def register_model(*names):
 
     def decorate(cls):
         for name in names:
-            assert issubclass(
-                cls, LM
-            ), f"Model '{name}' ({cls.__name__}) must extend LM class"
+            assert issubclass(cls, LM), (
+                f"Model '{name}' ({cls.__name__}) must extend LM class"
+            )
 
-            assert (
-                name not in MODEL_REGISTRY
-            ), f"Model named '{name}' conflicts with existing model! Please register with a non-conflicting alias instead."
+            assert name not in MODEL_REGISTRY, (
+                f"Model named '{name}' conflicts with existing model! Please register with a non-conflicting alias instead."
+            )
 
             MODEL_REGISTRY[name] = cls
         return cls
@@ -48,9 +48,9 @@ def get_model(model_name):
 
 def register_task(name):
     def decorate(fn):
-        assert (
-            name not in TASK_REGISTRY
-        ), f"task named '{name}' conflicts with existing registered task!"
+        assert name not in TASK_REGISTRY, (
+            f"task named '{name}' conflicts with existing registered task!"
+        )
 
         TASK_REGISTRY[name] = fn
         ALL_TASKS.add(name)
@@ -104,9 +104,9 @@ def decorate(fn):
     ]:
         if key in args:
             value = args[key]
-            assert (
-                value not in registry
-            ), f"{key} named '{value}' conflicts with existing registered {key}!"
+            assert value not in registry, (
+                f"{key} named '{value}' conflicts with existing registered {key}!"
+            )
 
             if key == "metric":
                 registry[name] = fn
@@ -140,9 +140,9 @@ def get_metric(name: str, hf_evaluate_metric=False) -> Callable:
 
 def register_aggregation(name: str):
     def decorate(fn):
-        assert (
-            name not in AGGREGATION_REGISTRY
-        ), f"aggregation named '{name}' conflicts with existing registered aggregation!"
+        assert name not in AGGREGATION_REGISTRY, (
+            f"aggregation named '{name}' conflicts with existing registered aggregation!"
+        )
 
         AGGREGATION_REGISTRY[name] = fn
         return fn
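These registry hunks look like mechanical formatting churn, consistent with the ruff bump to v0.9.2 earlier in this commit: each assert message moves inside parentheses after the condition, with no behavior change. For reference, a typical use of the decorator being reformatted might look like this; `MyCustomLM` and its method bodies are illustrative:

```python
from lm_eval.api.model import LM
from lm_eval.api.registry import register_model


@register_model("my-custom-lm")  # the decorator asserts this alias is unused
class MyCustomLM(LM):            # and that the class extends LM
    def loglikelihood(self, requests):
        ...

    def loglikelihood_rolling(self, requests):
        ...

    def generate_until(self, requests):
        ...
```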