
None of the french_bench tasks are working. #2619

Open
jaslatendresse opened this issue Jan 10, 2025 · 1 comment
Labels
asking questions — For asking for clarification / support on library usage.

Comments

@jaslatendresse

I am using llama-cpp-python to serve my model. The pipeline works fine with certain tasks (e.g., hellaswag), but none of the french_bench tasks work. It looks to me like the dataset is no longer available on the Hugging Face Hub. How can I get around this? If that is the case, the docs should be updated accordingly. Otherwise, I could use some guidance on how to fix this. Thanks!

Code:

import json

import lm_eval
from lm_eval.models.gguf import GGUFLM

model_name = "llama-3.1-8b-Q4_K_M"
task_name = "french_bench_mc"

# Model is served locally via llama-cpp-python.
lm = GGUFLM(base_url="http://localhost:8080")

task_manager = lm_eval.tasks.TaskManager()

# Run the evaluation first so a failure doesn't leave behind an empty output file.
results = lm_eval.simple_evaluate(
    model=lm,
    tasks=[task_name],
    device="mps",
    num_fewshot=5,
    limit=100,
    task_manager=task_manager,
)

with open(f"{task_name}={model_name}.json", "w") as json_file:
    json.dump(results, json_file, indent=4)

Stack trace:

2025-01-10:11:16:23,799 INFO     [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-01-10:11:16:23,799 INFO     [evaluator.py:217] Using pre-initialized model
2025-01-10:11:16:24,348 WARNING  [load.py:1645] Using the latest cached version of the dataset since manu/french_bench_hellaswag couldn't be found on the Hugging Face Hub
2025-01-10:11:16:24,348 WARNING  [cache.py:95] Found the latest cached dataset configuration 'default' at /Users/jasminelatendresse/.cache/huggingface/datasets/manu___french_bench_hellaswag/default/0.0.0/f9f7e441f4a5c30863aa5a0ec09bbd1c3ab4f5e0 (last modified on Thu Aug 15 11:46:41 2024).
2025-01-10:11:16:25,329 WARNING  [load.py:1645] Using the latest cached version of the dataset since manu/french_bench_arc_challenge couldn't be found on the Hugging Face Hub
2025-01-10:11:16:25,331 WARNING  [cache.py:95] Found the latest cached dataset configuration 'default' at /Users/jasminelatendresse/.cache/huggingface/datasets/manu___french_bench_arc_challenge/default/0.0.0/25a80534368914cfd6aa4c5f73262d64ce2d61fb (last modified on Thu Aug 15 12:01:17 2024).
Traceback (most recent call last):
  File "/Users/jasminelatendresse/exp-os-assistant-redaction/benchmarking/scripts/eval.py", line 19, in <module>
    results = lm_eval.simple_evaluate(model=lm, tasks=["french_bench_mc"], device="mps", num_fewshot=5, limit=100, task_manager=task_manager)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/evaluator.py", line 235, in simple_evaluate
    task_dict = get_task_dict(tasks, task_manager)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 618, in get_task_dict
    task_name_from_string_dict = task_manager.load_task_or_group(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 414, in load_task_or_group
    collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
  File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 332, in _load_individual_task_or_group
    collections.ChainMap(*map(fn, reversed(subtask_list)))
  File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 314, in _load_individual_task_or_group
    return _load_task(task_config, task=name_or_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 280, in _load_task
    task_object = ConfigurableTask(config=config)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/api/task.py", line 819, in __init__
    self.download(self.config.dataset_kwargs)
  File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/api/task.py", line 926, in download
    self.dataset = datasets.load_dataset(
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/datasets/load.py", line 2606, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/datasets/load.py", line 2277, in load_dataset_builder
    dataset_module = dataset_module_factory(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/datasets/load.py", line 1923, in dataset_module_factory
    raise e1 from None
  File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/datasets/load.py", line 1843, in dataset_module_factory
    dataset_info = hf_api.dataset_info(
                   ^^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 2366, in dataset_info
    return DatasetInfo(**data)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 799, in __init__
    self.tags = kwargs.pop("tags")
                ^^^^^^^^^^^^^^^^^^
KeyError: 'tags'
@baberabb
Contributor

Hi! It's working on my end. Could you try installing the latest commit:

git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install -e .

You could also try upgrading the HF packages:

pip install -U transformers datasets
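If the upgrade doesn't resolve it, a quick way to check which versions are actually installed is the stdlib-only sketch below (not from the thread; the package names are taken from the stack trace — note the KeyError('tags') is raised inside huggingface_hub, so an outdated version of that package is a likely culprit):

import importlib.metadata

# Report installed versions of the libraries that appear in the stack trace,
# so they can be compared before and after upgrading.
for pkg in ("datasets", "huggingface_hub", "transformers", "lm_eval"):
    try:
        print(pkg, importlib.metadata.version(pkg))
    except importlib.metadata.PackageNotFoundError:
        print(pkg, "not installed")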

@baberabb added the "asking questions" label on Jan 10, 2025