I am using llama-cpp-python to serve my model. The pipeline works fine with certain tasks (e.g., hellaswag), but none of the french_bench tasks work. It looks like the underlying datasets are no longer available on the Hugging Face Hub. How can I work around this? If the datasets have indeed been removed, the docs should be updated accordingly; otherwise, I could use some guidance on how to fix this. Thanks!
Code:
import json

import lm_eval
from lm_eval.models.gguf import GGUFLM

model_name = "llama-3.1-8b-Q4_K_M"
task_name = "french_bench_mc"

# Model is served locally via llama-cpp-python
lm = GGUFLM(base_url="http://localhost:8080")
task_manager = lm_eval.tasks.TaskManager()

results = lm_eval.simple_evaluate(
    model=lm,
    tasks=[task_name],
    device="mps",
    num_fewshot=5,
    limit=100,
    task_manager=task_manager,
)

with open(f"{task_name}={model_name}.json", "w") as json_file:
    filtered_results = {key: value for key, value in results.items()}
    json.dump(filtered_results, json_file, indent=4)
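Since the warnings below show that the french_bench datasets are still present in the local HF cache, one possible workaround (a sketch, assuming the cached copies are intact) is to force the datasets library into offline mode so it never calls the Hub and falls back to the cache directly:

```python
import os

# Assumed workaround: force offline mode so `datasets` uses the local cache
# and skips the Hub API call that crashes below. The variable must be set
# before `datasets` (or lm_eval) is imported, since it is read at import time.
os.environ["HF_DATASETS_OFFLINE"] = "1"
```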
Stack trace:
2025-01-10:11:16:23,799 INFO [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-01-10:11:16:23,799 INFO [evaluator.py:217] Using pre-initialized model
Using the latest cached version of the dataset since manu/french_bench_hellaswag couldn't be found on the Hugging Face Hub
2025-01-10:11:16:24,348 WARNING [load.py:1645] Using the latest cached version of the dataset since manu/french_bench_hellaswag couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /Users/jasminelatendresse/.cache/huggingface/datasets/manu___french_bench_hellaswag/default/0.0.0/f9f7e441f4a5c30863aa5a0ec09bbd1c3ab4f5e0 (last modified on Thu Aug 15 11:46:41 2024).
2025-01-10:11:16:24,348 WARNING [cache.py:95] Found the latest cached dataset configuration 'default' at /Users/jasminelatendresse/.cache/huggingface/datasets/manu___french_bench_hellaswag/default/0.0.0/f9f7e441f4a5c30863aa5a0ec09bbd1c3ab4f5e0 (last modified on Thu Aug 15 11:46:41 2024).
Using the latest cached version of the dataset since manu/french_bench_arc_challenge couldn't be found on the Hugging Face Hub
2025-01-10:11:16:25,329 WARNING [load.py:1645] Using the latest cached version of the dataset since manu/french_bench_arc_challenge couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /Users/jasminelatendresse/.cache/huggingface/datasets/manu___french_bench_arc_challenge/default/0.0.0/25a80534368914cfd6aa4c5f73262d64ce2d61fb (last modified on Thu Aug 15 12:01:17 2024).
2025-01-10:11:16:25,331 WARNING [cache.py:95] Found the latest cached dataset configuration 'default' at /Users/jasminelatendresse/.cache/huggingface/datasets/manu___french_bench_arc_challenge/default/0.0.0/25a80534368914cfd6aa4c5f73262d64ce2d61fb (last modified on Thu Aug 15 12:01:17 2024).
Traceback (most recent call last):
File "/Users/jasminelatendresse/exp-os-assistant-redaction/benchmarking/scripts/eval.py", line 19, in <module>
results = lm_eval.simple_evaluate(model=lm, tasks=["french_bench_mc"], device="mps", num_fewshot=5, limit=100, task_manager=task_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/evaluator.py", line 235, in simple_evaluate
task_dict = get_task_dict(tasks, task_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 618, in get_task_dict
task_name_from_string_dict = task_manager.load_task_or_group(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 414, in load_task_or_group
collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 332, in _load_individual_task_or_group
collections.ChainMap(*map(fn, reversed(subtask_list)))
File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 314, in _load_individual_task_or_group
return _load_task(task_config, task=name_or_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 280, in _load_task
task_object = ConfigurableTask(config=config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/api/task.py", line 819, in __init__
self.download(self.config.dataset_kwargs)
File "/Users/jasminelatendresse/exp-os-assistant-redaction/lm-evaluation-harness/lm_eval/api/task.py", line 926, in download
self.dataset = datasets.load_dataset(
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/datasets/load.py", line 2606, in load_dataset
builder_instance = load_dataset_builder(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/datasets/load.py", line 2277, in load_dataset_builder
dataset_module = dataset_module_factory(
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/datasets/load.py", line 1923, in dataset_module_factory
raise e1 from None
File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/datasets/load.py", line 1843, in dataset_module_factory
dataset_info = hf_api.dataset_info(
^^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 2366, in dataset_info
return DatasetInfo(**data)
^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/miniforge3/envs/frenchllm/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 799, in __init__
self.tags = kwargs.pop("tags")
^^^^^^^^^^^^^^^^^^
KeyError: 'tags'
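The KeyError: 'tags' is raised while huggingface_hub parses the Hub's response into a DatasetInfo object, which suggests a mismatch between the installed huggingface_hub version and what the Hub API (or an error response for a missing dataset) returns. A quick sketch to check the locally installed versions before trying an upgrade (package names here are the real distributions, but whether upgrading fixes it is an assumption):

```python
# Print the installed versions of the libraries involved in the crash,
# using only the standard library.
import importlib.metadata as md

def installed_versions(packages=("datasets", "huggingface_hub")):
    """Return {package: version string, or None if not installed}."""
    out = {}
    for pkg in packages:
        try:
            out[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            out[pkg] = None
    return out

print(installed_versions())
```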