Commit b529c19

[MMLU redux] Do not use samples which do not have error_type="ok" (#3410)
* fix dataset - do not use samples which do not have error_type='ok'
* increase version
1 parent 29a0765 commit b529c19

3 files changed: +7 -5 lines changed

lm_eval/tasks/mmlu-redux/generative/README.md

Lines changed: 4 additions & 2 deletions
@@ -57,5 +57,7 @@ If other tasks on this dataset are already supported:
 - [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
 - [ ] Have you noted which, if any, published evaluation setups are matched by this variant?

-ver 1: PR #2705
-First implementation
+
+
+- version 3: First implementation from PR https://github.com/EleutherAI/lm-evaluation-harness/pull/2705.
+- version 4: Filter out the answers not marked as `error_type="ok"` in https://github.com/EleutherAI/lm-evaluation-harness/pull/3410 (~6.5% of questions, 370 out of 5700 are filtered out).
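
The filtering rule described in the version 4 note can be illustrated with a minimal sketch. This is not the code used to build the filtered dataset; the helper names and the toy rows are hypothetical, and only the `error_type` field, the `"ok"` value, and the 370 / 5700 counts come from the commit. It also checks that the quoted ~6.5% figure follows from those counts.

```python
def keep_ok(samples):
    """Keep only samples whose error_type field is exactly "ok"."""
    return [s for s in samples if s.get("error_type") == "ok"]


def percent_removed(total, removed):
    """Share of samples removed, as a percentage rounded to one decimal."""
    return round(100 * removed / total, 1)


# Hypothetical toy rows; the non-"ok" labels are placeholders, not real dataset labels.
samples = [
    {"question": "q1", "error_type": "ok"},
    {"question": "q2", "error_type": "placeholder_error_a"},
    {"question": "q3", "error_type": "ok"},
    {"question": "q4", "error_type": "placeholder_error_b"},
]

kept = keep_ok(samples)
print([s["question"] for s in kept])   # ['q1', 'q3']

# Counts quoted in the README note: 370 of 5700 questions are filtered out.
print(percent_removed(5700, 370))      # 6.5
print(5700 - 370)                      # 5330 questions remain
```

Because the set of evaluated questions changes, scores from version 4 are not directly comparable with scores from version 3, which is presumably why the commit bumps the version number.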

lm_eval/tasks/mmlu-redux/generative/_default_template_yaml

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-dataset_path: "edinburgh-dawg/mmlu-redux-2.0"
+dataset_path: "fxmarty/mmlu-redux-2.0-ok"
 test_split: test
 dataset_kwargs:
   trust_remote_code: true
@@ -29,4 +29,4 @@ filter_list:
 - function: take_first

 metadata:
-  version: 3.0
+  version: 4.0

lm_eval/tasks/mmlu-redux/generative/_mmlu.yaml

Lines changed: 1 addition & 1 deletion
@@ -30,4 +30,4 @@ aggregate_metric_list:
 metric: exact_match
 weight_by_size: true
 metadata:
-  version: 3
+  version: 4
