If you claim that the GPQA task has no subtasks, which task does the leaderboard use?
Here is the list of tasks associated with the gpqa task in lm-eval:
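| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| leaderboard_gpqa | N/A | | | | | | | |
| - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | ↑ | 0.3384 | ± | 0.0337 |
| - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | ↑ | 0.3205 | ± | 0.0200 |
| - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | ↑ | 0.3438 | ± | 0.0225 |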
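To make the question concrete, here is a minimal sketch of how the three subtask scores reported above could be normalized and combined. It assumes a range normalization with a random-guess lower bound of 0.25 (GPQA is four-choice), a floor of 0 for scores below that bound, and a plain mean over subtasks. The function name `normalize_within_range` and the averaging step are illustrative assumptions, not a confirmed description of what the leaderboard does; which task or aggregation is actually used is what this issue asks.

```python
# Sketch only: the 0.25 baseline, the floor at 0, and the mean over
# subtasks are assumptions made for illustration.
RANDOM_BASELINE = 0.25  # four answer choices per GPQA question


def normalize_within_range(value, lower_bound, higher_bound=1.0):
    """Rescale a raw score so lower_bound maps to 0 and higher_bound to 100."""
    if value < lower_bound:
        return 0.0  # assumed: scores below the random baseline are floored
    return (value - lower_bound) / (higher_bound - lower_bound) * 100


# acc_norm values copied from the lm-eval output in this issue
raw_acc_norm = {
    "leaderboard_gpqa_diamond": 0.3384,
    "leaderboard_gpqa_extended": 0.3205,
    "leaderboard_gpqa_main": 0.3438,
}

normalized = {
    task: normalize_within_range(score, RANDOM_BASELINE)
    for task, score in raw_acc_norm.items()
}

# Hypothetical aggregation: unweighted mean of the normalized subtask scores
gpqa_average = sum(normalized.values()) / len(normalized)

for task, score in normalized.items():
    print(f"{task}: {score:.2f}")
print(f"mean over subtasks: {gpqa_average:.2f}")
```

If the leaderboard instead reports a single subtask (for example only `leaderboard_gpqa_main`), the averaging step would not apply and only that entry of `normalized` would be relevant.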
sorobedio changed the title from 'Regarding GPQA What do you meant by "For tasks without subtasks (e.g., GPQA, MMLU-PRO), the normalization process is straightforward:"' to 'Regarding GPQA What do you mean by "For tasks without subtasks (e.g., GPQA, MMLU-PRO), the normalization process is straightforward:"' on Jul 25, 2024