-
Notifications
You must be signed in to change notification settings - Fork 2.9k
[fix] Fix mmlu_redux not displaying summary table + display to the user the tasks / yaml that are actually pulled #3406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Remaining overriden tasks/groups: |
add debug logging linting
9ab9270 to
09154d6
Compare
_default_template_yaml_default_template_yaml, using wrong tasks & not displaying summary table
|
cc @baberabb WDYT? |
|
Hi! this looks good. If you could resolve the conflicts after #3394, and i left a comment. wrt to disallowing duplicates, I always thought Related to this, I've been working on some refactoring to make the library more readable (though it's become a bit of a victim of mission creep at this point 😅). We should add this logging there as well and would appreciate a review when you get a chance. Should also definitely add a duplicate check, but one commonly requested feature is to support multiple groups with overlapping tasks (which _check_duplicates currently prohibits). I was thinking of always namespacing group tasks as they are loaded (e.g., mmlu::abstract_algebra, cmmlu::abstract_algebra) to get around that. |
_default_template_yaml, using wrong tasks & not displaying summary table|
Thank you @baberabb, as #3394 is merged, this PR only fixes the mmlu_redux_generative table display, and prints something like: to the user. Does it sound good? |
|
wdyt? |
|
Looks great! Thanks! Added a check to only log the filter warnings for generation tasks. |
As per title.
This PR fixes 3 issues with mmlu_redux_generative:
_default_template_yamlis used, because of duplicate tasks inTaskManager._task_indexmmlu_redux_generative(e.g.mmlu_humanities_generative), because the same name is shared withmmlu_generative=> need to change"tag": "mmlu_humanities_generative"to"tag": "mmlu_redux_humanities_generative"filter_list: defaultis missing in_mmlu.yamlmmlu_redux_generativeis currently pulling a WRONG_default_template_yamlfile from https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/mmlu/generative/_default_template_yamlinstead of the rightful https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/mmlu-redux/generative/_default_template_yaml
Cause
TaskManager._task_indexis populated atlm-evaluation-harness/lm_eval/tasks/__init__.py
Lines 497 to 537 in 68b0365
and this is later used to initialize
TaskConfig, etc.Currently for the task
mmlu_formal_logic_generative, it is populated from two places:and the entry using
mmlu-redux/generative/mmlu_formal_logic.yamlgets silently overriden. As the two yaml & used_default_template_yamlare quite different, this eventually lead to wrongful evaluation and in my case,0accuracy.I think we should completely disallow having duplicate tasks/groups, as this is bug prone, but we may do this in an other PR (see the debug log in
tasks/__init__.py- quite a few with a similar issue).cc @baberabb FYI, this probably impact anybody using mmlu_redux through lm-eval-harness.