

@fxmarty-amd
Contributor

@fxmarty-amd fxmarty-amd commented Nov 13, 2025

As per title.

This PR fixes 3 issues with mmlu_redux_generative:

  • The wrong _default_template_yaml is used, because of duplicate task names in TaskManager._task_index.
  • The wrong tasks are pulled into the mmlu_redux_generative subgroups (e.g. mmlu_humanities_generative), because these tag names are shared with mmlu_generative. The fix is to rename "tag": "mmlu_humanities_generative" to "tag": "mmlu_redux_humanities_generative" (and likewise for the other subgroups).
  • The summary table is not displayed properly, because filter_list: default is missing in _mmlu.yaml.
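To illustrate the second issue, here is a minimal sketch, assuming (as a simplification) that a tag just collects every task config that declares it. The build_tag_index helper and the task names are hypothetical and illustrative, not lm-eval-harness APIs:

```python
from collections import defaultdict


def build_tag_index(configs: list[dict]) -> dict[str, list[str]]:
    """Hypothetical helper: map each tag name to every task that declares it."""
    index: dict[str, list[str]] = defaultdict(list)
    for cfg in configs:
        tags = cfg.get("tag", [])
        if isinstance(tags, str):  # a single tag may be given as a plain string
            tags = [tags]
        for tag in tags:
            index[tag].append(cfg["task"])
    return dict(index)


# Before the fix: both task families declare the same tag name.
before = [
    {"task": "mmlu_philosophy_generative", "tag": "mmlu_humanities_generative"},
    {"task": "mmlu_redux_philosophy_generative", "tag": "mmlu_humanities_generative"},
]
# After the fix: the mmlu-redux tasks use their own tag name.
after = [
    {"task": "mmlu_philosophy_generative", "tag": "mmlu_humanities_generative"},
    {"task": "mmlu_redux_philosophy_generative", "tag": "mmlu_redux_humanities_generative"},
]

print(build_tag_index(before))
print(build_tag_index(after))
```

With the shared tag, anything that expands mmlu_humanities_generative receives tasks from both families; renaming the tag keeps them separate.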

mmlu_redux_generative currently pulls the WRONG _default_template_yaml file from https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/mmlu/generative/_default_template_yaml

instead of the correct one at https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/mmlu-redux/generative/_default_template_yaml

Cause

TaskManager._task_index is populated at

if self._config_is_python_task(config):
    # This is a python class config
    task = config["task"]
    tasks_and_groups[task] = {
        "type": "python_task",
        "yaml_path": yaml_path,
    }
    _populate_tags_and_groups(
        config, task, tasks_and_groups, print_info
    )
elif self._config_is_group(config):
    # This is a group config
    tasks_and_groups[config["group"]] = {
        "type": "group",
        "task": -1,  # This signals that
        # we don't need to know
        # the task list for indexing
        # as it can be loaded
        # when called.
        "yaml_path": yaml_path,
    }
    # # Registered the level 1 tasks from a group config
    # for config in config["task"]:
    #     if isinstance(config, dict) and self._config_is_task(config):
    #         task = config["task"]
    #         tasks_and_groups[task] = {
    #             "type": "task",
    #             "yaml_path": yaml_path,
    #         }
elif self._config_is_task(config):
    # This is a task config
    task = config["task"]
    tasks_and_groups[task] = {
        "type": "task",
        "yaml_path": yaml_path,
    }
    _populate_tags_and_groups(
        config, task, tasks_and_groups, print_info
    )

and this is later used to initialize TaskConfig, etc.

Currently, the index entry for the task mmlu_formal_logic_generative is populated from two places:

[2025-11-13 20:42:18] INFO __init__.py:542: populating task=mmlu_formal_logic_generative with /lm-evaluation-harness/lm_eval/tasks/mmlu-redux/generative/mmlu_formal_logic.yaml
[2025-11-13 20:42:18] INFO __init__.py:542: populating task=mmlu_formal_logic_generative with /lm-evaluation-harness/lm_eval/tasks/mmlu/generative/mmlu_formal_logic.yaml

The entry from mmlu-redux/generative/mmlu_formal_logic.yaml is therefore silently overridden. Because the two YAML files (and the _default_template_yaml files they use) differ substantially, this eventually leads to an incorrect evaluation; in my case, 0 accuracy.
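The failure mode is plain last-writer-wins dict assignment. Below is a minimal sketch, assuming a simplified index that maps a task name to its YAML path; the index_configs helper is hypothetical (the real TaskManager._task_index stores richer entries, and which file wins depends on the order in which YAML files are visited). It also records every overwritten entry, in the spirit of the debug logging added in this PR:

```python
def index_configs(configs):
    """Hypothetical helper: build a name -> entry index with plain dict
    assignment, recording every (name, yaml_path) pair that gets overwritten."""
    index = {}
    overridden = []
    for name, yaml_path in configs:
        if name in index:
            # A plain dict assignment would replace this entry without any error.
            overridden.append((name, index[name]["yaml_path"]))
        index[name] = {"type": "task", "yaml_path": yaml_path}
    return index, overridden


configs = [
    ("mmlu_formal_logic_generative", "tasks/mmlu-redux/generative/mmlu_formal_logic.yaml"),
    ("mmlu_formal_logic_generative", "tasks/mmlu/generative/mmlu_formal_logic.yaml"),
]
index, overridden = index_configs(configs)
print(index["mmlu_formal_logic_generative"]["yaml_path"])  # the mmlu/ file won
print(overridden)  # the mmlu-redux/ entry that was silently dropped
```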

I think we should disallow duplicate task/group names entirely, as this is bug-prone, but that can be done in another PR (see the debug log added in tasks/__init__.py: quite a few tasks/groups have a similar issue).
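Disallowing duplicates could be as simple as refusing to overwrite an existing key. A minimal sketch; DuplicateTaskError and register are hypothetical names, not part of lm-eval-harness:

```python
class DuplicateTaskError(ValueError):
    """Raised when a task or group name is registered twice."""


def register(index: dict, name: str, entry: dict) -> None:
    """Hypothetical strict registration: fail loudly instead of overwriting."""
    if name in index:
        raise DuplicateTaskError(
            f"{name!r} is defined in both {index[name]['yaml_path']} "
            f"and {entry['yaml_path']}"
        )
    index[name] = entry


index = {}
register(index, "mmlu_formal_logic_generative",
         {"type": "task", "yaml_path": "mmlu-redux/generative/mmlu_formal_logic.yaml"})
try:
    register(index, "mmlu_formal_logic_generative",
             {"type": "task", "yaml_path": "mmlu/generative/mmlu_formal_logic.yaml"})
except DuplicateTaskError as err:
    print(f"error: {err}")
```

Raising at index-build time surfaces the collision immediately, instead of as a 0-accuracy result after a full evaluation run; the trade-off is that every existing collision (such as those in the debug log) would have to be resolved first.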

cc @baberabb FYI: this probably impacts anyone using mmlu_redux through lm-eval-harness.

@fxmarty-amd
Contributor Author

Remaining overridden tasks/groups:

[2025-11-13 20:42:18] DEBUG __init__.py:555: The following tasks have been overriden in TaskManager._task_index - this may lead to unexpected behaviors for these tasks:
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('belebele', '/lm-evaluation-harness/lm_eval/tasks/afrobench/belebele/belebele.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_ca-eu', '/lm-evaluation-harness/lm_eval/tasks/basque_bench/flores_eu/flores_ca-eu.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_eu-ca', '/lm-evaluation-harness/lm_eval/tasks/basque_bench/flores_eu/flores_eu-ca.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_agronomy', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_agronomy.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_anatomy', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_anatomy.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_ancient_chinese', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_ancient_chinese.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_arts', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_arts.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_astronomy', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_astronomy.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_business_ethics', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_business_ethics.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_chinese_civil_service_exam', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_chinese_civil_service_exam.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_chinese_driving_rule', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_chinese_driving_rule.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_chinese_food_culture', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_chinese_food_culture.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_chinese_foreign_policy', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_chinese_foreign_policy.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_chinese_history', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_chinese_history.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_chinese_literature', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_chinese_literature.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_chinese_teacher_qualification', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_chinese_teacher_qualification.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_clinical_knowledge', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_clinical_knowledge.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_college_actuarial_science', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_college_actuarial_science.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_college_education', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_college_education.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_college_engineering_hydrology', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_college_engineering_hydrology.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_college_law', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_college_law.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_college_mathematics', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_college_mathematics.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_college_medical_statistics', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_college_medical_statistics.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_college_medicine', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_college_medicine.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_computer_science', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_computer_science.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_computer_security', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_computer_security.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_conceptual_physics', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_conceptual_physics.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_construction_project_management', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_construction_project_management.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_economics', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_economics.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_education', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_education.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_electrical_engineering', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_electrical_engineering.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_elementary_chinese', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_elementary_chinese.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_elementary_commonsense', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_elementary_commonsense.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_elementary_information_and_technology', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_elementary_information_and_technology.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_elementary_mathematics', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_elementary_mathematics.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_ethnology', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_ethnology.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_food_science', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_food_science.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_genetics', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_genetics.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_global_facts', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_global_facts.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_high_school_biology', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_high_school_biology.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_high_school_chemistry', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_high_school_chemistry.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_high_school_geography', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_high_school_geography.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_high_school_mathematics', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_high_school_mathematics.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_high_school_physics', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_high_school_physics.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_high_school_politics', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_high_school_politics.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_human_sexuality', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_human_sexuality.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_international_law', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_international_law.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_journalism', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_journalism.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_jurisprudence', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_jurisprudence.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_legal_and_moral_basis', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_legal_and_moral_basis.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_logical', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_logical.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_machine_learning', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_machine_learning.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_management', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_management.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_marketing', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_marketing.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_marxist_theory', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_marxist_theory.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_modern_chinese', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_modern_chinese.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_nutrition', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_nutrition.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_philosophy', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_philosophy.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_professional_accounting', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_professional_accounting.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_professional_law', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_professional_law.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_professional_medicine', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_professional_medicine.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_professional_psychology', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_professional_psychology.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_public_relations', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_public_relations.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_security_study', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_security_study.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_sociology', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_sociology.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_sports_science', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_sports_science.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_traditional_chinese_medicine', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_traditional_chinese_medicine.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_virology', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_virology.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_world_history', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_world_history.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_world_religions', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_default_world_religions.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('evalita-mp_sum_fp', '/lm-evaluation-harness/lm_eval/tasks/evalita_llm/_evalita-mp_sum_fp-small_task.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_ca-gl', '/lm-evaluation-harness/lm_eval/tasks/catalan_bench/flores_ca/flores_ca-gl.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_eu-gl', '/lm-evaluation-harness/lm_eval/tasks/basque_bench/flores_eu/flores_eu-gl.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_gl-ca', '/lm-evaluation-harness/lm_eval/tasks/catalan_bench/flores_ca/flores_gl-ca.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_gl-eu', '/lm-evaluation-harness/lm_eval/tasks/basque_bench/flores_eu/flores_gl-eu.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_albanian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Albanian/_include_base_44_albanian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_arabic', '/lm-evaluation-harness/lm_eval/tasks/include/default/Arabic/_include_base_44_arabic.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_armenian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Armenian/_include_base_44_armenian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_azerbaijani', '/lm-evaluation-harness/lm_eval/tasks/include/default/Azerbaijani/_include_base_44_azerbaijani.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_basque', '/lm-evaluation-harness/lm_eval/tasks/include/default/Basque/_include_base_44_basque.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_belarusian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Belarusian/_include_base_44_belarusian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_bengali', '/lm-evaluation-harness/lm_eval/tasks/include/default/Bengali/_include_base_44_bengali.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_bulgarian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Bulgarian/_include_base_44_bulgarian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_chinese', '/lm-evaluation-harness/lm_eval/tasks/include/default/Chinese/_include_base_44_chinese.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_croatian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Croatian/_include_base_44_croatian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_dutch', '/lm-evaluation-harness/lm_eval/tasks/include/default/Dutch/_include_base_44_dutch.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_estonian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Estonian/_include_base_44_estonian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_finnish', '/lm-evaluation-harness/lm_eval/tasks/include/default/Finnish/_include_base_44_finnish.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_french', '/lm-evaluation-harness/lm_eval/tasks/include/default/French/_include_base_44_french.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_georgian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Georgian/_include_base_44_georgian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_german', '/lm-evaluation-harness/lm_eval/tasks/include/default/German/_include_base_44_german.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_greek', '/lm-evaluation-harness/lm_eval/tasks/include/default/Greek/_include_base_44_greek.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_hebrew', '/lm-evaluation-harness/lm_eval/tasks/include/default/Hebrew/_include_base_44_hebrew.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_hindi', '/lm-evaluation-harness/lm_eval/tasks/include/default/Hindi/_include_base_44_hindi.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_hungarian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Hungarian/_include_base_44_hungarian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_indonesian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Indonesian/_include_base_44_indonesian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_italian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Italian/_include_base_44_italian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_japanese', '/lm-evaluation-harness/lm_eval/tasks/include/default/Japanese/_include_base_44_japanese.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_kazakh', '/lm-evaluation-harness/lm_eval/tasks/include/default/Kazakh/_include_base_44_kazakh.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_korean', '/lm-evaluation-harness/lm_eval/tasks/include/default/Korean/_include_base_44_korean.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_lithuanian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Lithuanian/_include_base_44_lithuanian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_malay', '/lm-evaluation-harness/lm_eval/tasks/include/default/Malay/_include_base_44_malay.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_malayalam', '/lm-evaluation-harness/lm_eval/tasks/include/default/Malayalam/_include_base_44_malayalam.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_nepali', '/lm-evaluation-harness/lm_eval/tasks/include/default/Nepali/_include_base_44_nepali.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_north macedonian', '/lm-evaluation-harness/lm_eval/tasks/include/default/North Macedonian/_include_base_44_north macedonian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_persian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Persian/_include_base_44_persian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_polish', '/lm-evaluation-harness/lm_eval/tasks/include/default/Polish/_include_base_44_polish.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_portuguese', '/lm-evaluation-harness/lm_eval/tasks/include/default/Portuguese/_include_base_44_portuguese.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_russian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Russian/_include_base_44_russian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_serbian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Serbian/_include_base_44_serbian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_spanish', '/lm-evaluation-harness/lm_eval/tasks/include/default/Spanish/_include_base_44_spanish.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_tagalog', '/lm-evaluation-harness/lm_eval/tasks/include/default/Tagalog/_include_base_44_tagalog.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_tamil', '/lm-evaluation-harness/lm_eval/tasks/include/default/Tamil/_include_base_44_tamil.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_telugu', '/lm-evaluation-harness/lm_eval/tasks/include/default/Telugu/_include_base_44_telugu.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_turkish', '/lm-evaluation-harness/lm_eval/tasks/include/default/Turkish/_include_base_44_turkish.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_ukrainian', '/lm-evaluation-harness/lm_eval/tasks/include/default/Ukrainian/_include_base_44_ukrainian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_urdu', '/lm-evaluation-harness/lm_eval/tasks/include/default/Urdu/_include_base_44_urdu.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_uzbek', '/lm-evaluation-harness/lm_eval/tasks/include/default/Uzbek/_include_base_44_uzbek.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_vietnamese', '/lm-evaluation-harness/lm_eval/tasks/include/default/Vietnamese/_include_base_44_vietnamese.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_albanian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Albanian/_include_base_44_albanian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_arabic', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Arabic/_include_base_44_arabic.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_armenian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Armenian/_include_base_44_armenian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_azerbaijani', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Azerbaijani/_include_base_44_azerbaijani.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_basque', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Basque/_include_base_44_basque.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_belarusian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Belarusian/_include_base_44_belarusian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_bengali', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Bengali/_include_base_44_bengali.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_bulgarian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Bulgarian/_include_base_44_bulgarian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_chinese', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Chinese/_include_base_44_chinese.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_croatian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Croatian/_include_base_44_croatian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_dutch', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Dutch/_include_base_44_dutch.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_estonian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Estonian/_include_base_44_estonian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_finnish', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Finnish/_include_base_44_finnish.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_french', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/French/_include_base_44_french.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_georgian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Georgian/_include_base_44_georgian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_german', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/German/_include_base_44_german.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_greek', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Greek/_include_base_44_greek.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_hebrew', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Hebrew/_include_base_44_hebrew.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_hindi', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Hindi/_include_base_44_hindi.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_hungarian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Hungarian/_include_base_44_hungarian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_indonesian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Indonesian/_include_base_44_indonesian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_italian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Italian/_include_base_44_italian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_japanese', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Japanese/_include_base_44_japanese.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_kazakh', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Kazakh/_include_base_44_kazakh.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_korean', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Korean/_include_base_44_korean.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_lithuanian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Lithuanian/_include_base_44_lithuanian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_malay', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Malay/_include_base_44_malay.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_malayalam', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Malayalam/_include_base_44_malayalam.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_nepali', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Nepali/_include_base_44_nepali.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_north macedonian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/North Macedonian/_include_base_44_north macedonian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_persian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Persian/_include_base_44_persian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_polish', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Polish/_include_base_44_polish.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_portuguese', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Portuguese/_include_base_44_portuguese.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_russian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Russian/_include_base_44_russian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_serbian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Serbian/_include_base_44_serbian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_spanish', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Spanish/_include_base_44_spanish.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_tagalog', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Tagalog/_include_base_44_tagalog.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_tamil', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Tamil/_include_base_44_tamil.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_telugu', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Telugu/_include_base_44_telugu.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_turkish', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Turkish/_include_base_44_turkish.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_ukrainian', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Ukrainian/_include_base_44_ukrainian.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_urdu', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Urdu/_include_base_44_urdu.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_uzbek', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Uzbek/_include_base_44_uzbek.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('include_base_44_vietnamese', '/lm-evaluation-harness/lm_eval/tasks/include/few_shot_en/Vietnamese/_include_base_44_vietnamese.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_ca-pt', '/lm-evaluation-harness/lm_eval/tasks/catalan_bench/flores_ca/flores_ca-pt.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_eu-pt', '/lm-evaluation-harness/lm_eval/tasks/basque_bench/flores_eu/flores_eu-pt.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_gl-pt', '/lm-evaluation-harness/lm_eval/tasks/galician_bench/flores_gl/flores_gl-pt.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_pt-ca', '/lm-evaluation-harness/lm_eval/tasks/catalan_bench/flores_ca/flores_pt-ca.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_pt-eu', '/lm-evaluation-harness/lm_eval/tasks/basque_bench/flores_eu/flores_pt-eu.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_pt-gl', '/lm-evaluation-harness/lm_eval/tasks/galician_bench/flores_gl/flores_pt-gl.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_ca-es', '/lm-evaluation-harness/lm_eval/tasks/catalan_bench/flores_ca/flores_ca-es.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_es-ca', '/lm-evaluation-harness/lm_eval/tasks/catalan_bench/flores_ca/flores_es-ca.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_es-eu', '/lm-evaluation-harness/lm_eval/tasks/basque_bench/flores_eu/flores_es-eu.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_es-gl', '/lm-evaluation-harness/lm_eval/tasks/galician_bench/flores_gl/flores_es-gl.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_es-pt', '/lm-evaluation-harness/lm_eval/tasks/portuguese_bench/flores_pt/flores_es-pt.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_eu-es', '/lm-evaluation-harness/lm_eval/tasks/basque_bench/flores_eu/flores_eu-es.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_gl-es', '/lm-evaluation-harness/lm_eval/tasks/galician_bench/flores_gl/flores_gl-es.yaml')
[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_pt-es', '/lm-evaluation-harness/lm_eval/tasks/portuguese_bench/flores_pt/flores_pt-es.yaml')
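Most of these entries come from a few task families (cmmlu, include, and flores_* pairs spread across several *_bench directories). A small triage sketch, assuming only the log format shown above (each entry ends in a ('name', 'path') tuple); count_by_family is a hypothetical helper, not part of lm-eval-harness:

```python
import ast
import re
from collections import Counter

# Each entry ends with a Python tuple literal such as ('name', '/path/to/file.yaml').
ENTRY = re.compile(r"\(.*\)\s*$")


def count_by_family(log_lines: list[str]) -> Counter:
    """Count overridden entries per top-level directory under tasks/."""
    counts: Counter = Counter()
    for line in log_lines:
        match = ENTRY.search(line)
        if match is None:
            continue  # e.g. the header line of the debug log
        try:
            _name, path = ast.literal_eval(match.group(0))
        except (ValueError, SyntaxError):
            continue  # not a ('name', 'path') tuple
        if "/tasks/" not in path:
            continue
        family = path.split("/tasks/", 1)[1].split("/", 1)[0]
        counts[family] += 1
    return counts


sample = [
    "[2025-11-13 20:42:18] DEBUG __init__.py:555: The following tasks have been overriden in TaskManager._task_index - this may lead to unexpected behaviors for these tasks:",
    "[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_agronomy', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_agronomy.yaml')",
    "[2025-11-13 20:42:18] DEBUG __init__.py:557: ('cmmlu_anatomy', '/lm-evaluation-harness/lm_eval/tasks/cmmlu/cmmlu_anatomy.yaml')",
    "[2025-11-13 20:42:18] DEBUG __init__.py:557: ('flores_ca-es', '/lm-evaluation-harness/lm_eval/tasks/catalan_bench/flores_ca/flores_ca-es.yaml')",
]
print(count_by_family(sample).most_common())
```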

add debug logging

linting
@fxmarty-amd fxmarty-amd changed the title [fix] Fix mmlu_redux pulling the wrong _default_template_yaml [fix] Fix mmlu_redux pulling the wrong _default_template_yaml, using wrong tasks & not displaying summary table Nov 14, 2025
@fxmarty-amd
Contributor Author

cc @baberabb WDYT?

@baberabb
Contributor

Hi! This looks good. Could you resolve the conflicts after #3394? I also left a comment.

wrt disallowing duplicates, I always thought _check_duplicates took care of that, but it only checks for duplicates within the current run and doesn't actually check whether duplicate task names slipped into the task index. I'll look into adding some CI tests for that.
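One way to catch names that slip into the index, sketched here under assumptions: instead of writing each (task name, YAML path) pair straight into a dict, where a later YAML silently overwrites an earlier one, group the paths by name first and fail if any name was registered more than once. build_task_index is a hypothetical helper, not an lm-eval function, and the colliding paths in the usage line are made up for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable


def build_task_index(entries: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Map task name -> YAML path, refusing to silently overwrite duplicates.

    Hypothetical helper for illustration; not part of lm-eval.
    """
    paths_by_name: defaultdict[str, list[str]] = defaultdict(list)
    for name, yaml_path in entries:
        paths_by_name[name].append(yaml_path)

    # A plain dict assignment would let the last YAML seen win without warning;
    # instead, collect every name that was registered from more than one file.
    duplicates = {name: paths for name, paths in paths_by_name.items() if len(paths) > 1}
    if duplicates:
        details = "; ".join(
            f"{name}: {', '.join(paths)}" for name, paths in sorted(duplicates.items())
        )
        raise ValueError(f"Duplicate task names in index: {details}")

    return {name: paths[0] for name, paths in paths_by_name.items()}


# Hypothetical collision: the same task name defined in two directories.
try:
    build_task_index([("demo_task", "a/demo_task.yaml"), ("demo_task", "b/demo_task.yaml")])
except ValueError as err:
    print(err)  # Duplicate task names in index: demo_task: a/demo_task.yaml, b/demo_task.yaml
```

A check like this could run once over the whole index in CI, independently of which tasks a given run selects.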

Related to this, I've been working on some refactoring to make the library more readable (though it's become a bit of a victim of mission creep at this point 😅). We should add this logging there as well, and I'd appreciate a review when you get a chance. We should also definitely add a duplicate check, but one commonly requested feature is support for multiple groups with overlapping tasks (which _check_duplicates currently prohibits). I was thinking of always namespacing group tasks as they are loaded (e.g., mmlu::abstract_algebra, cmmlu::abstract_algebra) to get around that.
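A minimal sketch of the namespacing idea, assuming groups are available as a mapping from group name to task names; namespace_group_tasks is a hypothetical helper, not the refactor's actual code. The same task name can then appear under several groups without colliding, while a task repeated inside a single group is still rejected.

```python
from __future__ import annotations


def namespace_group_tasks(
    groups: dict[str, list[str]], sep: str = "::"
) -> dict[str, tuple[str, str]]:
    """Return {"group::task": (group, task)} for every task in every group.

    Hypothetical helper illustrating the namespacing idea; not lm-eval API.
    """
    namespaced: dict[str, tuple[str, str]] = {}
    for group, tasks in groups.items():
        for task in tasks:
            key = f"{group}{sep}{task}"
            # The same task under different groups gets different keys, so it is
            # allowed; the same task listed twice inside ONE group still collides.
            if key in namespaced:
                raise ValueError(f"Task {task!r} listed twice in group {group!r}")
            namespaced[key] = (group, task)
    return namespaced


ids = namespace_group_tasks(
    {"mmlu": ["abstract_algebra", "anatomy"], "cmmlu": ["abstract_algebra"]}
)
print(list(ids))
# ['mmlu::abstract_algebra', 'mmlu::anatomy', 'cmmlu::abstract_algebra']
```

With this scheme, a duplicate check over the namespaced ids would no longer forbid overlapping groups, since mmlu::abstract_algebra and cmmlu::abstract_algebra are distinct keys.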

@fxmarty-amd fxmarty-amd changed the title [fix] Fix mmlu_redux pulling the wrong _default_template_yaml, using wrong tasks & not displaying summary table [fix] Fix mmlu_redux not displaying summary table + display to the user the tasks / yaml that are actually pulled Nov 19, 2025
@fxmarty-amd
Contributor Author

fxmarty-amd commented Nov 19, 2025

Thank you @baberabb! Now that #3394 is merged, this PR only fixes the mmlu_redux_generative summary table display, and additionally prints something like:

[2025-11-19 15:38:18] INFO __init__.py:699: Selected tasks:
[2025-11-19 15:38:18] INFO __init__.py:703: Group: mmlu_redux_generative
[2025-11-19 15:38:18] INFO __init__.py:710:   Subgroup: stem
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_abstract_algebra_generative (mmlu-redux/generative/mmlu_abstract_algebra.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_anatomy_generative (mmlu-redux/generative/mmlu_anatomy.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_astronomy_generative (mmlu-redux/generative/mmlu_astronomy.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_college_biology_generative (mmlu-redux/generative/mmlu_college_biology.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_college_chemistry_generative (mmlu-redux/generative/mmlu_college_chemistry.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_college_computer_science_generative (mmlu-redux/generative/mmlu_college_computer_science.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_college_mathematics_generative (mmlu-redux/generative/mmlu_college_mathematics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_college_physics_generative (mmlu-redux/generative/mmlu_college_physics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_computer_security_generative (mmlu-redux/generative/mmlu_computer_security.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_conceptual_physics_generative (mmlu-redux/generative/mmlu_conceptual_physics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_electrical_engineering_generative (mmlu-redux/generative/mmlu_electrical_engineering.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_elementary_mathematics_generative (mmlu-redux/generative/mmlu_elementary_mathematics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_biology_generative (mmlu-redux/generative/mmlu_high_school_biology.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_chemistry_generative (mmlu-redux/generative/mmlu_high_school_chemistry.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_computer_science_generative (mmlu-redux/generative/mmlu_high_school_computer_science.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_mathematics_generative (mmlu-redux/generative/mmlu_high_school_mathematics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_physics_generative (mmlu-redux/generative/mmlu_high_school_physics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_statistics_generative (mmlu-redux/generative/mmlu_high_school_statistics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_machine_learning_generative (mmlu-redux/generative/mmlu_machine_learning.yaml)
[2025-11-19 15:38:18] INFO __init__.py:710:   Subgroup: other
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_business_ethics_generative (mmlu-redux/generative/mmlu_business_ethics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_clinical_knowledge_generative (mmlu-redux/generative/mmlu_clinical_knowledge.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_college_medicine_generative (mmlu-redux/generative/mmlu_college_medicine.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_global_facts_generative (mmlu-redux/generative/mmlu_global_facts.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_human_aging_generative (mmlu-redux/generative/mmlu_human_aging.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_management_generative (mmlu-redux/generative/mmlu_management.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_marketing_generative (mmlu-redux/generative/mmlu_marketing.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_medical_genetics_generative (mmlu-redux/generative/mmlu_medical_genetics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_miscellaneous_generative (mmlu-redux/generative/mmlu_miscellaneous.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_nutrition_generative (mmlu-redux/generative/mmlu_nutrition.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_professional_accounting_generative (mmlu-redux/generative/mmlu_professional_accounting.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_professional_medicine_generative (mmlu-redux/generative/mmlu_professional_medicine.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_virology_generative (mmlu-redux/generative/mmlu_virology.yaml)
[2025-11-19 15:38:18] INFO __init__.py:710:   Subgroup: social sciences
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_econometrics_generative (mmlu-redux/generative/mmlu_econometrics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_geography_generative (mmlu-redux/generative/mmlu_high_school_geography.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_government_and_politics_generative (mmlu-redux/generative/mmlu_high_school_government_and_politics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_macroeconomics_generative (mmlu-redux/generative/mmlu_high_school_macroeconomics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_microeconomics_generative (mmlu-redux/generative/mmlu_high_school_microeconomics.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_psychology_generative (mmlu-redux/generative/mmlu_high_school_psychology.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_human_sexuality_generative (mmlu-redux/generative/mmlu_human_sexuality.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_professional_psychology_generative (mmlu-redux/generative/mmlu_professional_psychology.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_public_relations_generative (mmlu-redux/generative/mmlu_public_relations.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_security_studies_generative (mmlu-redux/generative/mmlu_security_studies.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_sociology_generative (mmlu-redux/generative/mmlu_sociology.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_us_foreign_policy_generative (mmlu-redux/generative/mmlu_us_foreign_policy.yaml)
[2025-11-19 15:38:18] INFO __init__.py:710:   Subgroup: humanities
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_formal_logic_generative (mmlu-redux/generative/mmlu_formal_logic.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_european_history_generative (mmlu-redux/generative/mmlu_high_school_european_history.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_us_history_generative (mmlu-redux/generative/mmlu_high_school_us_history.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_high_school_world_history_generative (mmlu-redux/generative/mmlu_high_school_world_history.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_international_law_generative (mmlu-redux/generative/mmlu_international_law.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_jurisprudence_generative (mmlu-redux/generative/mmlu_jurisprudence.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_logical_fallacies_generative (mmlu-redux/generative/mmlu_logical_fallacies.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_moral_disputes_generative (mmlu-redux/generative/mmlu_moral_disputes.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_moral_scenarios_generative (mmlu-redux/generative/mmlu_moral_scenarios.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_philosophy_generative (mmlu-redux/generative/mmlu_philosophy.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_prehistory_generative (mmlu-redux/generative/mmlu_prehistory.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_professional_law_generative (mmlu-redux/generative/mmlu_professional_law.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688:     Task: mmlu_redux_world_religions_generative (mmlu-redux/generative/mmlu_world_religions.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688: Task: gsm8k_platinum (gsm8k_platinum/gsm8k-platinum.yaml)
[2025-11-19 15:38:18] INFO __init__.py:688: Task: gpqa_diamond_generative_n_shot (gpqa/generative/gpqa_diamond_generative_n_shot.yaml)

to the user. Does it sound good?
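The nested listing above is essentially a depth-first walk over the selected group/task tree. A rough sketch, assuming a hypothetical TaskNode/GroupNode representation (lm-eval's internal structures are different): top-level groups are labelled Group, nested ones Subgroup, and each task shows the YAML file it was loaded from, indented two spaces per level as in the log.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field
from typing import Union

logger = logging.getLogger(__name__)


@dataclass
class TaskNode:
    """Hypothetical leaf: a task name and the YAML file it was loaded from."""
    name: str
    yaml_path: str


@dataclass
class GroupNode:
    """Hypothetical inner node: a group or subgroup with nested children."""
    name: str
    children: list[Union[TaskNode, GroupNode]] = field(default_factory=list)


def format_selected(nodes: list[Union[TaskNode, GroupNode]], depth: int = 0) -> list[str]:
    """Render the tree depth-first, indenting two spaces per level."""
    indent = "  " * depth
    lines: list[str] = []
    for node in nodes:
        if isinstance(node, GroupNode):
            # Only top-level groups are called "Group"; nested ones are "Subgroup".
            label = "Group" if depth == 0 else "Subgroup"
            lines.append(f"{indent}{label}: {node.name}")
            lines.extend(format_selected(node.children, depth + 1))
        else:
            lines.append(f"{indent}Task: {node.name} ({node.yaml_path})")
    return lines


def log_selected(nodes: list[Union[TaskNode, GroupNode]]) -> None:
    """Emit the rendered tree through the standard logging module."""
    logger.info("Selected tasks:")
    for line in format_selected(nodes):
        logger.info(line)


# Usage with a trimmed-down version of the tree from the log above.
logging.basicConfig(level=logging.INFO, format="%(message)s")
tree = [
    GroupNode("mmlu_redux_generative", [
        GroupNode("stem", [
            TaskNode("mmlu_redux_anatomy_generative", "mmlu-redux/generative/mmlu_anatomy.yaml"),
        ]),
    ]),
    TaskNode("gsm8k_platinum", "gsm8k_platinum/gsm8k-platinum.yaml"),
]
log_selected(tree)
```

Keeping the formatting in a pure function that returns lines makes the output easy to test separately from the logger configuration.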

@fxmarty-amd
Contributor Author

wdyt?

@baberabb
Contributor

Looks great! Thanks! Added a check to only log the filter warnings for generation tasks.

@baberabb baberabb merged commit c1d2747 into EleutherAI:main Nov 26, 2025
6 checks passed