[Evals] Add support for OMNI Math by SumanthRH · Pull Request #97 · NovaSky-AI/SkyThought

SumanthRH · 2025-03-22T00:20:21Z

What does this PR do?

Adds OMNI-Math. Part 1 of resolving #51.

For OMNI-Math, the authors used LLM-as-a-judge evaluation with GPT-4o, but also provided a benchmark subset for rule-based evaluation. We do not support LLM-as-a-judge yet, so this PR only adds support for rule-based evaluation.
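For readers unfamiliar with the term, here is a minimal sketch of what rule-based evaluation can look like: pull the final `\boxed{...}` answer out of a model response and compare it to the reference after light string normalization. This is an illustration only, not the handler added in this PR; the function names (`extract_boxed`, `normalize`, `is_correct`) and the normalization rules are assumptions chosen for the example.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    # Return the contents of the last \boxed{...} in the text,
    # tracking brace depth so nested LaTeX such as \frac{1}{2} survives.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
        i += 1
    return None  # unbalanced braces: treat as "no answer found"


def normalize(answer: str) -> str:
    # Hypothetical normalization: drop sizing commands, math delimiters,
    # a trailing period, and all whitespace. Real graders do much more.
    answer = answer.strip().rstrip(".")
    answer = answer.replace("\\left", "").replace("\\right", "")
    answer = answer.replace("\\!", "").replace("$", "")
    return re.sub(r"\s+", "", answer)


def is_correct(response: str, reference: str) -> bool:
    # A response with no boxed answer is scored as incorrect.
    predicted = extract_boxed(response)
    if predicted is None:
        return False
    return normalize(predicted) == normalize(reference)
```

Exact string matching like this is cheap and deterministic, but it will miss answers that are mathematically equivalent yet written differently, which is one reason a benchmark may pair a rule-checkable subset with an LLM judge for the rest.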

The dataset for rule-based evaluation is provided in this repo: https://github.com/KbsdJames/omni-math-rule

I have also uploaded the same data as an alternate split in the official HF dataset: https://huggingface.co/datasets/KbsdJames/Omni-MATH/discussions/2

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

erictang000

nice


SumanthRH added 4 commits March 21, 2025 23:21

initial commit for omni math

ded75f4

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

add handler to __init__

801e68c

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

fix

71074e0

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

x

a0500fc

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

SumanthRH requested review from DachengLi1 and erictang000 March 22, 2025 00:20

erictang000 approved these changes Mar 24, 2025


SumanthRH merged commit ca01546 into main Mar 24, 2025
2 checks passed

SumanthRH linked an issue Mar 28, 2025 that may be closed by this pull request

Adding OMNI and LiveAOPS evaluation #51

Closed

SumanthRH mentioned this pull request Mar 28, 2025

Adding OMNI and LiveAOPS evaluation #51

Closed

SumanthRH merged 4 commits into main from sumanthrh/add_omni_math
