Implement leaderboard as a benchmark #234

Open: 8 commits to be merged into base branch main

RobotSail
Member

This PR adds the Open LLM Leaderboard v2 as an evaluation exposed within instructlab/eval.

In particular, users can select a subset of the leaderboard tasks to run instead of the full suite.

In addition, the benchmark is implemented so that each subtask runs on the inference backend best suited to it.

Specifically, MCQ-style tasks (GPQA, MUSR, MMLU-Pro, and BBH) are executed directly through regular HF transformers, whereas generative tasks (IFEval and MATH-Hard) are executed through vLLM.
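
For reviewers, here is a minimal sketch of the routing idea described above. It is not the code in this PR: the task identifiers, the backend labels (`hf`, `vllm`), and the `group_tasks_by_backend` helper are all hypothetical names chosen for illustration. It only shows how a user-selected subset of tasks could be split into one batch per backend.

```python
from collections import defaultdict

# Hypothetical task identifiers and backend labels, for illustration only.
# MCQ-style tasks map to HF transformers, generative tasks map to vLLM.
TASK_BACKENDS = {
    "gpqa": "hf",
    "musr": "hf",
    "mmlu_pro": "hf",
    "bbh": "hf",
    "ifeval": "vllm",
    "math_hard": "vllm",
}


def group_tasks_by_backend(selected=None):
    """Return {backend: [tasks]} for the selected subset (all tasks if None)."""
    if selected is None:
        selected = list(TASK_BACKENDS)

    # Reject anything that is not a known leaderboard task.
    unknown = [task for task in selected if task not in TASK_BACKENDS]
    if unknown:
        raise ValueError(f"unknown leaderboard tasks: {unknown}")

    # Preserve the caller's ordering within each backend group.
    groups = defaultdict(list)
    for task in selected:
        groups[TASK_BACKENDS[task]].append(task)
    return dict(groups)
```

Each resulting group could then be handed to a single evaluation call for its backend, so a model is loaded at most once per backend regardless of how many tasks are selected.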

…but this brings the core idea

Signed-off-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>
Signed-off-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>
Signed-off-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>
@mergify mergify bot added the ci-failure label Mar 17, 2025
@mergify mergify bot added dependencies, ci-failure and removed ci-failure labels Mar 17, 2025
Signed-off-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>
@mergify mergify bot added ci-failure and removed ci-failure labels Mar 20, 2025
…ptions for the `simple_evaluate` function

Signed-off-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>
Signed-off-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>
…uctlab-eval[leaderboard]

Signed-off-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>
@mergify mergify bot added the documentation label Mar 20, 2025
Labels: ci-failure, dependencies, documentation