
Implement leaderboard as a benchmark #234


Merged
10 commits merged into instructlab:main on Apr 16, 2025


RobotSail
Member

This PR adds the Open LLM Leaderboard v2 as an evaluation exposed through instructlab/eval.

In particular, users can choose to run only a subset of the leaderboard's tasks.
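A minimal sketch of how a requested subset might be validated before anything runs. The short task identifiers and the `select_tasks` helper are hypothetical illustrations based on the subtasks named in this PR, not instructlab/eval's actual API.

```python
# Hypothetical identifiers, one per leaderboard subtask named in this PR.
LEADERBOARD_TASKS = ("bbh", "gpqa", "ifeval", "math_hard", "mmlu_pro", "musr")


def select_tasks(requested=None):
    """Return the subtasks to run (hypothetical helper).

    None means "run the full leaderboard"; otherwise every requested name
    must be a known subtask, or a ValueError is raised.
    """
    if requested is None:
        return list(LEADERBOARD_TASKS)
    unknown = sorted(set(requested) - set(LEADERBOARD_TASKS))
    if unknown:
        raise ValueError(f"unknown leaderboard tasks: {unknown}")
    # Drop duplicates while keeping the caller's order.
    return list(dict.fromkeys(requested))
```

Failing fast on an unknown name avoids loading a model only to discover a typo in the task list.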

In addition, the benchmark runs each subtask on the inference backend best suited to that task.

Specifically, multiple-choice (MCQ) tasks (GPQA, MuSR, MMLU-Pro, and BBH) run directly through Hugging Face transformers, whereas generative tasks (IFEval and MATH-Hard) run through vLLM.
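The backend split could be sketched as a simple partition of the selected subtasks, so that each backend only has to be set up for the tasks assigned to it. The task identifiers, the `BACKEND_FOR_TASK` table, and `group_by_backend` are hypothetical names for illustration; the actual routing code in this PR may be structured differently.

```python
from collections import defaultdict

# Hypothetical routing table mirroring the split described in this PR:
# multiple-choice tasks -> HF transformers, generative tasks -> vLLM.
BACKEND_FOR_TASK = {
    "gpqa": "hf",
    "musr": "hf",
    "mmlu_pro": "hf",
    "bbh": "hf",
    "ifeval": "vllm",
    "math_hard": "vllm",
}


def group_by_backend(tasks):
    """Partition the selected subtasks by backend (hypothetical helper).

    Unknown task names raise KeyError from the table lookup.
    """
    groups = defaultdict(list)
    for task in tasks:
        groups[BACKEND_FOR_TASK[task]].append(task)
    return dict(groups)


# Example: a subset mixing both kinds of task.
print(group_by_backend(["ifeval", "bbh", "math_hard"]))
```

Grouping first means a run that selects only MCQ tasks never needs to start vLLM, and vice versa.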

@mergify mergify bot added the ci-failure label Mar 17, 2025
@mergify mergify bot added the dependencies, ci-failure, and documentation labels and removed the ci-failure label Mar 17, 2025

@bbrowning bbrowning left a comment


Assuming the failing unit test around the StrEnum import gets fixed, this looks good. The new code is purely additive, and the extra requirements live in a separate requirements .txt file, so it shouldn't cause any issues for downstream builds.

@mergify mergify bot added the one-approval label Apr 15, 2025
@mergify mergify bot added the testing label Apr 16, 2025
…but this brings the core idea

Signed-off-by: Oleg Silkin <[email protected]>
…ptions for the `simple_evaluate` function

Signed-off-by: Oleg Silkin <[email protected]>
@mergify mergify bot added ci-failure and removed ci-failure labels Apr 16, 2025
@mergify mergify bot added the CI/CD and ci-failure labels and removed the ci-failure label Apr 16, 2025
@mergify mergify bot removed the ci-failure label Apr 16, 2025
@RobotSail RobotSail merged commit cea8acd into instructlab:main Apr 16, 2025
18 checks passed
Labels
CI/CD (Affects CI/CD configuration), dependencies (Pull requests that update a dependency file), documentation (Improvements or additions to documentation), one-approval, testing (Relates to testing)

2 participants