Add LongBench V2 benchmark #249

Status: Open. Wants to merge 12 commits into base: main.

Conversation

@eshwarprasadS commented Apr 30, 2025

Adding LongBench V2 to the eval options.

Install the extras with:

pip install 'instructlab-eval[longbench]'

Uses the vLLM backend to serve the model for generation.

Runs like so:

# Assuming the evaluator is importable from the package's longbench module
# (exact import path not shown in the PR):
from instructlab.eval.longbench import LongBenchEvaluator

evaluator = LongBenchEvaluator(
    model_path="path/to/model",
    num_gpus=N,  # number of GPUs for the vLLM server
    output_file="path/to/results.json",
    eval_config={"batch_size": "auto"},
    vllm_config={"max_model_len": max_len},
)

results = evaluator.run()  # Returns LongBenchResult

The output JSON looks like this:

{
  "en_multidoc": 0.5424139838230786,
  "zh_multidoc": 0.24335639081098673,
  "en_singledoc": 0.4233139199560039,
  "zh_singledoc": 0.46157875457875464,
  "en_summ": 0.27244809337990245,
  "zh_summ": 0.1359562304911904,
  "en_fewshot": 0.45692449627485754,
  "zh_fewshot": 0.24416666666666667,
  "en_synthetic": 0.3799285714285714,
  "zh_synthetic": 0.4775,
  "code_avg": 0.30225,
  "overall_score": 0.3581670097645466
}
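
Since the scores are also written to output_file, here is a minimal sketch for loading and inspecting them afterwards (file path and key names taken from the example above):

import json

# Load the per-category scores written by the evaluator.
with open("path/to/results.json") as f:
    scores = json.load(f)

# Each value is a float in [0, 1]; "overall_score" averages the categories.
for category, score in sorted(scores.items()):
    print(f"{category}: {score:.4f}")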

@RobotSail (Member) left a comment:

Thanks for the PR @eshwarprasadS !

The PR has all of the right ideas; there are just a few minor changes you'll want to make, which I've outlined in this review. Once those are addressed, this should be good to merge.

Inline comment on this diff:

) / 2

# Calculate overall score
all_scores = [v for k, v in eval_results.items() if k != "overall_score"]

@RobotSail (Member): Why do we check if k != "overall_score"? We shouldn't have set this key yet.
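
A minimal sketch of the simplification this implies, assuming eval_results holds only the per-category scores at this point:

# With no "overall_score" key present yet, the filter is unnecessary:
all_scores = list(eval_results.values())
eval_results["overall_score"] = sum(all_scores) / len(all_scores)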

@RobotSail (Member) commented:

@eshwarprasadS It looks like you may need to rebase your changes.

@RobotSail (Member) commented:

@mergify rebase

mergify bot commented Jun 2, 2025 (in reply to "rebase"):

✅ Branch has been successfully rebased

mergify bot commented Jun 2, 2025:

This pull request has merge conflicts that must be resolved before it can be merged. @eshwarprasadS please rebase it. See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@RobotSail (Member) commented:

@eshwarprasadS It looks like you have a few merge conflicts that need to be fixed. Once those are resolved, we can merge this.

New commit pushed. Signed-off-by: Eshwar Prasad Sivaramakrishnan <[email protected]>