[BUG] Very slow livecodebench scoring #650

Open · rawsh opened this issue Mar 27, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@rawsh (Contributor) commented Mar 27, 2025

Describe the bug

Scoring is much slower than with the livecodebench repo.

To Reproduce

Compared to the LCB repo, scoring takes significantly longer and appears to be single-threaded.

Expected behavior

Scoring time should match the LCB repo when the same number of processes is set. I have tried setting num_process_evaluate:

def codegen_metric(predictions: list[str], formatted_doc: Doc, **kwargs) -> float:
    """Estimates the Pass@1 metric for the code generation task.
    Extracts the code from each prediction, runs it for each sample and generation,
    and computes the Pass@1 over the outputs.
    """
    # Extract generated code snippets
    generated_code_snippets = [[extract_code(pred) for pred in predictions]]  # noqa: F841
    evaluation_sample = {  # noqa: F841
        "inputs": formatted_doc.specific["inputs"],
        "outputs": formatted_doc.specific["outputs"],
        "fn_name": formatted_doc.specific["fn_name"],
    }
    # codegen_metrics expects a list of samples, so wrap the single sample in a list
    evaluation_sample = [{"input_output": json.dumps(evaluation_sample)}]

    metrics, _ = codegen_metrics(
        evaluation_sample,
        generated_code_snippets,
        k_list=[1],  # Only run for Pass@1
        num_process_evaluate=64,
    )
    return metrics["pass@1"]

with no improvement in scoring time

Version info

Please provide your operating system, lighteval version or commit if you installed from main, and pip/conda environment if your problem concerns dependencies.

latest master

conda create -n eval python=3.11
pip install vllm==0.7.2
pip install git+https://github.com/huggingface/lighteval.git#egg=lighteval[extended_tasks] math-verify==0.5.2
rawsh added the bug label Mar 27, 2025
@gauss-clb commented Apr 1, 2025

I have tried metric_name="codegen_pass@1" and it runs fast. But when I tried metric_name="codegen_pass@1:16", maybe there is a bug? The warning is: Sequence group 13_parallel_sample_10 is preempted by PreemptionMode.RECOMPUTE mode because there is not enough KV cache space. This can affect the end-to-end performance. Increase gpu_memory_utilization or tensor_parallel_size to provide more KV cache memory. total_num_cumulative_preemption=51 (scheduler.py:1754).

Maybe we should split the 16 generations into more batches?
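
For illustration, a minimal sketch of that idea using plain vLLM outside of lighteval (the function name, chunk size, and sampling parameters below are hypothetical, not what lighteval does internally):

from vllm import LLM, SamplingParams

def sample_in_chunks(llm: LLM, prompts: list[str], n_samples: int = 16, chunk_size: int = 4) -> list[list[str]]:
    """Request the n_samples completions per prompt in smaller chunks so fewer
    sequence groups compete for KV cache at the same time."""
    params = SamplingParams(n=chunk_size, temperature=0.8, max_tokens=2048)
    completions: list[list[str]] = [[] for _ in prompts]
    for _ in range(n_samples // chunk_size):
        # llm.generate returns one RequestOutput per prompt, in prompt order
        for i, request_output in enumerate(llm.generate(prompts, params)):
            completions[i].extend(out.text for out in request_output.outputs)
    return completions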

@rawsh (Contributor, Author) commented Apr 5, 2025

It seems this is because the parallelism is at the sample level: the generations for a single question are scored in parallel, but the questions themselves are still scored sequentially.
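
For reference, a rough sketch of what corpus-level batching could look like, reusing extract_code and codegen_metrics from the snippet above (codegen_metric_batched, all_predictions, and all_docs are hypothetical names, not the current lighteval implementation):

import json

def codegen_metric_batched(all_predictions: list[list[str]], all_docs: list[Doc], num_workers: int = 64) -> float:
    """Score every question in a single codegen_metrics call so its process pool
    parallelizes across questions instead of within one question at a time."""
    snippets = [[extract_code(pred) for pred in predictions] for predictions in all_predictions]
    samples = [
        {
            "input_output": json.dumps(
                {
                    "inputs": doc.specific["inputs"],
                    "outputs": doc.specific["outputs"],
                    "fn_name": doc.specific["fn_name"],
                }
            )
        }
        for doc in all_docs
    ]
    metrics, _ = codegen_metrics(
        samples,
        snippets,
        k_list=[1],  # Only run for Pass@1
        num_process_evaluate=num_workers,  # workers now spread across all questions
    )
    return metrics["pass@1"]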
