Much slower scoring compared to livecodebench repo.
To Reproduce
Compared to the LCB repo, scoring takes significantly longer and appears to be single-threaded.
Expected behavior
Scoring time should match the LCB repo when the same number of threads is set. I have tried setting num_process:
```python
def codegen_metric(predictions: list[str], formatted_doc: Doc, **kwargs) -> float:
    """Estimates the Pass@1 metric for the code generation task.
    Extract the code from each prediction, runs it for each sample and
    generation, and computes the Pass@1 over the outputs.
    """
    # Extract generated code snippets
    generated_code_snippets = [[extract_code(pred) for pred in predictions]]  # noqa: F841
    evaluation_sample = {  # noqa: F841
        "inputs": formatted_doc.specific["inputs"],
        "outputs": formatted_doc.specific["outputs"],
        "fn_name": formatted_doc.specific["fn_name"],
    }
    # This is a list of lists because
    evaluation_sample = [{"input_output": json.dumps(evaluation_sample)}]

    metrics, _ = codegen_metrics(
        evaluation_sample,
        generated_code_snippets,
        k_list=[1],  # Only run for Pass@1
        num_process_evaluate=64,
    )
    return metrics["pass@1"]
```
with no improvement in scoring time.
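One thing worth checking: if the metric is invoked once per document, each `codegen_metrics` call only sees a single sample, so a large `num_process_evaluate` has little to parallelize; the LCB repo scores all samples in one batch. Below is a minimal, self-contained sketch of the batched idea — `check_snippet` and `score_pass_at_1` are toy stand-ins I made up for illustration, not lighteval's or LCB's actual API, and a thread pool is used here only so the demo runs anywhere (the real harness executes code in separate processes):

```python
from concurrent.futures import ThreadPoolExecutor


def check_snippet(snippet, expected):
    """Toy stand-in for running one generated snippet against its tests."""
    try:
        # The real harness would exec the code in a sandboxed subprocess.
        return eval(snippet) == expected
    except Exception:
        return False


def score_pass_at_1(samples, num_workers=4):
    """Score every (snippet, expected) pair in a single worker pool,
    so parallelism spans all samples rather than one document's tests."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(lambda s: check_snippet(*s), samples))
    return sum(results) / len(results)


samples = [("1 + 1", 2), ("2 * 3", 6), ("5 - 1", 3)]
print(score_pass_at_1(samples))  # 2 of 3 toy snippets pass -> 0.666...
```

The point is only structural: when all samples go through one pool, adding workers actually shortens wall-clock time, whereas per-document calls serialize regardless of the worker count.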
Version info
Please provide your operating system, lighteval version or commit if you installed from main, and pip/conda environment if your problem concerns dependencies.
I have also tried metric_name="codegen_pass@1", and it runs fast. But when I tried metric_name="codegen_pass@1:16", there may be a bug. The warning is:

```
Sequence group 13_parallel_sample_10 is preempted by PreemptionMode.RECOMPUTE mode because there is not enough KV cache space. This can affect the end-to-end performance. Increase gpu_memory_utilization or tensor_parallel_size to provide more KV cache memory. total_num_cumulative_preemption=51 (scheduler.py:1754)
```

Maybe we should split the 16 generations into more batches?
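To illustrate the batching idea, here is a hedged sketch: `generate_in_chunks`, `generate_fn`, and the chunking policy are all hypothetical names for this example, not lighteval's or vLLM's actual API. Requesting the 16 samples in smaller chunks means fewer parallel sequences compete for KV-cache space at once:

```python
def generate_in_chunks(prompts, n_samples, chunk_size, generate_fn):
    """Request n_samples completions per prompt in chunks of chunk_size,
    so fewer sequences occupy KV cache simultaneously."""
    outputs = {p: [] for p in prompts}
    remaining = n_samples
    while remaining > 0:
        n = min(chunk_size, remaining)
        for p in prompts:
            # generate_fn stands in for one backend call returning n completions
            outputs[p].extend(generate_fn(p, n))
        remaining -= n
    return outputs


# Toy backend: returns n placeholder completions per call.
fake = lambda prompt, n: [f"{prompt}-sample{i}" for i in range(n)]
res = generate_in_chunks(["p0"], n_samples=16, chunk_size=4, generate_fn=fake)
assert len(res["p0"]) == 16  # four requests of 4 instead of one request for 16
```

The trade-off is more scheduling overhead per request in exchange for lower peak KV-cache pressure, which should reduce RECOMPUTE preemptions.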
latest master