Description
I've made several runs trying to reproduce the reported result for Qwen/Qwen2.5-Math-7B-Instruct on the College Math dataset, but I'm getting ~41.8, which is well below the reported 46.8, despite using the same parameter values as your evaluation code. Could you please advise what I might have missed? Here's the command I ran, followed by its output:
!python3 -u evaluation/math_eval.py \
--model_name_or_path Qwen2.5-Math/models/Qwen2.5-Math-7B-Instruct \
--data_name "college_math" \
--data_dir Qwen2.5-Math/evaluation/data \
--output_dir Qwen2.5/Qwen2.5-Math/evaluation \
--split test \
--prompt_type "qwen25-math-cot" \
--seed 0 \
--temperature 0 \
--n_sampling 1 \
--top_p 1 \
--start 0 \
--end -1 \
--use_vllm \
--save_outputs \
--overwrite
==================================================
data: college_math ,remain samples: 2818
{'idx': 0, 'data_source': 'college_math.Beginning_and_Intermediate_Algebra', 'question_number': 'exercise.0.4.61', 'question': 'Simplify:
0% 0/2818 [00:00<?, ?it/s]<|im_start|>system
Please reason step by step, and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
Simplify:
<|im_start|>assistant
100% 2818/2818 [00:11<00:00, 254.93it/s]
-------------------- Epoch 0
Processed prompts: 100% 2818/2818 [05:31<00:00, 8.50it/s, est. speed input: 587.02 toks/s, output: 5583.72 toks/s]
-------------------- Epoch 1
Unsolved samples: 0
Evaluate: 0% 0/2818 [00:00<?, ?it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 3% 97/2818 [00:03<01:18, 34.80it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 9% 259/2818 [00:05<00:52, 48.74it/s]:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
Evaluate: 10% 269/2818 [00:05<00:49, 51.28it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 34% 963/2818 [00:18<00:18, 97.97it/s] :1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 42% 1185/2818 [00:25<01:06, 24.73it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 55% 1549/2818 [00:38<02:09, 9.78it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 58% 1631/2818 [00:42<01:07, 17.56it/s]
Evaluate: 58% 1643/2818 [00:47<04:01, 4.86it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 65% 1831/2818 [01:02<01:16, 12.93it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 97% 2721/2818 [02:16<00:22, 4.26it/s]:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
Evaluate: 97% 2732/2818 [02:18<00:11, 7.43it/s]:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
Evaluate: 100% 2818/2818 [02:31<00:00, 18.62it/s]
{'num_samples': 2818, 'num_scores': 2818, 'timeout_samples': 1, 'empty_samples': 0, 'acc': 41.8}
Saved to Qwen2.5/Qwen2.5-Math/evaluation/college_math/test_qwen25-math-cot_-1_seed0_t0.0_s0_e-1.jsonl
college_math avg
41.8 41.8
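
In case it helps narrow this down, here is a minimal sketch of how the saved JSONL could be broken down by subset to see where the missing points come from. It assumes each record carries a `score` field (either a boolean or a list of booleans, one per sample) next to the `data_source` field visible in the log above; `score` is my guess at the field name and is not confirmed by the output shown here. The helper names are hypothetical as well.

```python
import json
from collections import defaultdict


def load_jsonl(path):
    """Read one JSON object per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def is_correct(record):
    # Assumption: "score" is a boolean or a list of booleans (one per sample).
    score = record.get("score", False)
    if isinstance(score, (list, tuple)):
        return bool(score) and bool(score[0])
    return bool(score)


def accuracy_by_subset(records):
    """Return {subset: (n_correct, n_total, accuracy_percent)}."""
    counts = defaultdict(lambda: [0, 0])
    for rec in records:
        subset = rec.get("data_source", "unknown")
        counts[subset][0] += int(is_correct(rec))
        counts[subset][1] += 1
    return {k: (c, n, round(100.0 * c / n, 1)) for k, (c, n) in counts.items()}


def samples_for_accuracy(acc_percent, num_samples):
    """Convert an accuracy percentage into an approximate count of correct samples."""
    return round(acc_percent * num_samples / 100)


# Size of the gap in samples: reported 46.8 vs. observed 41.8 on 2818 problems.
gap = samples_for_accuracy(46.8, 2818) - samples_for_accuracy(41.8, 2818)
```

Calling `accuracy_by_subset(load_jsonl(path))` with `path` set to the file named in the "Saved to" line would then show whether the gap is spread evenly or concentrated in a few `data_source` subsets, which might point to an answer-parsing problem rather than a generation problem.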