Skip to content

Unable to replicate the results of Qwen2.5-Math-7B-Instruct #57

Open
@nurmanmus

Description

@nurmanmus

I've attempted several runs to replicate the results of Qwen/Qwen2.5-Math-7B-Instruct on College Math dataset but I'm getting ~41.8 which is too far off from the 46.8 as reported (despite using the same parameter values as used in your evaluation code). Can you please advise what I have missed. Here's my code:

!python3 -u evaluation/math_eval.py
--model_name_or_path Qwen2.5-Math/models/Qwen2.5-Math-7B-Instruct
--data_name "college_math"
--data_dir Qwen2.5-Math/evaluation/data
--output_dir Qwen2.5/Qwen2.5-Math/evaluation
--split test
--prompt_type "qwen25-math-cot"
--seed 0
--temperature 0
--n_sampling 1
--top_p 1
--start 0
--end -1
--use_vllm
--save_outputs
--overwrite

==================================================
data: college_math ,remain samples: 2818
{'idx': 0, 'data_source': 'college_math.Beginning_and_Intermediate_Algebra', 'question_number': 'exercise.0.4.61', 'question': 'Simplify: $-10-4(n-5)$', 'answer': '$10-4 n$', 'license': 'Creative Commons Attribution 3.0 Unported License (CC BY 3.0)', 'data_topic': 'college_math.algebra'}
0% 0/2818 [00:00<?, ?it/s]<|im_start|>system
Please reason step by step, and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
Simplify: $-10-4(n-5)$&lt;|im_end|>
<|im_start|>assistant

100% 2818/2818 [00:11<00:00, 254.93it/s]
-------------------- Epoch 0
Processed prompts: 100% 2818/2818 [05:31<00:00, 8.50it/s, est. speed input: 587.02 toks/s, output: 5583.72 toks/s]
-------------------- Epoch 1
Unsolved samples: 0
Evaluate: 0% 0/2818 [00:00<?, ?it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 3% 97/2818 [00:03<01:18, 34.80it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 9% 259/2818 [00:05<00:52, 48.74it/s]:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
Evaluate: 10% 269/2818 [00:05<00:49, 51.28it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 34% 963/2818 [00:18<00:18, 97.97it/s] :1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 42% 1185/2818 [00:25<01:06, 24.73it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 55% 1549/2818 [00:38<02:09, 9.78it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 58% 1631/2818 [00:42<01:07, 17.56it/s]
Evaluate: 58% 1643/2818 [00:47<04:01, 4.86it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 65% 1831/2818 [01:02<01:16, 12.93it/s]:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'tuple' object is not callable; perhaps you missed a comma?
Evaluate: 97% 2721/2818 [02:16<00:22, 4.26it/s]:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
Evaluate: 97% 2732/2818 [02:18<00:11, 7.43it/s]:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: 'set' object is not callable; perhaps you missed a comma?
Evaluate: 100% 2818/2818 [02:31<00:00, 18.62it/s]
{'num_samples': 2818, 'num_scores': 2818, 'timeout_samples': 1, 'empty_samples': 0, 'acc': 41.8}
Saved to Qwen2.5/Qwen2.5-Math/evaluation/college_math/test_qwen25-math-cot_-1_seed0_t0.0_s0_e-1.jsonl
college_math avg
41.8 41.8

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions