Description
Hello.
I am trying to reproduce your results, but I am having trouble reproducing your baseline, deepseek-coder-6.7b-instruct.
I used the prompt provided in this repository, but I get an APPS introductory pass@1 of only 0.3192, whereas your paper reports 0.4465.
I also noticed a discrepancy between your paper and the Papers with Code website: the former reports 0.5001, while the latter reports 0.3380.
Given these numbers, I think my score could be correct if the base model (i.e., deepseek-coder-6.7b-instruct) scores 0.3192 and the fine-tuned model (i.e., motcoder) scores 0.3380.
While writing this post, I also noticed that the website lists your model as motcoder-15b. Shouldn't that be motcoder-6.7b, since your paper says the base model is deepseek-coder-6.7b-instruct?
Could you clarify which score is right?
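
In case the gap comes from how pass@1 is computed, here is a minimal sketch of the metric, assuming the standard unbiased pass@k estimator averaged over problems. The function names and the sample counts are hypothetical, and this is not the repository's evaluation script, which may differ.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts for three problems with one sample each (assumption, not real data).
example = [(1, 1), (1, 0), (1, 0)]
print(mean_pass_at_k(example, k=1))
```

With one sample per problem, pass@1 reduces to the fraction of problems whose single sample passes all tests, so differences in sampling settings or in how test cases are counted could change the reported value.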