Experimental result on APPS #4

Open
@sh0416

Description

Hello.

I am reproducing your results, but I am having trouble reproducing your baseline, deepseek-coder-6.7b-instruct.

I use the prompt provided in this repository, but the APPS introductory pass@1 I get is only 0.3192, whereas the correct value according to your paper is 0.4465.
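For reference, pass@1 here is presumably computed with the standard unbiased pass@k estimator popularized by the Codex/HumanEval evaluation; a minimal sketch (the sample counts below are illustrative, not from either paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n = samples generated per problem,
    c = samples that pass all tests, k = budget being scored."""
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1 this reduces to the fraction of correct samples per problem,
# averaged over the benchmark. Hypothetical per-problem correct counts:
correct_counts = [3, 5, 0, 10]
score = sum(pass_at_k(n=10, c=c, k=1) for c in correct_counts) / len(correct_counts)
```

If the repository's evaluation script uses a different n or a greedy single-sample decode, that alone can shift pass@1 by a few points, which may be relevant to the gap above.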

I also observe a discrepancy between your paper and the Papers with Code website: the former reports 0.5001 while the latter reports 0.3380.

Given this, I suspect my score may actually be correct if the base model (deepseek-coder-6.7b-instruct) scores 0.3192 and the fine-tuned model (MoTCoder) scores 0.3380.

While writing this post, I also noticed that the website lists your model as motcoder-15b. Shouldn't this be motcoder-6.7b, since your paper states that the base model is deepseek-coder-6.7b-instruct?

Could you clarify which score is correct?
