Experimental result on APPS #4

Open
@sh0416

Description

Hello.

I am reproducing your results, but I am having trouble reproducing your baseline, deepseek-coder-6.7b-instruct.

I use the prompt provided in this repository, but the APPS introductory pass@1 I get is only 0.3192, whereas the correct value according to your paper is 0.4465.
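For reference, pass@1 here is presumably computed with the standard unbiased pass@k estimator popularized by the Codex/HumanEval evaluation; a minimal sketch (the sample counts below are illustrative, not from either paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n = samples generated per problem,
    c = samples that pass all tests, k = budget being scored."""
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1 this reduces to the fraction of correct samples per problem,
# averaged over the benchmark. Hypothetical per-problem correct counts:
correct_counts = [3, 5, 0, 10]
score = sum(pass_at_k(n=10, c=c, k=1) for c in correct_counts) / len(correct_counts)
```

If the repository's evaluation script uses a different n or a greedy single-sample decode, that alone can shift pass@1 by a few points, which may be relevant to the gap above.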

I also observe a discrepancy between your paper and the Papers with Code website: the former reports 0.5001 while the latter reports 0.3380.

Given this, I suspect my score may actually be correct if the base model (deepseek-coder-6.7b-instruct) scores 0.3192 and the fine-tuned model (MoTCoder) scores 0.3380.

While writing this post, I also noticed that the website lists your model as motcoder-15b. Shouldn't this be motcoder-6.7b, since your paper states that the base model is deepseek-coder-6.7b-instruct?

Could you clarify which score is correct?
