Code for generating data for the policy model and PRM? #19

Open
@Xccanxin

Hi, thanks for your great work. However, I am confused about the data-collection code.
As you mentioned:

  1. Download policy data (positive samples) for training 1st policy model (Llama3-8b-Instruct): [Hugging Face]

  2. Download PRM data (positive and negative samples) for training 1st reward model (Mistral-7B: MetaMATH): [Hugging Face]

How can I generate these two datasets with your code? Are they produced by scripts such as self_train/generation/generate_both_samples_MATH.py or evaluate.py?
Which key parameters need to be changed to generate them?

Labels: about dataset (datasets of PRM and policy model)