Code for generating data for policy and PRM?

Hi, thanks for your great work. I am confused about the data-collection code, though.
As you mentioned:
1. Download policy data (positive samples) for training 1st policy model (Llama3-8b-Instruct): [[Hugging Face](https://huggingface.co/datasets/zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st)]

2. Download PRM data (positive and negative samples) for training 1st reward model (Mistral-7B: MetaMATH): [[Hugging Face](https://huggingface.co/datasets/zd21/ReST-MCTS-Llama3-8b-Instruct-PRM-1st)]

How can I generate these two datasets with your code? Is it done by scripts like self_train/generation/generate_both_samples_MATH.py or evaluate.py?
What are the key parameters to change to produce them?
 

Code for generating data for policy and PRM? #19
