What dataset is $D_G$ for policy SFT in the 0th iteration and in further MCTS* iterations? #26

Thanks for your remarkable work. I can see the policy data collected from the 1st and 2nd MCTS* iterations. My question is: what initial question set $D_G$ did you use to SFT the initial policy model $\pi_{S_0}$ in Algorithm 1 and to run MCTS* in the subsequent iterations?
  
I guess a user-customized $D_G$ would work too, but I hope $D_G$ can be released just like $D_{V_0}$. Am I missing something? Is $D_G$ also constructed from SciInstruct, or do you use the training set corresponding to each test set (e.g., math-train as $D_G$ when evaluating on math-test)?
