What dataset is D_G for policy SFT in the 0th iteration and subsequent MCTS* iterations? #26

Open
@RewindL

Thanks for your remarkable work. I can see the policy data collected from the 1st and 2nd MCTS* iterations. My question is: what initial question set $D_G$ did you use to SFT the initial policy model $\pi_{S_0}$ in Algorithm 1 and to run MCTS* in the subsequent iterations?

I guess a user-customized $D_G$ would work too, but I hope $D_G$ can be released just like $D_{V_0}$. Am I missing something? Is $D_G$ also constructed from SciInstruct, or do you use the training split corresponding to each test set (e.g., math-train as $D_G$ when evaluating on math-test)?