
Is the language model frozen for the whole training? #28

@long8v

Description


Hi, thank you for the great work on Skywork R1V2 — the results are impressive.

I was reading the paper and had a question regarding the training setup. Specifically, it's not entirely clear whether the language model (QwQ-32B) was kept frozen during the entire training process, including both the MPO and GRPO stages.

From Section 3.1 and Table 4, it seems like the adapter-only configuration yields the best performance, suggesting that the LLM might have been frozen. However, this isn't stated explicitly in the paper.
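To be concrete, here is a minimal PyTorch-style sketch of what I understand by an "adapter-only" configuration — the vision encoder and the LLM frozen, with only the adapter/projector receiving gradient updates. The module names here are hypothetical and just for illustration, not your actual implementation:

```python
import torch.nn as nn

class AdapterOnlySketch(nn.Module):
    """Hypothetical wrapper illustrating an adapter-only training setup."""

    def __init__(self, vision_encoder: nn.Module, adapter: nn.Module, llm: nn.Module):
        super().__init__()
        self.vision_encoder = vision_encoder  # e.g. a ViT, frozen
        self.adapter = adapter                # e.g. an MLP projector, trainable
        self.llm = llm                        # e.g. QwQ-32B backbone, frozen

        # Freeze everything except the adapter.
        for p in self.vision_encoder.parameters():
            p.requires_grad = False
        for p in self.llm.parameters():
            p.requires_grad = False
        for p in self.adapter.parameters():
            p.requires_grad = True

# Only the adapter parameters would then be handed to the optimizer,
# so MPO/GRPO updates would never modify the LLM weights.
```

Is this roughly what was done for both stages, or were any LLM parameters also updated at some point?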

Could you kindly confirm:
Was the language model completely frozen throughout the entire training process?

Thanks again for sharing the model and for your contributions to the open-source community!
