Hi, thank you for the great work on Skywork R1V2 — the results are impressive.
I was reading the paper and had a question regarding the training setup. Specifically, it's not entirely clear whether the language model (QwQ-32B) was kept frozen during the entire training process, including both the MPO and GRPO stages.
From Section 3.1 and Table 4, it seems like the adapter-only configuration yields the best performance, suggesting that the LLM might have been frozen. However, this isn't stated explicitly in the paper.
Could you kindly confirm:
Was the language model completely frozen throughout the entire training process?
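To make the question concrete, here is a minimal sketch of what I mean by "frozen" (plain PyTorch; the module and function names are my own placeholders for illustration, not the actual R1V2 code):

```python
import torch.nn as nn

# Hypothetical structure for illustration only; the real R1V2 codebase
# may organize the vision encoder, adapter, and LLM differently.
class R1VStyleModel(nn.Module):
    def __init__(self, vision_encoder: nn.Module, adapter: nn.Module, llm: nn.Module):
        super().__init__()
        self.vision_encoder = vision_encoder  # e.g. a ViT backbone
        self.adapter = adapter                # projector between the ViT and the LLM
        self.llm = llm                        # e.g. QwQ-32B


def freeze_all_but_adapter(model: R1VStyleModel) -> list:
    """Freeze the vision encoder and the LLM; leave only the adapter trainable."""
    for p in model.vision_encoder.parameters():
        p.requires_grad = False
    for p in model.llm.parameters():
        p.requires_grad = False
    for p in model.adapter.parameters():
        p.requires_grad = True
    # If the LLM really is kept frozen in both the MPO and GRPO stages,
    # only these parameters would be passed to the optimizer.
    return [p for p in model.parameters() if p.requires_grad]
```

In other words, I'd like to confirm whether something equivalent to the above was used for both stages, with no LLM weight updates at any point.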
Thanks again for sharing the model and for your contributions to the open-source community!