Hi, thank you for the great work on Skywork R1V2 — the results are impressive.
I was reading the paper and had a question regarding the training setup. Specifically, it's not entirely clear whether the language model (QwQ-32B) was kept frozen during the entire training process, including both the MPO and GRPO stages.
From Section 3.1 and Table 4, it seems like the adapter-only configuration yields the best performance, suggesting that the LLM might have been frozen. However, this isn't stated explicitly in the paper.
Could you kindly confirm:
Was the language model completely frozen throughout the entire training process?
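To make the question concrete, here is a minimal sketch of what I mean by "frozen" (plain PyTorch; the module and function names are my own placeholders for illustration, not the actual R1V2 code):

```python
import torch.nn as nn

# Hypothetical structure for illustration only; the real R1V2 codebase
# may organize the vision encoder, adapter, and LLM differently.
class R1VStyleModel(nn.Module):
    def __init__(self, vision_encoder: nn.Module, adapter: nn.Module, llm: nn.Module):
        super().__init__()
        self.vision_encoder = vision_encoder  # e.g. a ViT backbone
        self.adapter = adapter                # projector between the ViT and the LLM
        self.llm = llm                        # e.g. QwQ-32B


def freeze_all_but_adapter(model: R1VStyleModel) -> list:
    """Freeze the vision encoder and the LLM; leave only the adapter trainable."""
    for p in model.vision_encoder.parameters():
        p.requires_grad = False
    for p in model.llm.parameters():
        p.requires_grad = False
    for p in model.adapter.parameters():
        p.requires_grad = True
    # If the LLM really is kept frozen in both the MPO and GRPO stages,
    # only these parameters would be passed to the optimizer.
    return [p for p in model.parameters() if p.requires_grad]
```

In other words, I'd like to confirm whether something equivalent to the above was used for both stages, with no LLM weight updates at any point.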
Thanks again for sharing the model and for your contributions to the open-source community!