Inquiry About SFT Warm-Up Data Ratio and Its Impact on RL Performance #5

Dear Authors,

First, thank you for your excellent work! In the paper, you mention using a small amount of data for a supervised fine-tuning (SFT) warm-up before the RL phase. Could you kindly clarify the following?

1. Data Scale: What is the specific data size used for the SFT warm-up phase?
2. Proportion to RL Data: How large is the warm-up set as a percentage of the data used in the RL training stage?
3. Impact Analysis: How does raising or lowering the ratio of SFT warm-up data to RL data affect the final RL evaluation metrics? Have you observed any notable patterns or thresholds in your experiments?

This clarification would greatly help readers understand the relationship between warm-up strategies and RL optimization.

Thank you for your time and insights!
