Inquiry About SFT Warm-Up Data Ratio and Its Impact on RL Performance #5

@starHu123

Description

Dear Authors,

First, thank you for your excellent work! In the paper, you mentioned using a small amount of data for supervised fine-tuning (SFT) warm-up before the RL phase. Could you kindly clarify:

  1. Data Scale: What is the specific data size used for the SFT warm-up phase?
  2. Proportion to RL Data: What fraction of the total data used in the RL training stage does the warm-up set represent? (A small sketch after this list illustrates what I mean by this ratio.)
  3. Impact Analysis: How does a higher or lower SFT-to-RL data ratio affect the final RL evaluation metrics? Have you observed any notable patterns or thresholds in your experiments?
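
To make points 2 and 3 concrete, here is a minimal sketch of the quantity I am asking about. All sizes and names below (N_SFT_WARMUP, N_RL_PROMPTS) are hypothetical placeholders of my own, not values from the paper:

```python
# Hypothetical illustration of the SFT-warm-up-to-RL data ratio in question.
# The dataset sizes below are assumed placeholders, not numbers from the paper.

N_SFT_WARMUP = 2_000    # e.g. a "small" SFT warm-up set (assumed)
N_RL_PROMPTS = 40_000   # e.g. prompts used during the RL stage (assumed)

warmup_ratio = N_SFT_WARMUP / N_RL_PROMPTS
print(f"SFT warm-up / RL data ratio: {warmup_ratio:.1%}")  # -> 5.0%

# Question 3, in essence: if this ratio is swept (e.g. 1%, 5%, 20%),
# how do the final RL evaluation metrics change, and is there a threshold
# beyond which more SFT warm-up data stops helping or starts hurting?
```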

This clarification would greatly help readers understand the relationship between warm-up strategies and RL optimization.

Thank you for your time and insights!
