Description
I've seen the term Reinforcement Fine-Tuning (RFT) being used and would like to clearly understand what it generally means.
I'm also curious how it relates to, or differs from, the SFT and GRPO-based RL fine-tuning processes that are often mentioned.
Could you clarify whether RFT refers to a separate fine-tuning process, or whether it is another name for (or a variant of) one of those methods?