Meaning of Reinforcement Fine Tuning (RFT) and its Relationship with SFT/RL #14

I've seen the term `Reinforcement Fine Tuning (RFT)` used and would like to understand clearly what it generally means.

I'm also curious about how it relates to, or differs from, the SFT or GRPO-based RL fine-tuning processes that are often mentioned.

I would appreciate it if you could clarify whether RFT refers to a separate fine-tuning process or is related to the SFT and GRPO-based RL methods mentioned above.
