Thanks for your great work. May I ask whether reasoning is trained during the SFT phase rather than the RL phase? Thanks!