Is there any difference between this and knowledge distillation from GPT-4o?

reasoning by GPT-4o + Rule-based Reward + GRPO **=** reasoning by GPT-4o + SFT

This is not real RL; it is supervised learning. It is just like image classification, where the reward is the indicator $\mathbb{1}\{\hat{y} = y^*\}$. You can still use an RL optimizer (like PPO or GRPO) to train the supervised model, but that does not make it real RL.
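
To make the comparison concrete, here is a minimal sketch of that indicator reward and of how a GRPO-style update would turn it into per-sample advantages. The function names, the toy samples, and the choice of population standard deviation are my own assumptions for illustration; they are not taken from any particular codebase.

```python
from statistics import mean, pstdev


def exact_match_reward(prediction: str, target: str) -> float:
    """Rule-based indicator reward: 1.0 if the prediction equals the label, else 0.0."""
    return 1.0 if prediction.strip() == target.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style normalization: center and scale rewards within one sampled group.

    Assumption: population std is used; implementations may differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of sampled answers for a single prompt with label y* = "cat".
samples = ["cat", "dog", "cat", "bird"]
label = "cat"

rewards = [exact_match_reward(s, label) for s in samples]
advantages = group_relative_advantages(rewards)

for s, r, a in zip(samples, rewards, advantages):
    print(f"{s!r}: reward={r}, advantage={a:+.3f}")
```

With a 0/1 reward, samples that match the label get a positive advantage and the rest get a negative one, so the only learning signal is agreement with $y^*$.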