<img width="1868" height="228" alt="Image" src="https://github.com/user-attachments/assets/05766fb7-4c95-4d43-bcc9-40cf04593ed6" />

How should I set the router bias and the auxiliary sequence-level balance loss during training?