https://github.com/AI-Hypercomputer/maxtext/blob/db89bbb818e7dc98ee9d3ad14db0a24f436f0c55/MaxText/train.py#L468

`moe_lb_loss` should be divided by `gradient_accumulation_steps` for reporting:

```py
moe_lb_loss = aux["moe_lb_loss"] / config.gradient_accumulation_steps
```
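A minimal sketch of the effect, assuming the auxiliary loss is summed across the accumulation micro-steps before it reaches the reporting code. The per-microbatch values and the helper name `mean_over_accumulation` are hypothetical and only illustrate the arithmetic; they are not taken from MaxText.

```python
def mean_over_accumulation(accumulated_loss, gradient_accumulation_steps):
    """Convert a loss summed over micro-steps into a per-step mean."""
    return accumulated_loss / gradient_accumulation_steps


# Hypothetical per-microbatch load-balancing losses (illustrative values only).
microbatch_lb_losses = [0.5, 0.25, 0.75, 0.5]
gradient_accumulation_steps = len(microbatch_lb_losses)

# Assumption: the accumulation loop sums the auxiliary loss across micro-steps.
accumulated_lb_loss = sum(microbatch_lb_losses)

# Without the division, the logged value grows with the number of steps.
reported_lb_loss = mean_over_accumulation(
    accumulated_lb_loss, gradient_accumulation_steps
)

print(f"summed: {accumulated_lb_loss}, reported mean: {reported_lb_loss}")
```

Under that assumption, the undivided value is `gradient_accumulation_steps` times larger than the per-step mean, so runs with different accumulation settings would not be comparable in the logs.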