We consider LPLB beneficial for both the training and inference prefilling stages. The optimal batch size threshold depends on several factors, including your model size, expert load balancing (which is influenced by data distribution), and infrastructure efficiency.

We recommend first estimating the potential gain from improved load balance; the approach is worth trying only if that gain significantly outweighs the ~100–200 µs overhead. Note also that the extra memory traffic introduced by redundant experts may make the realized speedup smaller than the theoretical estimate.
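The back-of-envelope check above can be sketched as follows. This is a minimal illustration only: the function name, the sample numbers, and the 200 µs default are assumptions for the sketch, not measurements or APIs from LPLB itself. The idea is that the MoE stage is gated by the most-loaded expert, so perfect balance would shrink it by roughly the ratio of average to maximum expert load.

```python
def lplb_breakeven(step_time_us, max_load, avg_load, overhead_us=200.0):
    """Rough estimate of whether rebalancing could pay for its overhead.

    step_time_us -- time of the expert (MoE) portion of one step, in µs
    max_load / avg_load -- expert load imbalance ratio; the stage is
        gated by the most-loaded expert, so perfect balance would cut
        its time to roughly step_time_us * avg_load / max_load
    overhead_us -- per-step balancing overhead (~100-200 µs cited above)
    """
    imbalance = max_load / avg_load
    ideal_time_us = step_time_us / imbalance   # perfectly balanced stage
    potential_gain_us = step_time_us - ideal_time_us
    return potential_gain_us, potential_gain_us > overhead_us

# Illustrative numbers: a 3 ms MoE stage with the hottest expert at
# 1.5x the average load leaves up to ~1000 µs on the table, well
# above a 200 µs overhead.
gain_us, worth_trying = lplb_breakeven(step_time_us=3000.0,
                                       max_load=1.5, avg_load=1.0)
```

In practice the realized gain will be smaller than this upper bound (redundant experts add memory traffic, and balance is never perfect), so the margin over the overhead should be comfortable before adopting the approach.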

Answer selected by Elevator14B
This discussion was converted from issue #3 on November 28, 2025 10:04.