GSPMD "2D Partitioning" Training Speed #1201
agemagician asked this question in Q&A (unanswered)
Hello,
I am testing the training speed with 2D partitioning and getting approximately 9.5 seconds per step, whereas the paper reports approximately 6.5 seconds per step for a model of similar size.
I am testing with "scalable_t5", using the following related parameters:
Paper "Table 2":
https://arxiv.org/pdf/2105.04663.pdf
@adarob Would it be possible to share the configuration that achieved this speed on TPU v3?