Understand torch profiler output for zero3 during fine-tuning of llama2 7B #4453
xwjiang2010 asked this question in Q&A (unanswered)
Hi,

I am trying to identify the bottleneck in my fine-tuning job using the torch profiler. Llama 2 7B has 32 identical decoder layers, yet in my trace, decoder layer 12 takes much longer than the other layers. Is there an explanation for this?
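For context, here is a minimal sketch of the kind of profiling setup involved. The model below is a toy stand-in, not my actual DeepSpeed ZeRO-3 engine, and the layer sizes, batch size, and schedule values are illustrative only; the `torch.profiler` usage around the forward/backward/step calls is the relevant part:

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile, schedule

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-in: a stack of identical layers, analogous to the 32
# identical decoder layers (sizes here are illustrative only).
model = nn.Sequential(*[nn.Linear(512, 512) for _ in range(8)]).to(device)
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with profile(
    activities=activities,
    schedule=schedule(wait=1, warmup=2, active=3),  # skip startup noise
    record_shapes=True,
    with_stack=True,  # attribute kernels back to Python call sites
) as prof:
    for _ in range(6):  # wait + warmup + active steps
        x = torch.randn(32, 512, device=device)
        loss = model(x).sum()
        loss.backward()
        optim.step()
        optim.zero_grad()
        prof.step()  # advance the wait/warmup/active schedule

# Sort by device time to see which ops dominate a training step.
sort_key = "cuda_time_total" if device == "cuda" else "cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=20))
```

With a setup like this, all identical layers should show roughly equal time per step, which is why the outlier at layer 12 is surprising.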
Thanks in advance!