Understand torch profiler output for zero3 during fine-tuning of llama2 7B #4453
xwjiang2010 asked this question in Q&A (unanswered)
Hi,

I am trying to identify the bottleneck in my fine-tuning job using the torch profiler. Llama 2 7B has 32 identical decoder layers, yet in my trace, decoder layer 12 takes much longer than the other layers. Is there an explanation for this?
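For context, here is a minimal sketch of the kind of profiling setup involved. The model below is a toy stand-in, not my actual DeepSpeed ZeRO-3 engine, and the layer sizes, batch size, and schedule values are illustrative only; the `torch.profiler` usage around the forward/backward/step calls is the relevant part:

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile, schedule

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-in: a stack of identical layers, analogous to the 32
# identical decoder layers (sizes here are illustrative only).
model = nn.Sequential(*[nn.Linear(512, 512) for _ in range(8)]).to(device)
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with profile(
    activities=activities,
    schedule=schedule(wait=1, warmup=2, active=3),  # skip startup noise
    record_shapes=True,
    with_stack=True,  # attribute kernels back to Python call sites
) as prof:
    for _ in range(6):  # wait + warmup + active steps
        x = torch.randn(32, 512, device=device)
        loss = model(x).sum()
        loss.backward()
        optim.step()
        optim.zero_grad()
        prof.step()  # advance the wait/warmup/active schedule

# Sort by device time to see which ops dominate a training step.
sort_key = "cuda_time_total" if device == "cuda" else "cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=20))
```

With a setup like this, all identical layers should show roughly equal time per step, which is why the outlier at layer 12 is surprising.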
Thanks in advance!