[Feature request] Can we support ZeRO stage 2 or 3 with --tensor-model-parallel-size 2 --pipeline-model-parallel-size 2 for pretrain-gpt2?

#407

· SeekPoint opened

on Apr 28, 2025

BUG: training gpt2 with pp=2 fails with error: list index out of range

#406

· 9LLPPLL6 opened

on Apr 18, 2025

How can I set recomputation-granularity, e.g. selective or full?

#403

· LordEdison opened

on Apr 30, 2024

Hello, which version of the Megatron-LM library is your code modified from?

#401

· 4thGardenOfQMH opened

on Feb 26, 2024

Is this assertion for mask wrong?

#400

· yinfangchen opened

on Feb 15, 2024

Hello, can Megatron-DeepSpeed pre-train llama2?

#398

· 13416157913 opened

on Oct 12, 2023

Is a training log like this normal? I cannot find the loss in the logs, and what does "grad norm: nan" mean?

#396

· alphanlp opened

on Aug 27, 2023

The difference between ZeRO-3 and Megatron with ZeRO-2

#395

· nicosouth opened

on Aug 25, 2023

Question about the implementation of mpu.cross_entropy when using tensor parallel

#394

· robin087 opened

on Aug 3, 2023

Questions about inconsistent evaluation results

#392

· coorful opened

on Jul 24, 2023

Question about ds to universal

#388

· saxh opened

on May 31, 2023

RuntimeError: Error building extension 'scaled_upper_triang_masked_softmax_cuda'

#387

· zll0000 opened

on May 26, 2023