Fetch from nvidia Megatron-LM #5

RaymondLi0 · 2022-08-03T20:32:25Z

No description provided.


BestJuly and others added 30 commits April 24, 2025 19:53

ADLR/megatron-lm!2950 - feat(MoE): FP8 Support for Multi-Token-Prediction

20c635b

Merge branch 'lit/deepseekv3_fp8' into 'main'

51903b2

feat(MoE): FP8 Support for Multi-Token-Prediction See merge request ADLR/megatron-lm!2950

ADLR/megatron-lm!3096 - Fix checkpoint directory bug in distill nightly test

3d1ecd7

Merge branch 'aanoosheh/fix-ckpt-dir-bug' into 'main'

65aa136

Fix checkpoint directory bug in distill nightly test Closes #446 See merge request ADLR/megatron-lm!3096

ADLR/megatron-lm!2637 - [dist ckpt] Re-attempt !2493 + fixing merge conflicts

f7fdafd

Merge branch 'intra-parallel-2493' into 'main'

8f6e830

[dist ckpt] Re-attempt !2493 + fixing merge conflicts See merge request ADLR/megatron-lm!2637

ADLR/megatron-lm!3175 - ci: Control which checks per test to run

99f43ae

Merge branch 'ko3n1g/ci/metrics-per-test' into 'main'

2bb62e7

ci: Control which checks per test to run See merge request ADLR/megatron-lm!3175

ADLR/megatron-lm!3155 - Fix the sync issue in TemporalAsyncWorker

48cc46f

Merge branch 'sbak/ckpt_sync_issue' into 'main'

222adb8

Fix the sync issue in `TemporalAsyncWorker` See merge request ADLR/megatron-lm!3155

ADLR/megatron-lm!2971 - Add ModelOpt speculative decoding finetune

c8f6279

Co-authored-by: Chenhan Yu
Co-authored-by: Chen-Han Yu
Co-authored-by: Ye Yu

Merge branch 'yeyu/finetune' into 'main'

154a7a8

Add ModelOpt speculative decoding finetune See merge request ADLR/megatron-lm!2971

ADLR/megatron-lm!3083 - Moe fix for Llama4

4ca4309

Co-authored-by: yaoyu-33
Co-authored-by: Mcore Bot
Co-authored-by: Chenhan Yu

Merge branch 'yuya/moe_sigmoid_fix' into 'main'

aab56ce

Moe fix for Llama4 See merge request ADLR/megatron-lm!3083

ADLR/megatron-lm!2910 - [custom FSDP] Support EP + FSDP training for DeepSeek-v3

5fe1eeb

Co-authored-by: jianbinc

Merge branch 'custom_fsdp_dsv3' into 'main'

f7a25e5

[custom FSDP] Support EP + FSDP training for DeepSeek-v3 See merge request ADLR/megatron-lm!2910

ADLR/megatron-lm!3178 - Fix extra tokens in returned generation

a1843ac

Merge branch 'helenn-fix-seqlen-chopping' into 'main'

ceed1b7

Fix extra tokens in returned generation Closes dl/JoC/nemo-ci#2075 See merge request ADLR/megatron-lm!3178

ADLR/megatron-lm!3160 - Update current scaling supported TE version to 2.2.0.dev0

b764f2d

Merge branch 'donghyukc/te_min_version' into 'main'

57d21c3

Update current scaling supported TE version to 2.2.0.dev0 See merge request ADLR/megatron-lm!3160

Co-authored-by: Shanmugam Ramasamy
Co-authored-by: Mcore Bot
Co-authored-by: Vijay Korthikanti
Co-authored-by: root

Merge branch 'seperate_chunk_allocator' into 'main'

e733d7d

Seperate chunk allocator See merge request ADLR/megatron-lm!3121

ADLR/megatron-lm!3180 - Revert inference_context.is_decode_only() to inference_context.sequence_len_offset > 0

4f16de3

Merge branch 'helenn-fix-seqlenoffset' into 'main'

33a193d

Revert inference_context.is_decode_only() to inference_context.sequence_len_offset > 0 See merge request ADLR/megatron-lm!3180

ADLR/megatron-lm!3058 - [BUG FIX]: fix the bug of indices-to-multihot-fusion will throw an exception when topk/num_local_experts is not the power of 2.

bc70535

Merge branch 'incidices_to_multihot' into 'main'

885a245

[BUG FIX]: fix the bug of indices-to-multihot-fusion will throw an exception when topk/num_local_experts is not the power of 2. See merge request ADLR/megatron-lm!3058

ADLR/megatron-lm!3015 - Refactor Inference Process Groups by replacing global ones with optional local ones for better parallelism flexibility

8208937

Co-authored-by: Zhiyu Li

Merge branch 'zhiyul/orthotope/inference' into 'main'

7118d88

Refactor Inference Process Groups by replacing global ones with optional local ones for better parallelism flexibility See merge request ADLR/megatron-lm!3015

ADLR/megatron-lm!3179 - Update te patch to include 1626

9bb34bf

Merge branch 'donghyukc/te_patch_update' into 'main'

2f4463e

Update te patch to include 1626 See merge request ADLR/megatron-lm!3179

Matthieu Le and others added 30 commits May 29, 2025 02:34

ADLR/megatron-lm!3367 - Update dataset helper for online video decoding

90e768c

Co-authored-by: root

Merge branch 'matthieul/fix_text_generation' into 'main'

705d312

Update dataset helper for online video decoding See merge request ADLR/megatron-lm!3367

ADLR/megatron-lm!3365 - Do not use eval on arbitrary user input.

7c1baea

Merge branch 'safer-eval' into 'main'

c820c68

Do not use eval on arbitrary user input. See merge request ADLR/megatron-lm!3365

ADLR/megatron-lm!3363 - tests: Update frozen-checkpoints

c6b08c2

Merge branch 'ko3n1g/tests/frozen-cpkt' into 'main'

8a39761

tests: Update frozen-checkpoints See merge request ADLR/megatron-lm!3363

ADLR/megatron-lm!3375 - Consolidate eval methods across train and generation

8d08685

Co-authored-by: root

Merge branch 'matthieul/consolidate_eval' into 'main'

13898cb

Consolidate eval methods across train and generation See merge request ADLR/megatron-lm!3375

ADLR/megatron-lm!3388 - ci: Auto-restart on nan

de245df

Merge branch 'ko3n1g/ci/restart-on-nan' into 'main'

0a438ed

ci: Auto-restart on nan See merge request ADLR/megatron-lm!3388

ADLR/megatron-lm!2949 - perf(mla, experimental): MLA RoPE fusion and YARN embedding cache

23e6471

Co-authored-by: xuwenc

Merge branch 'hongxiaob/mla_rope' into 'main'

9c1a535

perf(mla, experimental): MLA RoPE fusion and YARN embedding cache Closes #429 See merge request ADLR/megatron-lm!2949

ADLR/megatron-lm!3280 - Fix custom FSDP float8 tensor set_item

da3f0ff

Co-authored-by: jianbinc

Merge branch 'fix_cfsdp_fp8_param_load' into 'main'

549d637

Fix custom FSDP float8 tensor set_item See merge request ADLR/megatron-lm!3280

ADLR/megatron-lm!3401 - ci: Move queue blocker

24c60db

Merge branch 'ko3n1g/ci/move-queue-blocker' into 'main'

cfea2ea

ci: Move queue blocker See merge request ADLR/megatron-lm!3401

ADLR/megatron-lm!3400 - ci: Improve error-handling of missing logs

37b0afd

Co-authored-by: Mcore Bot

Merge branch 'ko3n1g/ci/better-log-failure-handling' into 'main'

6a62a54

ci: Improve error-handling of missing logs See merge request ADLR/megatron-lm!3400

ADLR/megatron-lm!3408 - ci: Control job concurrency

4648912

Co-authored-by: Mcore Bot

Merge branch 'ko3n1g/ci/job-concurrency' into 'main'

cde60ce

ci: Control job concurrency See merge request ADLR/megatron-lm!3408

ADLR/megatron-lm!3412 - ci: Catch missing logs

eab047c

Merge branch 'ko3n1g/ci/fix-no-log' into 'main'

25a26ca

ci: Catch missing logs See merge request ADLR/megatron-lm!3412

ADLR/megatron-lm!3411 - ci: Remove tests from A100

9bdfe31

Merge branch 'ko3n1g/ci/move-tests' into 'main'

ff64f96

ci: Remove tests from A100 See merge request ADLR/megatron-lm!3411

ADLR/megatron-lm!3393 - Add an option to skip counting zeros in grad of ChainedOptimizer

d960800

Merge branch 'no_count_zeros' into 'main'

b47a9bb

Add an option to skip counting zeros in grad of ChainedOptimizer See merge request ADLR/megatron-lm!3393

ADLR/megatron-lm!3326 - Add an interface to set high priority stream groups

bc80491

Merge branch 'comm-priority-setting' into 'main'

957f348

Add an interface to set high priority stream groups See merge request ADLR/megatron-lm!3326

ADLR/megatron-lm!3241 - Llama4 inference

7af72f9

Co-authored-by: Chen-Han Yu
Co-authored-by: Chenhan Yu

Merge branch 'llama4-inference' into 'main'

4eb36f8

Llama4 inference See merge request ADLR/megatron-lm!3241
