Pull requests: NVIDIA/Megatron-LM
Issue 1672 fix: initializing the current pointed with int64 to avoid …
#1673
opened Jul 7, 2025 by
sharanmayank
bug fixed: wandb artifact requires the tracker file
#1654
opened Jun 27, 2025 by
yezhengmao1
add fused_topk_softmax_without_capacity for topk router fusion
#1637
opened Jun 18, 2025 by
AshOfCat
Fix typos: vritual → virtual and decoeder → decoder
#1626
opened Jun 11, 2025 by
EricLabile
Fix: Apply q_layernorm consistently in MLA LoRA path
#1624
opened Jun 11, 2025 by
Flink-ddd
fix: when using moe parallel folding feature and set etp > 1 && ep == 1, the grad sync is incorrect and the loss curve is bad
#1622
opened Jun 10, 2025 by
Louis-J
use a cpu set to cache cuda tensor finished_request_ids
#1610
opened Jun 5, 2025 by
ladyrick
Add DistTrain, Allow Encoder to Have Different DP Size
#1605
opened May 30, 2025 by
zidanehuang001
bugfix: cross_entropy inplace operations may cause backward error
#1594
opened May 24, 2025 by
ChangWeiming