Pull requests: NVIDIA/Megatron-LM

Update pretrain_mamba.py
#1682 opened Jul 11, 2025 by vignesh1507
[feat, moe] Add support for global aux loss
#1681 opened Jul 11, 2025 by Victarry
Support 1f1b a2a overlap
#1671 opened Jul 7, 2025 by lhb8125
moe: remove unused variable scale_up
#1670 opened Jul 6, 2025 by WineChord
Speed up model parallel initialization
#1662 opened Jul 2, 2025 by alexqdh
Update README.md
#1660 opened Jul 2, 2025 by 21jun
Allow head_dim to override kv_channel calculation
#1655 opened Jun 27, 2025 by shuoyangd
bug fixed: wandb artifact requires the tracker file
#1654 opened Jun 27, 2025 by yezhengmao1
Apply roll operation to position_ids in MTP
#1651 opened Jun 26, 2025 by iansheng
fix twice allgather in moe distrib optimizer
#1645 opened Jun 23, 2025 by irobot2013-why
Add Anton's Megatron changes for hyena compatibility
#1636 opened Jun 18, 2025 by jwilber
Fix log-timer-to-tensorboard on logging
#1631 opened Jun 13, 2025 by wplf
Set weights_only=False in optimizer
#1618 opened Jun 9, 2025 by zhic-mt
Fix mrope with context parallel
#1612 opened Jun 6, 2025 by liu-zichen
use a cpu set to cache cuda tensor finished_request_ids
#1610 opened Jun 5, 2025 by ladyrick
add node_rank argument for example scripts
#1604 opened May 30, 2025 by xylllllllll
CLIPViTModel support SP and CP
#1600 opened May 28, 2025 by Thaurun
Support Multiple Input Formats for checkpoint
#1599 opened May 28, 2025 by Thaurun