
Pull requests: NVIDIA/TransformerEngine

[DRAFT] JAX Current Scaling
#1647 opened Apr 5, 2025 by jberchtold-nvidia Draft
13 tasks
[PyTorch] Explicitly specify quantized tensor usages needed for linear op backward (label: bug)
#1646 opened Apr 4, 2025 by timmoon10
7 of 13 tasks
Add experimental Shardy support.
#1642 opened Apr 3, 2025 by jreiffers
1 of 6 tasks
Enable fp8 primary weights for sub-channel recipe
#1641 opened Apr 3, 2025 by kunlunl
7 of 13 tasks
Add adam bf16 state with original fp32 kernel
#1640 opened Apr 3, 2025 by BestJuly
1 of 13 tasks
Fix cpp warnings
#1639 opened Apr 3, 2025 by yaox12
13 tasks
Use internal quantizer in Linear module
#1638 opened Apr 3, 2025 by ptrendx
1 of 13 tasks
Symmetric memory all reduce
#1632 opened Apr 1, 2025 by wdykas
1 of 13 tasks
Support FP8 primary weight in FSDP training
#1630 opened Apr 1, 2025 by shjwudp
1 of 13 tasks
[PyTorch] Debug checkpointing with te.Sequential (label: bug)
#1629 opened Apr 1, 2025 by timmoon10
8 of 13 tasks
Improved performance of mxfp8 cast kernels (labels: 2.2.0, performance)
#1628 opened Mar 31, 2025 by Oleg-Goncharov
6 of 13 tasks
[PyTorch][Common] Refactor RoPE (label: 2.3.0)
#1626 opened Mar 31, 2025 by yaox12
2 of 13 tasks
Tongliu fp8 a2a
#1617 opened Mar 26, 2025 by Autumn1998 Draft
13 tasks
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 1 – core
#1614 opened Mar 25, 2025 by pggPL
7 tasks done
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 2 – features
#1613 opened Mar 25, 2025 by pggPL
7 tasks done
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 3 – tests
#1612 opened Mar 25, 2025 by pggPL
7 of 13 tasks
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 4 – documentation
#1611 opened Mar 25, 2025 by pggPL
7 tasks done
[PyTorch] Tutorial for the ONNX export
#1586 opened Mar 18, 2025 by pggPL
8 of 13 tasks
[JAX] Unbalanced Context Parallelism with THD format
#1565 opened Mar 12, 2025 by zlsh80826
8 of 13 tasks
Draft: split wgrad for GroupedLinear
#1564 opened Mar 12, 2025 by lhb8125 Draft
13 tasks
[CI] Add isort
#1563 opened Mar 12, 2025 by yaox12 Draft
13 tasks
Enable AttnFuncWithCPAndKVP2P to support mla
#1561 opened Mar 12, 2025 by SuperCB
3 of 13 tasks
Blockwise scaling linear quantization recipe
#1559 opened Mar 11, 2025 by kwyss-nvidia
8 of 13 tasks
change softmax_lse correction of CP to FP32
#1546 opened Mar 7, 2025 by xrennvidia
6 of 13 tasks
Subchannel Block quantized GEMM
#1545 opened Mar 6, 2025 by kwyss-nvidia
7 of 12 tasks