What's Changed
- bump release to dev version post 0.2.5 by @t-vi in #2513
- torch.cumsum api change by @jjsjann123 in #2507
- DTensor: support linear by @kshitij12345 in #2422
- TEv2 as default TE executor by @riccardofelluga in #2510
- Create `Symbol` with `is_prim=True` in `_register_custom_op` by @crcrpar in #2516
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2515
- Use underlying class's `__new__` in `MutableMappingWrapper.__new__` by @t-vi in #2514
- fix: working README.md example, support nvfuser for torch==2.8 by @lianakoleva in #2525
- Add KaelanDt as codeowner by @t-vi in #2540
- add ci skips by @t-vi in #2547
- TE: Fix cudnn.h not found by @kshitij12345 in #2536
- feat: _register_custom_op supports List[torch.Tensor] by @lianakoleva in #2529
- feat: provide guidance when registering custom op by @lianakoleva in #2530
- fix: string -> f-string where intended by @lianakoleva in #2528
- Initial TE NVFP4 recipe support by @riccardofelluga in #2523
- Add DTensor prim and torch symbol for exp by @kshitij12345 in #2496
- [DTensor] Add prim and torch sym for neg and reciprocal by @kshitij12345 in #2552
- Bump pytest from 8.3.5 to 8.4.2 by @dependabot[bot] in #2567
- Bump bitsandbytes from 0.47.0 to 0.48.0 by @dependabot[bot] in #2565
- Bump diffusers from 0.34.0 to 0.35.1 by @dependabot[bot] in #2564
- switch CI to non-interruptible by @t-vi in #2554
- tests: clean up xfails in VJP tests by @aobolensk in #2578
- fix output dtype for nvfuserex cumsum by @jjsjann123 in #2580
- Use signature (*args, **kwargs) when signature is unavailable by @shino16 in #2542
- fix: call torch.cuda.is_available() in available_devices by @aobolensk in #2602
- Add uint64 to thunder->torch dtype map by @crcrpar in #2519
- Enable direct bindings in Thunder by @rdspring1 in #2502
- move to nvfuser-cu128-torch28 by @t-vi in #2604
- [DTensor] Add torch symbol and prim for _grouped_mm by @kshitij12345 in #2503
- [DTensor] Add prim and torch symbol for `add` by @kshitij12345 in #2581
- MoE TensorParallel with Eager by @kshitij12345 in #2582
- fix: missing 'import torch' in README.md by @aobolensk in #2608
- Remove E741 from ruff lint ignore rules by @tpremrud in #2601
- Disallow `custom_op` that mutates arguments by @crcrpar in #2603
- Disabled TF32 on Ampere+ devices to stabilize numeric accuracy by @mattteochen in #2579
- Propagate rounding_mode in div_ by @beverlylytle in #2614
- Refactor quantization.py to use TSP by @tejapulagam in #2522
- [DTensor] Add test with parallelize_module by @kshitij12345 in #2598
- Inference benchmark of "meta-llama/Llama-4-Maverick-17B-128E" by @crcrpar in #2487
- Remove outdated scenarios from inference benchmark by @crcrpar in #2619
- Add `float4_e2m1fn_x2` to lcdtype_to_nvdtype_map by @crcrpar in #2532
- avoid `torch.float4_e2m1fn_x2` in `_get_min_and_val` by @crcrpar in #2533
- [benchmark_inference] Fix replacing the MoE by @kshitij12345 in #2620
- Fix FSDP NB by @kshitij12345 in #2629
- Fixes "function name is not defined" when using `DebugTransform` by @kiya00 in #2617
- Enable MoE TP with thunderfx by @kshitij12345 in #2611
- Add DIV_EXACT prim by @beverlylytle in #2626
- try getting version from `nvfuser_direct` first by @crcrpar in #2623
- Fix hf example and benchmark run on CPU by @aobolensk in #2583
- getattr should always be taken from the class and then bound by @t-vi in #2584
- Update benchmark_inference.py to support TP with thunderfx by @kshitij12345 in #2625
- Add TE's NVFP4 recipe to the test suite by @riccardofelluga in #2612
- Jj/cumsum nvfuserex opinfo tolerance by @jjsjann123 in #2586
- tests: Extend testing for dunder and binary elementwise operations by @aobolensk in #2597
- Propagate `disable_torch_autograd` to thunderfx's `_splitter` by @crcrpar in #2534
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2521
- [benchmark_inference] Enable `tqdm` only on rank 0 by @crcrpar in #2630
- fix tests for CI by @t-vi in #2627
- Remove cuda checks from TE and Triton xentropy executors by @riccardofelluga in #2613
- Fixes backward issue where silu outputs nan by @kiya00 in #2624
- [benchmark_inference] Update `from_linear` and `from_grouped_linear` to accept `fqn: str` by @crcrpar in #2631
- Have seed and offset in int64 for cudnn-frontend SDPA by @crcrpar in #2520
- Refactor low precision option handling in `benchmark_litgpt.py` by @riccardofelluga in #2615
- Bump the gha-updates group with 3 updates by @dependabot[bot] in #2568
- Fix import of TE `Recipe` by @ksivaman in #2635
- fix benchmarks job by @t-vi in #2607
- Add profile transform by @t-vi in #2636
- Enable interpolate tests, add PrimID mapping for ceil and floor by @aobolensk in #2609
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2637
- Update jvp computation in `test_grad.py` by @mattteochen in #2618
- Propagated requires_grad to torch tensor by @mattteochen in #2616
- add mask lookaside into transformers recipe by @t-vi in #2639
- Fix thunderjit in inference benchmark by @t-vi in #2644
- Warm up sufficiently by @wujingyue in #2638
- Reset peak memory stats before measurement by @wujingyue in #2647
- drop extra executor by @t-vi in #2648
- Relax Test Tolerances for TE executor tests by @riccardofelluga in #2646
- test_vjp_correctness_sdpa_manual: relax test tolerance (#2576) by @kiya00 in #2628
- [benchmark_inference] Decrease max_new_tokens for warm-up by @kshitij12345 in #2649
- [benchmark_inference] Reshape the output from run_routed_experts by @kshitij12345 in #2650
- Revert "[benchmark_inference] Decrease max_new_tokens for warm-up" by @wujingyue in #2660
- Fix the kv cache input STATIC_MEMORY_LOCATION tag in QuickStart example by @kiya00 in #2667
- be more thorough in replacing thunder.Device with torch.device in epilogue by @t-vi in #2669
- TE inference executor for 8 bit by @t-vi in #2632
- Store GroupedLinear's weight in GNK layout by @wujingyue in #2659
- [DTensor] Skip exp test on nvfuser by @kshitij12345 in #2671
- Add an option to profile only non-warmup iterations by @wujingyue in #2661
- Revert new primitive for grad bug fix; Apply localized solution for division output type consistency in _div_prim_grad by @Copilot in #2665
- Remove stray print by @kshitij12345 in #2673
- Remove `--dtensor-single-gpu` by @wujingyue in #2666
- reflect GroupedLinear's changed weight by @t-vi in #2674
- Use `torch._inductor.compile` for ThunderFX fallback entrypoint by @shino16 in #2600
- Tom/readme by @t-vi in #2684
- Add repr function for CACHE_OPTIONS and SHARP_EDGES_OPTIONS by @kiya00 in #2676
- add profile plugin by @t-vi in #2683
- bump version for release by @t-vi in #2685
New Contributors
- @aobolensk made their first contribution in #2578
- @tpremrud made their first contribution in #2601
- @mattteochen made their first contribution in #2579
Full Changelog: 0.2.5...0.2.6