Releases · Lightning-AI/lightning-thunder
Thunder 0.2.6 PyTorch Conference Edition
What's Changed
- bump release to dev version post 0.2.5 by @t-vi in #2513
- torch.cumsum api change by @jjsjann123 in #2507
- DTensor: support linear by @kshitij12345 in #2422
- TEv2 as default TE executor by @riccardofelluga in #2510
- Create `Symbol` with `is_prim=True` in `_register_custom_op` by @crcrpar in #2516
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2515
- Use underlying class's new in MutableMappingWrapper.new by @t-vi in #2514
- fix: working README.md example, support nvfuser for torch==2.8 by @lianakoleva in #2525
- Add KaelanDt as codeowner by @t-vi in #2540
- add ci skips by @t-vi in #2547
- TE: Fix cudnn.h not found by @kshitij12345 in #2536
- feat: _register_custom_op supports List[torch.Tensor] by @lianakoleva in #2529
- feat: provide guidance when registering custom op by @lianakoleva in #2530
- fix: string -> f-string where intended by @lianakoleva in #2528
- Initial TE NVFP4 recipe support by @riccardofelluga in #2523
- Add DTensor prim and torch symbol for exp by @kshitij12345 in #2496
- [DTensor] Add prim and torch sym for neg and reciprocal by @kshitij12345 in #2552
- Bump pytest from 8.3.5 to 8.4.2 by @dependabot[bot] in #2567
- Bump bitsandbytes from 0.47.0 to 0.48.0 by @dependabot[bot] in #2565
- Bump diffusers from 0.34.0 to 0.35.1 by @dependabot[bot] in #2564
- switch CI to non-interruptible by @t-vi in #2554
- tests: clean up xfails in VJP tests by @aobolensk in #2578
- fix output dtype for nvfuserex cumsum by @jjsjann123 in #2580
- Use signature (*args, **kwargs) when signature is unavailable by @shino16 in #2542
- fix: call torch.cuda.is_available() in available_devices by @aobolensk in #2602
- Add uint64 to thunder->torch dtype map by @crcrpar in #2519
- Enable direct bindings in Thunder by @rdspring1 in #2502
- move to nvfuser-cu128-torch28 by @t-vi in #2604
- [DTensor] Add torch symbol and prim for _grouped_mm by @kshitij12345 in #2503
- [DTensor] Add prim and torch symbol for `add` by @kshitij12345 in #2581
- MoE TensorParallel with Eager by @kshitij12345 in #2582
- fix: missing 'import torch' in README.md by @aobolensk in #2608
- Remove E741 from ruff lint ignore rules by @tpremrud in #2601
- Disallow `custom_op` that mutates arguments by @crcrpar in #2603
- Disable TF32 on Ampere+ devices to stabilize numeric accuracy by @mattteochen in #2579
- Propagate rounding_mode in div_ by @beverlylytle in #2614
- Refactor quantization.py to use TSP by @tejapulagam in #2522
- [DTensor] Add test with parallelize_module by @kshitij12345 in #2598
- Inference benchmark of "meta-llama/Llama-4-Maverick-17B-128E" by @crcrpar in #2487
- Remove outdated scenarios from inference benchmark by @crcrpar in #2619
- Add `float4_e2m1fn_x2` to `lcdtype_to_nvdtype_map` by @crcrpar in #2532
- avoid `torch.float4_e2m1fn_x2` in `_get_min_and_val` by @crcrpar in #2533
- [benchmark_inference] Fix replacing the MoE by @kshitij12345 in #2620
- Fix FSDP NB by @kshitij12345 in #2629
- Fixes "function name is not defined" when using `DebugTransform` by @kiya00 in #2617
- Enable MoE TP with thunderfx by @kshitij12345 in #2611
- Add DIV_EXACT prim by @beverlylytle in #2626
- try getting version from `nvfuser_direct` first by @crcrpar in #2623
- Fix hf example and benchmark run on CPU by @aobolensk in #2583
- getattr should always be taken from the class and then bound by @t-vi in #2584
- Update benchmark_inference.py to support TP with thunderfx by @kshitij12345 in #2625
- Add TE's NVFP4 recipe to the test suite by @riccardofelluga in #2612
- Jj/cumsum nvfuserex opinfo tolerance by @jjsjann123 in #2586
- tests: Extend testing for dunder and binary elementwise operations by @aobolensk in #2597
- Propagate `disable_torch_autograd` to thunderfx's `_splitter` by @crcrpar in #2534
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2521
- [benchmark_inference] Enable `tqdm` only on rank 0 by @crcrpar in #2630
- fix tests for CI by @t-vi in #2627
- Remove cuda checks from TE and Triton xentropy executors by @riccardofelluga in #2613
- Fixes backward issue where silu outputs nan by @kiya00 in #2624
- [benchmark_inference] Update `from_linear` and `from_grouped_linear` to accept `fqn: str` by @crcrpar in #2631
- Have seed and offset in int64 for cudnn-frontend SDPA by @crcrpar in #2520
- Refactor low precision option handling in `benchmark_litgpt.py` by @riccardofelluga in #2615
- Bump the gha-updates group with 3 updates by @dependabot[bot] in #2568
- Fix import of TE `Recipe` by @ksivaman in #2635
- fix benchmarks job by @t-vi in #2607
- Add profile transform by @t-vi in #2636
- Enable interpolate tests, add PrimID mapping for ceil and floor by @aobolensk in #2609
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2637
- Update jvp computation in `test_grad.py` by @mattteochen in #2618
- Propagated `requires_grad` to torch tensor by @mattteochen in #2616
- add mask lookaside into transformers recipe by @t-vi in #2639
- Fix thunderjit in inference benchmark by @t-vi in #2644
- Warm up sufficiently by @wujingyue in #2638
- Reset peak memory stats before measurement by @wujingyue in #2647
- drop extra executor by @t-vi in #2648
- Relax Test Tolerances for TE executor tests by @riccardofelluga in #2646
- test_vjp_correctness_sdpa_manual: relax test tolerance (#2576) by @kiya00 in #2628
- [benchmark_inference] Decrease max_new_tokens for warm-up by @kshitij12345 in #2649
- [benchmark_inference] Reshape the output from run_routed_experts by @kshitij12345 in #2650
- Revert "[benchmark_inference] Decrease max_new_tokens for warm-up" by @wujingyue ...
Thunder 0.2.5 - Summer Harvest
What's Changed
- bump version to 0.2.5.dev0 by @t-vi in #2274
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2271
- Add support for torch.argsort by @protonu in #2246
- Add test for Phi-3-vision-128k-instruct by @kshitij12345 in #1850
- DTensor: NVFuser Integration by @kshitij12345 in #2177
- fix lint from merge by @t-vi in #2277
- Remove F842 from ignore rules by @crcrpar in #2270
- E2E Coverage Test for Thunder by @tejapulagam in #2086
- convert pow gradient to new style by @t-vi in #2283
- add Windows xfail/skipif to tests not working on windows by @t-vi in #2284
- fix activation checkpointing in the joint trace by @beverlylytle in #2203
- Remove F811 from ignore rules by @crcrpar in #2268
- fix lint by @t-vi in #2286
- Return saved-for-backward objects as tuples by @beverlylytle in #2279
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2287
- Update coverage requirement from ~=7.8.2 to ~=7.9.1 by @dependabot[bot] in #2292
- Add "transformer_engine_v2" to expected all executors set by @crcrpar in #2226
- Fix CI benchmarks by @KaelanDt in #2303
- Bump pytest-random-order from 1.1.1 to 1.2.0 by @dependabot[bot] in #2289
- prune redundant ifs in ci workflows by @Borda in #2296
- Bump pytest-cov from 6.1.1 to 6.2.1 by @dependabot[bot] in #2291
- Add hardsigmoid op by @beverlylytle in #2304
- Update hypothesis requirement from ~=6.133.0 to ~=6.135.20 by @dependabot[bot] in #2290
- skip complex dtype tensor from `aminmax` by @crcrpar in #2276
- Extend thunder.jit coverage on HF models by @lantiga in #2281
- Add PEFT benchmarking script in `thunder/benchmarks` by @riccardofelluga in #2254
- support tuples as `replaces` arg for operator registration by @KaelanDt in #2308
- Improve reporting from thunder.jit coverage CI job by @lantiga in #2309
- Decrease SKIPPED by adding dependencies by @lantiga in #2312
- Add scalar tensor input to full_sample_generator by @IvanYashchuk in #2318
- Update requirements/test.txt bitsandbytes to cover aarch64 platform_machine by @nWEIdia in #2321
- update TE test by @kshitij12345 in #2319
- [TE] catch different error for xfail test by @kshitij12345 in #2322
- Run TE tests in CI by @kshitij12345 in #2320
- Add ops for HF transformers by @kiya00 in #2217
- Move PEFT model materialization by @riccardofelluga in #2334
- Fix dataflow ordering for recomputed symbols by @riccardofelluga in #2317
- Update LoRA config for mamba models in PEFT by @riccardofelluga in #2333
- Remove external logger dependency by @riccardofelluga in #2327
- Relax inplace sanity check by @beverlylytle in #2314
- Make alias updating the default in-place operator approach by @beverlylytle in #2052
- Propagate backward tags more consistently by @beverlylytle in #2336
- Make inplace flags defaults consistent by @beverlylytle in #2349
- add/debug Lit CI by @Borda in #2339
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2311
- ci: run notebooks with lit CI by @Borda in #2351
- fix docker sanity check by @Borda in #2352
- empty nvfuser.FusionCache after each test by @t-vi in #2354
- bump bitsandbytes from 0.42.0 to 0.46.1 by @lianakoleva in #2238
- remove spurious print by @t-vi in #2357
- limit gpu mem usage by @t-vi in #2358
- lit CI: switch to `L4_X_2` by @Borda in #2360
- [grad test] relax tolerance by @kshitij12345 in #2364
- [dtensor] don't rely on repr of `DTensorSpec` by @kshitij12345 in #2359
- [thunderfx] mark output non-differentiable based on FXGraph output node inspection by @kshitij12345 in #2348
- Specify torch.randint's default dtype by @shino16 in #2342
- Enable `take` and `take_along_axis` in nvfuser executor by @crcrpar in #2031
- Add `torch.scalar_tensor` to `default_torch_ops.py` by @crcrpar in #2310
- Make pre-commit hooks work on all python3s by @wujingyue in #2373
- Remove unnecessary underscores by @wujingyue in #2372
- Remove functionalization path by @beverlylytle in #2368
- Add the support of `uint8`/`Byte` to nvfuser executor by @crcrpar in #2299
- Revert "Add the support of `uint8`/`Byte` to nvfuser executor (#2299)" by @t-vi in #2378
- Update hypothesis requirement from ~=6.135.20 to ~=6.136.6 by @dependabot[bot] in #2384
- Remove early split trace path by @beverlylytle in #2375
- remove debugging leftover by @t-vi in #2390
- Bump graphviz from 0.20.3 to 0.21 by @dependabot[bot] in #2387
- Fixes mincut error in rematerialization when there's overlap between source and sink variables by @kiya00 in #2369
- Lower cumsum to nvfuser by @wujingyue in #2374
- Fix example in README.md by @zasdfgbnm in #2381
- Update TE v2 executor tests by @riccardofelluga in #2376
- Remove the bookend optimization by @wujingyue in #2379
- Bugfix/fix binary subscr class getitem by @tejapulagam in #2366
- Add class_getitem for list, tuple, and dict by @t-vi in #2394
- TransformerEngine executor checkpointing by @riccardofelluga in #2344
- Add _grouped_mm and lower it to nvFuser and torchex by @protonu in #2326
- [nvfuser] register prims.le by @kshitij12345 in #2377
- [dtensor] use nvfuser_direct for nvfuser dtensor execution by @kshitij12345 in #2370
- Support `torch.square` natively by @crcrpar in #2329
- Adds `save_thunderfx_repros` to save scripts for all the subgraphs and optionally save fusion region and traces by @kiya00 in #2232
- Add TEv2 Transform reset by @riccardofelluga in #2401
- deps: pin `cuda-python >=12.0, <13.0.0` by @Borda in #2410
- docker: build images for Torch 2.8 by @Borda in https://github.com/L...
0.2.4
What's Changed
- cleaning `skipif` for past Torch dev versions by @Borda in #2125
- fix missing images when released on PyPI by @Borda in #2130
- Add custom decompositions for cross entropy loss for the nvfuser executor by @protonu in #2043
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2139
- Remove qualified access to methods from autodiff by @riccardofelluga in #2147
- add non-None check for `torch.utils.collect_env.get_pip_packages` outputs by @crcrpar in #2124
- Print repro command when `test_core_vs_torch_consistency` fails with sample index specified by @crcrpar in #2131
- Set `input_quantizer.internal` to True by @beverlylytle in #2146
- Bump transformers from 4.50.3 to 4.52.4 by @dependabot in #2160
- Update ipython[all] requirement from ~=8.36.0 to ~=8.37.0 by @dependabot in #2159
- Update coverage requirement from ~=7.6.8 to ~=7.8.2 by @dependabot in #2158
- TE: update test to be more stable by @kshitij12345 in #2156
- Bump pytest-timeout from 2.3.1 to 2.4.0 by @dependabot in #2161
- Update dependabot - reviewers by @Borda in #2162
- Fix autodiff joint trace dataflow and in-place ops in higher order functions by @riccardofelluga in #2143
- Reduces the test time by @kiya00 in #2077
- add a use_hf option to benchmarking by @t-vi in #2154
- Bump pytest-xdist from 3.6.1 to 3.7.0 by @dependabot in #2164
- Update hypothesis requirement from ~=6.131.9 to ~=6.133.0 by @dependabot in #2165
- Update snowballstemmer requirement from <3 to <4 by @dependabot in #2168
- nvFuser Executor: Ensure cross-entropy loss fwd is not recomputed when computing bwd by @protonu in #2180
- Add parity check of shape/dtype/device of runtime and trace by @crcrpar in #2069
- Add docstrings for recipes by @KaelanDt in #2185
- sdpa_ex: relax test tolerances by @kshitij12345 in #2178
- removing nv_enable_embedding by @jjsjann123 in #2057
- Improve error reporting in benchmark job and add cleanup logic by @Borda in #2176
- Add mode flag to TorchCompileExecutor by @t-vi in #2188
- Use `to_dtype` and `to_torch_dtype`, not `_torch_to_thunder_dtype_map` and `_thunder_to_torch_dtype_map`, by @crcrpar in #2181
- use hf recipe in quickstart by @t-vi in #2191
- Remove kwarg construction from FusionDefinitionWrapper.call by @IvanYashchuk in #1871
- Add tests for HFTransformers recipe with static cache by @KaelanDt in #2179
- bump: PyTorch to be latest `2.7.1` by @Borda in #2193
- `prims.where` ignores shape/device of `pred` if it's a CPU scalar tensor by @crcrpar in #2135
- add decomposition for repeat interleave by @t-vi in #2194
- fix traceback in with / try: finally: for Python 3.10 by @t-vi in #2195
- Use joint trace in transform_for_execution by @beverlylytle in #2102
- Handle proxy objects in the cuDNN SDPA checker by @kiya00 in #2073
- default to hf recipe in thunder.compile for hf models by @KaelanDt in #2199
- add autocast lookaside to hf recipe for tracing on meta device by @t-vi in #2200
- make test_networks.py not rely on HF downloads by @KaelanDt in #2202
- Add plugins documentation by @KaelanDt in #2207
- implement partial, avoid tuple addition, test partialmethod by @t-vi in #2209
- Add `bitwise_left_shift` and `bitwise_right_shift` by @crcrpar in #2210
- [thunderfx] Avoid split at `Tensor.__eq__` by registering it in `thunder.torch` by @crcrpar in #2211
- fixed installing NCCL for CUDA by @Borda in #2208
- Enable `ruff-check` in pre-commit by @crcrpar in #2192
- unxfail passing test by @t-vi in #2220
- Add #2192 to `.git-blame-ignore-revs` by @crcrpar in #2219
- fix cache validity issue, tighten assert by @t-vi in #2223
- Avoid negative number rhs values to bitwise shift tests by @crcrpar in #2227
- Add missing opinfo for `bitwise_right_shift` to `elementwise_binary_ops` by @crcrpar in #2214
- Enable ruff format in pre-commit by @crcrpar in #2142
- Fixes "baddbmm() got an unexpected keyword argument 'batch1'" by @kiya00 in #2228
- Only register cudnn executor if it is available by @KaelanDt in #2174
- Representing DTensor in thunder traces by @kshitij12345 in #1907
- install cudnn in quickstarts by @KaelanDt in #2235
- add experimental by @t-vi in #2236
- DTensor: don't error if torch.distributed is unavailable by @kshitij12345 in #2243
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2245
- bump: OS versions for CI by @Borda in #2249
- Plumbing the topk to the nvFuser executor by @protonu in #2237
- bump: bitsandbytes to its next compatible release by @Borda in #2248
- ci: test with latest dependencies by @Borda in #2122
- Adds `empty_like`, `rand_like` by @kiya00 in #2225
- Support converting SymTypes Node to input proxy by @kiya00 in #2171
- fix test_reports_benchmark timeout by @kiya00 in #2229
- Fix `torch.gather` function signature to accept `input` passed as keyword argument by @kiya00 in #2250
- Fix `bitsandbytes` dependency conditions to use `platform_machine` instead of `sys_platform` by @Borda in #2257
- add float exception to assertion in jit_ext by @KaelanDt in #2256
- register softmax fudge function for stacklevel by @t-vi in #2259
- include message in NotImplementedError in proxy methods by @t-vi in #2260
- add support for full with tensor input by @t-vi in #2262
- Remove W291, W293, E702, and F722 from `ignore` by @crcrpar in #2267
- Add #2142 of ruff format integration to `.git-blame-ignore-revs` by @crcrpar in #2266
- split getitem into basic and "purely" advanced indexing by @t-vi in #2258
- TE: fix related to delayed forward-backward split by @kshitij12345 in #2222
- bump version to 0.2.4 for release by @t-vi in #2273
Full Changelog: 0.2.3...0.2.4
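Among the op-coverage additions in this release are the bitwise shifts (#2210) and a decomposition for repeat interleave (#2194). The sketch below exercises both through `thunder.jit`; it assumes only the public `thunder.jit` API and standard `torch` functions, and is an illustration rather than code from the linked PRs.

```python
import torch
import thunder


def fn(x, shifts):
    # bitwise_left_shift (#2210) followed by repeat_interleave (#2194)
    y = torch.bitwise_left_shift(x, shifts)
    return torch.repeat_interleave(y, 2, dim=0)


jfn = thunder.jit(fn)

x = torch.arange(6, dtype=torch.int64).reshape(2, 3)
shifts = torch.ones_like(x)
print(jfn(x, shifts))  # each element shifted left by 1, rows repeated twice
```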
Thunder 0.2.3
release 0.2.3 (#2126)
Thunder 0.2.2
Released at GTC 2025 with the latest and greatest
Preview Release 0.2.1
bump version for release (#1739)
Initial release
releasing `0.1.0`