Releases · Lightning-AI/lightning-thunder
Thunder 0.2.6 PyTorch Conference Edition
What's Changed
- bump release to dev version post 0.2.5 by @t-vi in #2513
- torch.cumsum api change by @jjsjann123 in #2507
- DTensor: support linear by @kshitij12345 in #2422
- TEv2 as default TE executor by @riccardofelluga in #2510
- Create `Symbol` with `is_prim=True` in `_register_custom_op` by @crcrpar in #2516
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2515
- Use underlying class's new in MutableMappingWrapper.new by @t-vi in #2514
- fix: working README.md example, support nvfuser for torch==2.8 by @lianakoleva in #2525
- Add KaelanDt as codeowner by @t-vi in #2540
- add ci skips by @t-vi in #2547
- TE: Fix cudnn.h not found by @kshitij12345 in #2536
- feat: _register_custom_op supports List[torch.Tensor] by @lianakoleva in #2529
- feat: provide guidance when registering custom op by @lianakoleva in #2530
- fix: string -> f-string where intended by @lianakoleva in #2528
- Initial TE NVFP4 recipe support by @riccardofelluga in #2523
- Add DTensor prim and torch symbol for exp by @kshitij12345 in #2496
- [DTensor] Add prim and torch sym for neg and reciprocal by @kshitij12345 in #2552
- Bump pytest from 8.3.5 to 8.4.2 by @dependabot[bot] in #2567
- Bump bitsandbytes from 0.47.0 to 0.48.0 by @dependabot[bot] in #2565
- Bump diffusers from 0.34.0 to 0.35.1 by @dependabot[bot] in #2564
- switch CI to non-interruptible by @t-vi in #2554
- tests: clean up xfails in VJP tests by @aobolensk in #2578
- fix output dtype for nvfuserex cumsum by @jjsjann123 in #2580
- Use signature (*args, **kwargs) when signature is unavailable by @shino16 in #2542
- fix: call torch.cuda.is_available() in available_devices by @aobolensk in #2602
- Add uint64 to thunder->torch dtype map by @crcrpar in #2519
- Enable direct bindings in Thunder by @rdspring1 in #2502
- move to nvfuser-cu128-torch28 by @t-vi in #2604
- [DTensor] Add torch symbol and prim for _grouped_mm by @kshitij12345 in #2503
- [DTensor] Add prim and torch symbol for `add` by @kshitij12345 in #2581
- MoE TensorParallel with Eager by @kshitij12345 in #2582
- fix: missing 'import torch' in README.md by @aobolensk in #2608
- Remove E741 from ruff lint ignore rules by @tpremrud in #2601
- Disallow `custom_op` that mutates arguments by @crcrpar in #2603
- Disable TF32 on Ampere+ devices to stabilize numeric accuracy by @mattteochen in #2579
- Propagate rounding_mode in div_ by @beverlylytle in #2614
- Refactor quantization.py to use TSP by @tejapulagam in #2522
- [DTensor] Add test with parallelize_module by @kshitij12345 in #2598
- Inference benchmark of "meta-llama/Llama-4-Maverick-17B-128E" by @crcrpar in #2487
- Remove outdated scenarios from inference benchmark by @crcrpar in #2619
- Add `float4_e2m1fn_x2` to `lcdtype_to_nvdtype_map` by @crcrpar in #2532
- avoid `torch.float4_e2m1fn_x2` in `_get_min_and_val` by @crcrpar in #2533
- [benchmark_inference] Fix replacing the MoE by @kshitij12345 in #2620
- Fix FSDP NB by @kshitij12345 in #2629
- Fixes "function name is not defined" when using `DebugTransform` by @kiya00 in #2617
- Enable MoE TP with thunderfx by @kshitij12345 in #2611
- Add DIV_EXACT prim by @beverlylytle in #2626
- try getting version from `nvfuser_direct` first by @crcrpar in #2623
- Fix hf example and benchmark run on CPU by @aobolensk in #2583
- getattr should always be taken from the class and then bound by @t-vi in #2584
- Update benchmark_inference.py to support TP with thunderfx by @kshitij12345 in #2625
- Add TE's NVFP4 recipe to the test suite by @riccardofelluga in #2612
- Jj/cumsum nvfuserex opinfo tolerance by @jjsjann123 in #2586
- tests: Extend testing for dunder and binary elementwise operations by @aobolensk in #2597
- Propagate `disable_torch_autograd` to thunderfx's `_splitter` by @crcrpar in #2534
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2521
- [benchmark_inference] Enable `tqdm` only on rank 0 by @crcrpar in #2630
- fix tests for CI by @t-vi in #2627
- Remove cuda checks from TE and Triton xentropy executors by @riccardofelluga in #2613
- Fixes backward issue where silu outputs nan by @kiya00 in #2624
- [benchmark_inference] Update `from_linear` and `from_grouped_linear` to accept `fqn: str` by @crcrpar in #2631
- Have seed and offset in int64 for cudnn-frontend SDPA by @crcrpar in #2520
- Refactor low precision option handling in `benchmark_litgpt.py` by @riccardofelluga in #2615
- Bump the gha-updates group with 3 updates by @dependabot[bot] in #2568
- Fix import of TE `Recipe` by @ksivaman in #2635
- fix benchmarks job by @t-vi in #2607
- Add profile transform by @t-vi in #2636
- Enable interpolate tests, add PrimID mapping for ceil and floor by @aobolensk in #2609
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2637
- Update jvp computation in `test_grad.py` by @mattteochen in #2618
- Propagated `requires_grad` to torch tensor by @mattteochen in #2616
- add mask lookaside into transformers recipe by @t-vi in #2639
- Fix thunderjit in inference benchmark by @t-vi in #2644
- Warm up sufficiently by @wujingyue in #2638
- Reset peak memory stats before measurement by @wujingyue in #2647
- drop extra executor by @t-vi in #2648
- Relax Test Tolerances for TE executor tests by @riccardofelluga in #2646
- test_vjp_correctness_sdpa_manual: relax test tolerance (#2576) by @kiya00 in #2628
- [benchmark_inference] Decrease max_new_tokens for warm-up by @kshitij12345 in #2649
- [benchmark_inference] Reshape the output from run_routed_experts by @kshitij12345 in #2650
- Revert "[benchmark_inference] Decrease max_new_tokens for warm-up" by @wujingyue ...
Thunder 0.2.5 - Summer Harvest
What's Changed
- bump version to 0.2.5.dev0 by @t-vi in #2274
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2271
- Add support for torch.argsort by @protonu in #2246
- Add test for Phi-3-vision-128k-instruct by @kshitij12345 in #1850
- DTensor: NVFuser Integration by @kshitij12345 in #2177
- fix lint from merge by @t-vi in #2277
- Remove F842 from ignore rules by @crcrpar in #2270
- E2E Coverage Test for Thunder by @tejapulagam in #2086
- convert pow gradient to new style by @t-vi in #2283
- add Windows xfail/skipif to tests not working on windows by @t-vi in #2284
- fix activation checkpointing in the joint trace by @beverlylytle in #2203
- Remove F811 from ignore rules by @crcrpar in #2268
- fix lint by @t-vi in #2286
- Return saved-for-backward objects as tuples by @beverlylytle in #2279
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2287
- Update coverage requirement from ~=7.8.2 to ~=7.9.1 by @dependabot[bot] in #2292
- Add "transformer_engine_v2" to expected all executors set by @crcrpar in #2226
- Fix CI benchmarks by @KaelanDt in #2303
- Bump pytest-random-order from 1.1.1 to 1.2.0 by @dependabot[bot] in #2289
- prune redundant ifs in ci workflows by @Borda in #2296
- Bump pytest-cov from 6.1.1 to 6.2.1 by @dependabot[bot] in #2291
- Add hardsigmoid op by @beverlylytle in #2304
- Update hypothesis requirement from ~=6.133.0 to ~=6.135.20 by @dependabot[bot] in #2290
- skip complex dtype tensor from `aminmax` by @crcrpar in #2276
- Extend thunder.jit coverage on HF models by @lantiga in #2281
- Add PEFT benchmarking script in `thunder/benchmarks` by @riccardofelluga in #2254
- support tuples as `replaces` arg for operator registration by @KaelanDt in #2308
- Improve reporting from thunder.jit coverage CI job by @lantiga in #2309
- Decrease SKIPPED by adding dependencies by @lantiga in #2312
- Add scalar tensor input to full_sample_generator by @IvanYashchuk in #2318
- Update requirements/test.txt bitsandbytes to cover aarch64 platform_machine by @nWEIdia in #2321
- update TE test by @kshitij12345 in #2319
- [TE] catch different error for xfail test by @kshitij12345 in #2322
- Run TE tests in CI by @kshitij12345 in #2320
- Add ops for HF transformers by @kiya00 in #2217
- Move PEFT model materialization by @riccardofelluga in #2334
- Fix dataflow ordering for recomputed symbols by @riccardofelluga in #2317
- Update LoRA config for mamba models in PEFT by @riccardofelluga in #2333
- Remove external logger dependency by @riccardofelluga in #2327
- Relax inplace sanity check by @beverlylytle in #2314
- Make alias updating the default in-place operator approach by @beverlylytle in #2052
- Propagate backward tags more consistently by @beverlylytle in #2336
- Make inplace flags defaults consistent by @beverlylytle in #2349
- add/debug Lit CI by @Borda in #2339
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2311
- ci: run notebooks with lit CI by @Borda in #2351
- fix docker sanity check by @Borda in #2352
- empty nvfuser.FusionCache after each test by @t-vi in #2354
- bump bitsandbytes from 0.42.0 to 0.46.1 by @lianakoleva in #2238
- remove spurious print by @t-vi in #2357
- limit gpu mem usage by @t-vi in #2358
- lit CI: switch to `L4_X_2` by @Borda in #2360
- [grad test] relax tolerance by @kshitij12345 in #2364
- [dtensor] don't rely on repr of `DTensorSpec` by @kshitij12345 in #2359
- [thunderfx] mark output non-differentiable based on FXGraph output node inspection by @kshitij12345 in #2348
- Specify torch.randint's default dtype by @shino16 in #2342
- Enable `take` and `take_along_axis` in nvfuser executor by @crcrpar in #2031
- Add `torch.scalar_tensor` to `default_torch_ops.py` by @crcrpar in #2310
- Make pre-commit hooks work on all python3s by @wujingyue in #2373
- Remove unnecessary underscores by @wujingyue in #2372
- Remove functionalization path by @beverlylytle in #2368
- Add the support of `uint8`/`Byte` to nvfuser executor by @crcrpar in #2299
- Revert "Add the support of `uint8`/`Byte` to nvfuser executor (#2299)" by @t-vi in #2378
- Update hypothesis requirement from ~=6.135.20 to ~=6.136.6 by @dependabot[bot] in #2384
- Remove early split trace path by @beverlylytle in #2375
- remove debugging leftover by @t-vi in #2390
- Bump graphviz from 0.20.3 to 0.21 by @dependabot[bot] in #2387
- Fixes mincut error in rematerialization when there's overlap between source and sink variables by @kiya00 in #2369
- Lower cumsum to nvfuser by @wujingyue in #2374
- Fix example in README.md by @zasdfgbnm in #2381
- Update TE v2 executor tests by @riccardofelluga in #2376
- Remove the bookend optimization by @wujingyue in #2379
- Bugfix/fix binary subscr class getitem by @tejapulagam in #2366
- Add class_getitem for list, tuple, and dict by @t-vi in #2394
- TransformerEngine executor checkpointing by @riccardofelluga in #2344
- Add _grouped_mm and lower it to nvFuser and torchex by @protonu in #2326
- [nvfuser] register prims.le by @kshitij12345 in #2377
- [dtensor] use nvfuser_direct for nvfuser dtensor execution by @kshitij12345 in #2370
- Support `torch.square` natively by @crcrpar in #2329
- Adds `save_thunderfx_repros` to save scripts for all the subgraphs and optionally save fusion region and traces by @kiya00 in #2232
- Add TEv2 Transform reset by @riccardofelluga in #2401
- deps: pin `cuda-python >=12.0, <13.0.0` by @Borda in #2410
- docker: build images for Torch 2.8 by @Borda in https://github.com/L...
0.2.4
What's Changed
- cleaning `skipif` for past Torch dev versions by @Borda in #2125
- fix missing images when released on PyPI by @Borda in #2130
- Add custom decompositions for cross entropy loss for the nvfuser executor by @protonu in #2043
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2139
- Remove qualified access to methods from autodiff by @riccardofelluga in #2147
- add non-None check for `torch.utils.collect_env.get_pip_packages` outputs by @crcrpar in #2124
- Print repro command when `test_core_vs_torch_consistency` fails with sample index specified by @crcrpar in #2131
- Set `input_quantizer.internal` to True by @beverlylytle in #2146
- Bump transformers from 4.50.3 to 4.52.4 by @dependabot in #2160
- Update ipython[all] requirement from ~=8.36.0 to ~=8.37.0 by @dependabot in #2159
- Update coverage requirement from ~=7.6.8 to ~=7.8.2 by @dependabot in #2158
- TE: update test to be more stable by @kshitij12345 in #2156
- Bump pytest-timeout from 2.3.1 to 2.4.0 by @dependabot in #2161
- Update dependabot - reviewers by @Borda in #2162
- Fix autodiff joint trace dataflow and in-place ops in higher order functions by @riccardofelluga in #2143
- Reduces the test time by @kiya00 in #2077
- add a use_hf option to benchmarking by @t-vi in #2154
- Bump pytest-xdist from 3.6.1 to 3.7.0 by @dependabot in #2164
- Update hypothesis requirement from ~=6.131.9 to ~=6.133.0 by @dependabot in #2165
- Update snowballstemmer requirement from <3 to <4 by @dependabot in #2168
- nvFuser Executor: Ensure cross-entropy loss fwd is not recomputed when computing bwd by @protonu in #2180
- Add parity check of shape/dtype/device of runtime and trace by @crcrpar in #2069
- Add docstrings for recipes by @KaelanDt in #2185
- sdpa_ex: relax test tolerances by @kshitij12345 in #2178
- removing nv_enable_embedding by @jjsjann123 in #2057
- Improve error reporting in benchmark job and add cleanup logic by @Borda in #2176
- Add mode flag to TorchCompileExecutor by @t-vi in #2188
- Use `to_dtype` and `to_torch_dtype`, not `_torch_to_thunder_dtype_map` and `_thunder_to_torch_dtype_map`, by @crcrpar in #2181
- use hf recipe in quickstart by @t-vi in #2191
- Remove kwarg construction from FusionDefinitionWrapper.call by @IvanYashchuk in #1871
- Add tests for HFTransformers recipe with static cache by @KaelanDt in #2179
- bump: PyTorch to be latest `2.7.1` by @Borda in #2193
- `prims.where` ignores shape/device of `pred` if it's a CPU scalar tensor by @crcrpar in #2135
- add decomposition for repeat interleave by @t-vi in #2194
- fix traceback in with / try: finally: for Python 3.10 by @t-vi in #2195
- Use joint trace in transform_for_execution by @beverlylytle in #2102
- Handle proxy objects in the cuDNN SDPA checker by @kiya00 in #2073
- default to hf recipe in thunder.compile for hf models by @KaelanDt in #2199
- add autocast lookaside to hf recipe for tracing on meta device by @t-vi in #2200
- make test_networks.py not rely on HF downloads by @KaelanDt in #2202
- Add plugins documentation by @KaelanDt in #2207
- implement partial, avoid tuple addition, test partialmethod by @t-vi in #2209
- Add `bitwise_left_shift` and `bitwise_right_shift` by @crcrpar in #2210
- [thunderfx] Avoid split at `Tensor.__eq__` by registering it in `thunder.torch` by @crcrpar in #2211
- fixed installing NCCL for CUDA by @Borda in #2208
- Enable `ruff-check` in pre-commit by @crcrpar in #2192
- unxfail passing test by @t-vi in #2220
- Add #2192 to `.git-blame-ignore-revs` by @crcrpar in #2219
- fix cache validity issue, tighten assert by @t-vi in #2223
- Avoid negative number rhs values to bitwise shift tests by @crcrpar in #2227
- Add missing opinfo for `bitwise_right_shift` to `elementwise_binary_ops` by @crcrpar in #2214
- Enable ruff format in pre-commit by @crcrpar in #2142
- Fixes "baddbmm() got an unexpected keyword argument 'batch1'" by @kiya00 in #2228
- Only register cudnn executor if it is available by @KaelanDt in #2174
- Representing DTensor in thunder traces by @kshitij12345 in #1907
- install cudnn in quickstarts by @KaelanDt in #2235
- add experimental by @t-vi in #2236
- DTensor: don't error if torch.distributed is unavailable by @kshitij12345 in #2243
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2245
- bump: OS versions for CI by @Borda in #2249
- Plumbing the topk to the nvFuser executor by @protonu in #2237
- bump: bitsandbytes to its next compatible release by @Borda in #2248
- ci: test with latest dependencies by @Borda in #2122
- Adds `empty_like`, `rand_like` by @kiya00 in #2225
- Support converting SymTypes Node to input proxy by @kiya00 in #2171
- fix test_reports_benchmark timeout by @kiya00 in #2229
- Fix `torch.gather` function signature to accept `input` passed as keyword argument by @kiya00 in #2250
- Fix `bitsandbytes` dependency conditions to use `platform_machine` instead of `sys_platform` by @Borda in #2257
- add float exception to assertion in jit_ext by @KaelanDt in #2256
- register softmax fudge function for stacklevel by @t-vi in #2259
- include message in NotImplementedError in proxy methods by @t-vi in #2260
- add support for full with tensor input by @t-vi in #2262
- Remove W291, W293, E702, and F722 from `ignore` by @crcrpar in #2267
- Add #2142 of ruff format integration to `.git-blame-ignore-revs` by @crcrpar in #2266
- split getitem into basic and "purely" advanced indexing by @t-vi in #2258
- TE: fix related to delayed forward-backward split by @kshitij12345 in #2222
- bump version to 0.2.4 for release by @t-vi in #2273
Full Changelog: 0.2.3...0.2.4
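Among the op-coverage additions in this release are the bitwise shifts (#2210) and a decomposition for repeat interleave (#2194). The sketch below exercises both through `thunder.jit`; it assumes only the public `thunder.jit` API and standard `torch` functions, and is an illustration rather than code from the linked PRs.

```python
import torch
import thunder


def fn(x, shifts):
    # bitwise_left_shift (#2210) followed by repeat_interleave (#2194)
    y = torch.bitwise_left_shift(x, shifts)
    return torch.repeat_interleave(y, 2, dim=0)


jfn = thunder.jit(fn)

x = torch.arange(6, dtype=torch.int64).reshape(2, 3)
shifts = torch.ones_like(x)
print(jfn(x, shifts))  # each element shifted left by 1, rows repeated twice
```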
Thunder 0.2.3
release 0.2.3 (#2126)
Thunder 0.2.2
Released at GTC 2025 with the latest and greatest
Preview Release 0.2.1
bump version for release (#1739)
Initial release
releasing `0.1.0`