Release notes

IREE-Turbine Core

Enabled stream sharing between IREE and iree-turbine #883: Uses IREE's external HIP stream functionality to reduce launch latency when PyTorch and IREE interoperate in the same workflow.
Introduced initial async execution support in the runtime #719: A first pass at asynchronous task execution in the IREE runtime, intended to improve parallelism and to prepare for integration with asynchronous frontends and runtimes.

IREE-Wave

New Operators & Kernels

Added softsign kernel to replace the tanh approximation kernel #829
Added GEMM example without a transposed B operand (C = A @ B) #881
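
For readers unfamiliar with the function behind the softsign entry above: softsign is conventionally defined as x / (1 + |x|). Like tanh it is odd, monotonic, and bounded in (-1, 1), but it needs no exponential. The sketch below is plain Python for illustration only, not Wave code. The helper names are hypothetical, and the kernel added in #829 may apply additional scaling that is not shown here.

```python
import math


def softsign(x: float) -> float:
    """Softsign activation: x / (1 + |x|), bounded in (-1, 1)."""
    return x / (1.0 + abs(x))


def compare(xs):
    """Return (x, softsign(x), tanh(x)) rows for side-by-side inspection."""
    return [(x, softsign(x), math.tanh(x)) for x in xs]


if __name__ == "__main__":
    # Both functions saturate toward +/-1, but softsign approaches the
    # bound more slowly than tanh does.
    for x, s, t in compare([-4.0, -1.0, 0.0, 1.0, 4.0]):
        print(f"x={x:+.1f}  softsign={s:+.4f}  tanh={t:+.4f}")
```

The two curves are similar in shape but not numerically interchangeable, so accuracy should be re-checked wherever one replaces the other.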

Documentation

Added documentation on several optimization passes #833, #835
Added Jupyter notebook with GEMM example #784
Added tutorial on implementing fused softmax with Wave #891
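
As background for the fused softmax tutorial listed above, here is a minimal reference sketch of the computation such a kernel performs per row: a max reduction, an elementwise exponential, a sum reduction, and a normalization. "Fused" generally means these steps run in one kernel rather than as separate launches. The code is plain Python, not the Wave API, and the function names are hypothetical. The tutorial in #891 is the authoritative source for the Wave implementation.

```python
import math


def softmax_row(row):
    """Numerically stable softmax over a single row."""
    row_max = max(row)                            # reduction 1: row maximum
    exps = [math.exp(v - row_max) for v in row]   # shift by max to avoid overflow
    total = sum(exps)                             # reduction 2: row sum
    return [e / total for e in exps]              # normalize


def fused_softmax(matrix):
    """Apply softmax independently to each row of a list-of-lists matrix."""
    return [softmax_row(row) for row in matrix]


if __name__ == "__main__":
    result = fused_softmax([[1.0, 2.0, 3.0], [1000.0, 1000.0]])
    for row in result:
        print([round(v, 4) for v in row])
```

Subtracting the row maximum before exponentiating keeps large inputs such as 1000.0 from overflowing. A reference like this is useful for checking a kernel's output.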

Kernel Improvements

Various bug fixes and performance improvements for decode attention and speculative decoding kernels #838, #839, #854, #858, #878, #888, #890

Compiler Optimizations

Improved GEMM performance with ping-pong optimizations #832, #846
Better shared memory allocation management #822, #831

General Improvements

Added option to print compiler pass timings #813
Added ability to print fx graphs in MLIR style #775

New Contributors

@AWoloszyn made their first contribution in #719
@fabianmcg made their first contribution in #838
@SourishW made their first contribution in #866
@badgerbroch made their first contribution in #867
@bodhisaha made their first contribution in #869
@efric made their first contribution in #881
@Groverkss made their first contribution in #887

Full changelog

List of changes

First pass of async execution in the runtime. by @AWoloszyn in #719
[Wave] Add scaling to reference kernel in cache test by @raikonenfnu in #781
[Wave] Implement block wide reduction by @raikonenfnu in #765
[WAVE] Support wave-wise cumsum scan op::more than one elem per thread by @xintin in #753
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #782
Enable benchmarking in performance ci by @harsh-nod in #164
[TKW] Paged decode MHA using GenericDot by @Hardcode84 in #779
[TKW] Update paged decode API by @Hardcode84 in #786
[TKW] Silence expansion warning by @Hardcode84 in #788
[Wave] Fix broken lit tests by @harsh-nod in #789
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #785
[BOO] enable tuning specs for boo_convs by @zjgarvey in #783
Bump version to 3.5.0 after 3.4.0 release. by @ScottTodd in #793
[BOO] Add a timeout for out-of-process compilation by @zjgarvey in #796
[Wave] add excalidraw visualization of wave compilation pipeline by @martin-luecke in #777
[Wave] support for partial unrolling of iterate by @martin-luecke in #776
[BOO] Include tuning specs in pip package by @zjgarvey in #800
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #799
Bump actions/create-github-app-token from 2.0.2 to 2.0.6 in the github-actions group by @dependabot in #795
Add support for additional type interop by @rsuderman in #803
[TKW] Avoid cache folder race conditions when running tests in parallel by @Hardcode84 in #808
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #806
[TKW] Paged decode: move scaling into the kernel by @Hardcode84 in #807
[runtime] Enable async for CustomOp eager execution by @zjgarvey in #802
[boo_driver] Align -t option name with miopen driver by @rkayaith in #810
[Wave] Add jupyter notebook showing how to do a GEMM by @harsh-nod in #784
[TKW] Paged Decode dynamic symbols by @Hardcode84 in #809
[TKW] Tweak Paged Decode kernel waves/tile sizes by @Hardcode84 in #794
[Wave] Add mlir style printing by @harsh-nod in #775
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #812
[BOO] Fix a bug in boo_driver.py resulting from arg rename by @zjgarvey in #818
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #814
[boo] Add more tuning configurations by @rkayaith in #819
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #823
[BOO] remove generate.py script by @zjgarvey in #824
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #825
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #826
[Wave] Minimize shared allocs by @harsh-nod in #822
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #830
[BOO] Enable bad kernel launch fix + kwarg filtered replacement by @zjgarvey in #828
[Wave] Add additional check when minimizing allocs by @harsh-nod in #831
[BOO] Add autocast functionality to boo_conv by @zjgarvey in #827
[Wave] Add scheduling/synchronization related ops by @raikonenfnu in #832
[Wave] Add documentation by @harsh-nod in #833
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #834
[Wave] Add install requirement for mermaid diagrams by @harsh-nod in #835
[TKW] BF16 support in paged decode kernel by @Hardcode84 in #836
[TKW] Option to print pass timings by @Hardcode84 in #813
[runtime] Rename dltensor capsule to avoid memory leak by @zjgarvey in #843
[Wave] Generalize Barriers to work across iterations by @raikonenfnu in #839
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #842
[TKW] Improve wave runtime by @Hardcode84 in #844
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #848
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #849
[Wave] Two PP cluster GEMM by @raikonenfnu in #846
[Wave] Add ability to compare against flash_attn_func by @harsh-nod in #798
[BOO] Enable torch.compile for models with boo convs by @zjgarvey in #850
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #852
[wave] Fix bug in Permute.transform_index by @fabianmcg in #838
Pin IREE version to 3.5.0rc20250516 by @Hardcode84 in #853
[TKW] Large shape and wave_runtime for paged decode test by @Hardcode84 in #855
[Wave] Make APLP faster by @harsh-nod in #840
[Wave] Fix failing jupyter test by @harsh-nod in #854
[Wave] Fix IREE reference failing with wrong signature by @raikonenfnu in #858
Use turbine.runtime for generate_iree_ref by @Hardcode84 in #861
[boo] More tuning configurations by @rkayaith in #864
[BOO] Add an env variable for toggling backward boo convolutions by @zjgarvey in #865
[Wave] Introduce softsign kernel to replace tanh_approx by @yichiche in #829
[Wave] Align variable names in speculative decode kernel by @xintin in #872
[Wave] Added sin op by @SourishW in #866
[Wave] Added atan2 operation by @badgerbroch in #867
[Wave] added cos op by @bodhisaha in #869
[TKW] Drop inplace flag by @Hardcode84 in #863
Move scaled_dot_product_attention_bhsd under iree.turbine by @aviator19941 in #870
[Wave] Implement C = A @ B GEMM example by @efric in #881
[BOO] add a top-level readme showcasing basic usage by @zjgarvey in #882
Update IREE deps by @Hardcode84 in #880
[TKW] Add patterns to sink vector.extract/splat ops by @Hardcode84 in #851
Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #886
Revert "[runtime] Rename dltensor capsule to avoid memory leak" by @Groverkss in #887
[WAVE] Correcting predicts dim in the first speculative decode kernel by @xintin in #878
[Wave] Speculative decode - default predict token to last token in vocab by @nithinsubbiah in #890
Allow stream sharing between iree and iree-turbine. by @AWoloszyn in #883
Update IREE version, remove dlpack hack and add tensor leak test by @Hardcode84 in #893
[Wave] Hide APLP warning by @Hardcode84 in #898
[BOO] Add Conv1d and Conv3d replacement capability by @zjgarvey in #885
[BOO] don't permute None in pytorch convolution backward by @zjgarvey in #884
[WAVE] Added Fused Softmax and Vector Addition Kernels/Tests Closes #874 by @bodhisaha in #891
[WAVE] Update ci tk yaml to have updated rocm installation by @xintin in #902
[WAVE] Update speculative decode first kernel's dims from 3D to 1D by @xintin in #888
Add a python3.11 mi300 unit tests and type checking job by @zjgarvey in #903
Reduce CPU overhead in lookup_device_from_torch by @zjgarvey in #905

Commit history: v3.4.0...v3.5.0

Release v3.5.0