
Releases: iree-org/iree-turbine

Release v3.6.0

21 Jul 22:13
356fa39

IREE-Turbine Release Notes

The iree-turbine project has been migrated to the new Wave repository (iree-org/wave).

IREE-Wave

New Operators & Kernels

  • Atomic min op lowering, with support for integer data types (#724)
  • BitcastOp to reinterpret or pack data of the same total bitwidth (#933)
  • Cube root (cbrt) unary op from the math dialect (#964)
  • ScaledMMA op (#815)
  • Linear-layer BMM support for MXFP4 (#996)
  • More operators (#976)

Documentation

  • Trace setup instructions (#1040)

Kernel Improvements

  • Paged decode attention kernel (#934, #944, #982, #985, #1010, #1022, #1027, #1062)
  • Speculative decode kernel (#1034)
  • Symbol Separation for Decode Logic (#879): Split K2 into SPLIT_ITER (used for masking/reduction) and KV_LENS (actual tensor size); improved handling of KV_START_IDX for proper masking.
  • Tile Masking for Sub-Vector Sizes (#906): Added support for tile_size < vector_size in WG/Wave tiles by masking out-of-bounds elements during shared memory access.
  • Dynamic Dimension Support in Attention Kernel (#995): Made B, H, M, and K2 dynamic while keeping N static in bhsd_attention kernel for better flexibility.
  • In-thread transpose (iree-org/wave#10)

Compiler Optimizations

  • Scaled Indexing Support (#937): Compiler now handles "scaled" dimensions to support packed sub-byte matmuls (e.g., F4/I8 or F16/I32 formats).
  • Buffer Access Correctness Fixes (#916): Replaced generic buffer ops with stride-aware vector loads/stores using fat_raw_buffer_cast.
  • Indexing & Multibuffering Fixes (#936): Corrected unrolled op indexing and improved input/output mapping consistency in multi-buffered pipelines.
  • Mask/Bounds Refactoring (#921): Moved bounds/masking logic to a dedicated pass for better maintainability (no functional change).
  • Remove Redundant Subgraphs Post-Pipelining (#939): Cleaned up unused subgraphs left behind by reduction pipelining for cleaner traces.
  • Generic Torch Backend for iree_turbine (#953): Added torch.compile backend integration for launching Launchable kernels via jit_compile.
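
As a rough illustration of how a torch.compile backend like this is driven from user code (a sketch only: the backend registration string "iree_turbine" below is an assumption for illustration, not confirmed by the PR):

    import torch

    class MatMul(torch.nn.Module):
        def forward(self, a, b):
            return a @ b

    # "iree_turbine" is an assumed backend registration name used here
    # for illustration; per the note above, the PR routes compilation
    # of Launchable kernels through jit_compile.
    compiled = torch.compile(MatMul(), backend="iree_turbine")
    out = compiled(torch.randn(4, 8), torch.randn(8, 4))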

General Improvements

  • Added ability to export and import schedules (#847)
  • Make cmake execute_process errors fatal (#958)
  • Option to tweak schedules (#896)
  • Search over schedules for a given kernel (#1019)
  • Stack Tracing in Wave Kernels (#923): Enabled full stack trace capture during Wave op construction for better debugging, especially within PyTorch integration.
  • ScanOp Enhancements (#947): Enabled local scan before global scan in all ScanOp cases; updated E2E tests.
  • Proof-of-life Link to C++ Opt Tool (#961): Connected Wave kernels to external C++ tools for out-of-bounds access checks using dataflow analysis.
  • Workgroup Reordering Constraint (#1000): Introduced ReorderingConstraint for customizable workgroup layouts; includes GEMM kernel template and tests.
  • Bug fixes, cleanups and test coverage (#1009, #1028, #1017, #1045, #1002, #1052, #1057, #978, #1023, #962, #821)

Integration

  • Wave kernel integration with sharktank (#787)

IREE-Turbine Core

  • Introduced Graph-Based Fusion API (#967)
  • Reduce CPU overhead from boo convolution launches (#900, #946)
  • Layer normalization kernels support in boo (#979)
  • Allow inlining IR for layout customizable convolutions (#1069)
  • boo fixes and updates (#910, #950, #971, #992)
  • Integrate boo fusion as a torch.compile backend (#1046)
  • Enable make_single_dispatch specification in FusionSchema (#1079)
  • Filtering based on functions provided to the fusion schema (#1080)
  • Optional tensor arguments for CustomOp (#1044)
  • Replace aten.convolution with boo convolution (#1076)
  • Provide default handling of channels-last formats for graph ops (#1054)
  • Test coverage (#1064, #1063)

Change History


What's Changed

  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #907
  • [BOO] Reduce CPU overhead by @zjgarvey in #900
  • [Wave] Add atomic min op lowering by @nithinsubbiah in #724
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #911
  • [Wave] Add aplp to gitignore by @tgymnich in #913
  • Revert "[Wave] Add aplp to gitignore" by @Hardcode84 in #914
  • [BOO] Add some boo_driver improvements by @zjgarvey in #910
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #918
  • Bump version to 3.6.0 after 3.5.0 release. by @ScottTodd in #919
  • [Wave] Add support for WG/Wave tiles smaller than vector sizes by @Hardcode84 in #906
  • [TKW] Update decode by @Hardcode84 in #879
  • [boo] Make boo_driver friendlier for shared machines by @rkayaith in #920
  • Add stack tracing capability to Wave by @ftynse in #923
  • [BOO] Remove xfail mark on passing test by @zjgarvey in #929
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #926
  • [Wave] Mask/Bounds generation refactoring by @Hardcode84 in #921
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #935
  • [Wave] Implement BitcastOp by @raikonenfnu in #933
  • [boo] Even more tuning configurations by @rkayaith in #928
  • Initial plumbing for Wave kernel integration with sharktank by @aviator19941 in #787
  • [Wave] Add ability to export and import schedules by @harsh-nod in #847
  • [Wave] Do not print IR on failure by default by @Hardcode84 in #924
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #938
  • [Wave] Add more sglang shapes and remove divisibility check by @Hardcode84 in #925
  • [Wave] Paged decode output_dtype support by @Hardcode84 in #934
  • [Wave] Add missing Paged decode logit cap by @Hardcode84 in #944
  • [Wave] Teach compiler to handle "scaled" dimensions/indexing by @raikonenfnu in #937
  • [Wave] Add aplp to gitignore by @tgymnich in https://github.com/iree...

Release v3.5.0

11 Jun 15:33
b99b395

Release notes

IREE-Turbine Core

  • Enabled stream sharing between IREE and iree-turbine #883: Leveraged IREE's external HIP stream functionality to reduce launch latency when interoperating between PyTorch and IREE, improving performance for mixed-framework workflows.
  • Introduced initial async execution support in the runtime #719: Added support for asynchronous task execution in the IREE runtime, enabling better parallelism and future integration with asynchronous frontends and runtimes.

IREE-Wave

New Operators & Kernels

  • Added soft-sign kernel to replace tanh approximation kernel #829
  • Added GEMM example without transpose B #881

Documentation

  • Added documentation on several optimization passes #833, #835
  • Added Jupyter notebook with GEMM example #784
  • Added a tutorial on implementing fused softmax using Wave #891

Kernel Improvements

  • Various bug fixes and performance improvements for decode attention and speculative decoding kernels #838, #839, #854, #858, #878, #890, #888

Compiler Optimizations

  • Improved GEMM performance with ping-pong optimizations #846, #832
  • Better shared memory allocation management #822, #831

General Improvements

  • Option to print compiler pass timings #813
  • Ability to print fx.graphs in MLIR style #775


Full changelog


  • First pass of async execution in the runtime. by @AWoloszyn in #719
  • [Wave] Add scaling to reference kernel in cache test by @raikonenfnu in #781
  • [Wave] Implement block wide reduction by @raikonenfnu in #765
  • [WAVE] Support wave-wise cumsum scan op::more than one elem per thread by @xintin in #753
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #782
  • Enable benchmarking in performance ci by @harsh-nod in #164
  • [TKW] Paged decode MHA using GenericDot by @Hardcode84 in #779
  • [TKW] Update paged decode API by @Hardcode84 in #786
  • [TKW] Silence expansion warning by @Hardcode84 in #788
  • [Wave] Fix broken lit tests by @harsh-nod in #789
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #785
  • [BOO] enable tuning specs for boo_convs by @zjgarvey in #783
  • Bump version to 3.5.0 after 3.4.0 release. by @ScottTodd in #793
  • [BOO] Add a timeout for out-of-process compilation by @zjgarvey in #796
  • [Wave] add excalidraw visualization of wave compilation pipeline by @martin-luecke in #777
  • [Wave] support for partial unrolling of iterate by @martin-luecke in #776
  • [BOO] Include tuning specs in pip package by @zjgarvey in #800
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #799
  • Bump actions/create-github-app-token from 2.0.2 to 2.0.6 in the github-actions group by @dependabot in #795
  • Add support for additional type interop by @rsuderman in #803
  • [TKW] Avoid cache folder race conditions when running tests in parallel by @Hardcode84 in #808
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #806
  • [TKW] Paged decode: move scaling into the kernel by @Hardcode84 in #807
  • [runtime] Enable async for CustomOp eager execution by @zjgarvey in #802
  • [boo_driver] Align -t option name with miopen driver by @rkayaith in #810
  • [Wave] Add jupyter notebook showing how to do a GEMM by @harsh-nod in #784
  • [TKW] Paged Decode dynamic symbols by @Hardcode84 in #809
  • [TKW] Tweak Paged Decode kernel waves/tile sizes by @Hardcode84 in #794
  • [Wave] Add mlir style printing by @harsh-nod in #775
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #812
  • [BOO] Fix a bug in boo_driver.py resulting from arg rename by @zjgarvey in #818
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #814
  • [boo] Add more tuning configurations by @rkayaith in #819
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #823
  • [BOO] remove generate.py script by @zjgarvey in #824
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #825
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #826
  • [Wave] Minimize shared allocs by @harsh-nod in #822
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #830
  • [BOO] Enable bad kernel launch fix + kwarg filtered replacement by @zjgarvey in #828
  • [Wave] Add additional check when minimizing allocs by @harsh-nod in #831
  • [BOO] Add autocast functionality to boo_conv by @zjgarvey in #827
  • [Wave] Add scheduling/synchronization related ops by @raikonenfnu in #832
  • [Wave] Add documentation by @harsh-nod in #833
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #834
  • [Wave] Add install requirement for mermaid diagrams by @harsh-nod in #835
  • [TKW] BF16 support in paged decode kernel by @Hardcode84 in #836
  • [TKW] Option to print pass timings by @Hardcode84 in #813
  • [runtime] Rename dltensor capsule to avoid memory leak by @zjgarvey in #843
  • [Wave] Generalize Barriers to work across iterations by @raikonenfnu in #839
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #842
  • [TKW] Improve wave runtime by @Hardcode84 in #844
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #848
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #849
  • [Wave] Two PP cluster GEMM by @raikonenfnu in #846
  • [Wave] Add ability to compare against flash_attn_func by @harsh-nod in #798
  • [BOO] Enable torch.compile for models with boo convs...

Release v3.4.0

05 May 16:51
83b2de2

Release notes

New Kernels

  • Speculative sampling kernel
  • MHA kernel
  • Added wave.nn.Linear and wave.nn.QuantLinear as drop-in replacements for nn.Linear (see the usage sketch after this list)
  • Added wave_sdpa and wave_sdpa_fp8 as drop-in replacements for torch.sdpa
  • Support for more attention variants
  • Extend-attention API updated to match the flashinfer API
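
Since these layers are billed as drop-in replacements, usage should look roughly like the sketch below; the import path wave.nn and the torch.nn.Linear-style constructor are assumptions based on the wording above, not a confirmed API:

    import torch
    import wave.nn  # assumed import path, taken verbatim from the note above

    # As a drop-in replacement, the constructor is assumed to mirror
    # torch.nn.Linear(in_features, out_features).
    layer = wave.nn.Linear(1024, 4096)
    y = layer(torch.randn(2, 1024))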

New language features

  • Block reduce, cumulative sum, and roundeven ops
  • tkw.iterate now supports iterating until a condition is met
  • Generic Dot support
  • Atomic Min support
  • Support for scalar kernel arguments
  • Elements per thread now optional

Compiler optimizations

  • 2-stage prefetching for GEMMs
  • Tanh approximation implemented
  • Added support for fast math flags
  • Multi-buffering support for GEMMs

Other notable changes

  • Introduced the wave_compile API for compiling kernels (see the sketch after this list)
  • tkw.reduction renamed to tkw.iterate
  • Bug fixes in kernel caching
  • Added validation and error messages
  • Added location tracking support
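
A minimal sketch of the new entry point, assuming wave_compile pairs a compile-options object with a Wave kernel (see the documentation update in #646); the import path, the subs option name, and the my_wave_kernel placeholder below are all hypothetical:

    # Assumed import path; see the wave_compile docs update in #646.
    from iree.turbine.kernel.wave.compile import wave_compile, WaveCompileOptions

    # my_wave_kernel is a hypothetical placeholder for a kernel authored in
    # the Wave DSL; the subs option name is likewise an assumption, standing
    # in for whatever mechanism binds concrete values to kernel symbols.
    options = WaveCompileOptions(subs={"M": 1024, "N": 1024})
    compiled = wave_compile(options, my_wave_kernel)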


Full changelog


  • Added roundeven op by @xintin in #600
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #605
  • [runtime] Tier out internal Launchable caching and get hal device from uuid by @zjgarvey in #610
  • [Wave] Compute grid before kernel launch by @harsh-nod in #606
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #615
  • Bump version to 3.4.0 after 3.3.0 release. by @ScottTodd in #616
  • [BOO] Adds a function for getting a Launchable from a ConvSignature by @zjgarvey in #608
  • [TKW] Keep disambiguating number in expanded node name by @GMNGeoffrey in #437
  • [Wave] Introduce wave compile and partition utils by @harsh-nod in #612
  • [TKW] Flash Attention 2 Backward Pass by @GMNGeoffrey in #553
  • [Wave] Add 2-stage prefetch scheduling strategy for GEMM by @raikonenfnu in #614
  • Bump the github-actions group with 2 updates by @dependabot in #617
  • [Wave] temporarily turn off test for non default paths. by @raikonenfnu in #624
  • [Wave] Implement Linear Layer for Wave by @raikonenfnu in #619
  • Add a basic jinja custom op for conv_2d_nhwc_fhwc by @zjgarvey in #622
  • [Wave] Update extend attention kernel signature by @nithinsubbiah in #621
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #618
  • [BOO] enable using conv_2d_nhwc_fhwc in conv exports by @zjgarvey in #627
  • [Wave] Build wave runtime using torch pip package by @harsh-nod in #625
  • [Wave] Restore benchmarking code by @harsh-nod in #628
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #638
  • [BOO] Add driver script for convolutions by @rkayaith in #636
  • [BOO] Add a cache prepopulator by @zjgarvey in #642
  • [Wave] Implement FP8 Quantized attention by @raikonenfnu in #637
  • [Wave][Cache] Add caching support for nested functions by @raikonenfnu in #649
  • [Wave] Add missing header by @harsh-nod in #645
  • Update docs with new wave_compile API. by @harsh-nod in #646
  • [TKW] Make elements-per-thread optional by @Hardcode84 in #650
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #660
  • [TKW] Infer elements_per_thread for elementwise ops by @Hardcode84 in #658
  • Inject cache directory into the cache manager by @GMNGeoffrey in #590
  • [TKW] API and annotation fix by @martin-luecke in #664
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #663
  • [Wave] Implement fused bias-add on Wave kernel Linear Layer by @raikonenfnu in #669
  • [TKW] Disable ceildiv emulation by @Hardcode84 in #668
  • [TKW] Propagate index from reduce nodes by @Hardcode84 in #644
  • [Wave] Use reciprocal to compute softcap logits by @harsh-nod in #674
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #676
  • Custom mask bshd attention variant by @xintin in #665
  • [Wave] Add support for sliding window attention by @harsh-nod in #626
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #682
  • [TKW] Add missing validation and error messages by @GMNGeoffrey in #432
  • [TKW] Use affine apply by @Hardcode84 in #666
  • Feat: better scaling for fp8 quant by @Giuseppe5 in #679
  • [TKW] Skip slow tests by @Hardcode84 in #671
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #687
  • [Wave] Add fast math option by @harsh-nod in #686
  • [Wave] Add GQA/MQA Vanilla template by @harsh-nod in #688
  • [boo_driver] Add --csv and --splat-input-value options by @rkayaith in #684
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #689
  • [Wave] Add quantized linear layer kernel by @nithinsubbiah in #681
  • [Wave] Initial support for Multibuffering by @martin-luecke in #633
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #694
  • Bump the github-actions group with 2 updates by @dependabot in #662
  • [BOO] add an autograd function for conv launchables by @zjgarvey in #670
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #698
  • [Wave] Fix issues in bshd gqa kernel by @harsh-nod in #692
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #700
  • [BOO] Adds generic convolution custom ops for BOO convs by @zjgarvey in #701
  • [BOO] adds a non-jit-compile Launchable and a numerics tool for convs by @zjgarvey in #696
  • [Wave] Add custom causal mask for Extend Attention by @nithinsubbiah in #639
  • [BOO] temporarily xfail bias grad-check test by @zjgarvey in #705
  • [BOO] Small fixes and a README update by @zjgarvey in #707
  • [Wave] Improve WaveCacher to handle changing freevar in nested fn. by @raikonenfnu in #706
  • Capture location information from Python by @ftynse in #675
  • [Wave] Implement efficient and user friendly FP8 SDPA Op by @raikonenfnu in #708
  • [Wave] Refactor layers to nn directory to mirror torch. by @raikonenfnu in #710
  • [Wave] Add gqa decode kernel by @harsh-nod in #703
  • [Wave] Turn on wave runtime for extend attention test by @nithinsubbiah in #712
  • [Wave] Add is_causal test and plumb quant_dtype into API. by @raikonenfnu in #711
  • [WAVE] Tanh approximation flashinfer implementation by @yichiche in #699
  • Added support for scalar codegen by @xintin in https:/...

Release v3.3.0

24 Mar 18:12
eefa599

Release Notes

Kernels

  • Support for ALiBi bias and RPE in attention
  • Support for softcap/logit_cap in attention
  • Performance improvements to extend attention
  • Rewrite of and performance improvements to decode attention

Compiler & Runtime

  • Support for buffer ops
  • Shared memory optimizations
  • Unaligned attention support
  • Gather optimizations
  • Improvements to DLPack handling
  • Faster kernel launches using the Wave runtime


Release v3.2.0

10 Feb 19:56
7038127

Release highlights

iree-turbine core

  • #434: iree-turbine has a new website: https://iree-turbine.readthedocs.io/. See the docs/ folder for contributing instructions.

  • #373: The deprecated shark_turbine namespace has been fully deleted; users should migrate to iree.turbine.

  • #418: There are new utility APIs for preparing tensors as input arguments for IREE tools:

    # iree.turbine.support.conversions
    torch_dtyped_shape_to_iree_format(...)
    
    # iree.turbine.support.tools
    iree_tool_format_cli_input_arg(...)
    iree_tool_prepare_input_args(...)
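
A hedged sketch of how these helpers are presumably meant to be used when driving IREE CLI tools; only the function names above come from the release note, while the signature and return type below are assumptions:

    import torch
    from iree.turbine.support.tools import iree_tool_prepare_input_args

    # Assumed behavior: convert tensors into the "SHAPExDTYPE=..." strings
    # that IREE CLI tools accept through their --input= flags.
    args = iree_tool_prepare_input_args([torch.randn(2, 3)])
    for arg in args:
        print(f"--input={arg}")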

TKW

Improved support and performance for attention kernel variants:

  • #387: Added a new paged decode attention kernel.
  • #412: Added a new implementation of prefill attention.
  • #452: Add self_index, predicate, and selectOp to implement causal attention.
  • #424: Reordering shared-memory loads and writes to minimize LDS barriers improved performance for some attention shapes by up to 10%.

Other optimizations:

  • #394: A memory layout attribute for the MemoryType now allows users to specify a physical shape that differs from the logical shape. This is useful in scenarios like kv-caches, where certain dimensions are physically quite large but map to fixed logical dimensions.
  • #436: Use buffer ops for masked loads/stores.

Development quality of life features:

  • #406: Tests parameterized by shapes now have better names.
  • #423: Wave pass pipelines now feature printing options.

Changelog

Full list of changes: v3.1.0...v3.2.0

Release v3.1.0

08 Jan 21:56
3dbb4e5

Release highlights

iree-turbine core

The core of iree-turbine remains fairly stable, with a few quality of life and infrastructure changes this release cycle.

  • PyTorch is no longer a direct requirement that gets automatically installed alongside the iree-turbine package as of #343. Instead, users should install the torch package variant of their choice before importing from iree.turbine. For example:

    pip install torch --index-url https://download.pytorch.org/whl/test/cpu
    pip install iree-turbine
  • Pre-release versions of the iree-turbine package are now published nightly. Install them with:

    pip install \
      --find-links https://iree.dev/pip-release-links.html \
      --upgrade --pre \
      iree-turbine
    
  • Project tests are now running continuously across Python versions 3.10-3.12: #326.

  • A new turbine_generate rule is available for the iree.build API. This can be used to programmatically export and compile pipelines of programs using iree-turbine: #249.

TKW

Activity in the TKW (turbine/kernel/wave/) subproject has been accelerating.

Full changelog

v3.0.0...v3.1.0

dev-wheels

05 Dec 21:49
727137a

Automatic snapshot release of iree-turbine python wheels.

Release v3.0.0

18 Nov 19:19
583004f

Release v2.9.0

09 Nov 00:40
4a3f8c9

Release highlights

  • Bumped IREE to the latest stable release, adopting the new iree-base-compiler and iree-base-runtime package names and the matching version 2.9.0.

Full Changelog: v2.5.0...v2.9.0

Release v2.5.0

06 Nov 21:29
9b3fa06

Release highlights

  • Bumped IREE to the latest stable release: 20241104.1068. We expect the next release to use a new version scheme and new package names. See iree-org/iree#18938 for full details.
  • PyTorch versions 2.5+ are now supported thanks to 97e0517.
  • The migration from the shark-turbine package namespace to iree.turbine is structurally complete: 40016ad. Downstream projects should switch to the new namespace and the iree-turbine package name.

Full Changelog: v2.3.0...v2.5.0