Releases: iree-org/iree-turbine
Release v3.6.0
IREE-Turbine Release Notes
The iree-turbine project has been migrated to the new Wave repository, iree-org/wave.
IREE-Wave
New Operators & Kernels
- Atomic min op lowering, supporting integer data types (#724)
- BitcastOp to reinterpret or pack data of the same bitwidth (#933)
- Implemented cube root (cbrt) unary op from the math dialect (#964)
- ScaledMMA Op (#815)
- Support Linear Layer BMM for MXFP4 (#996)
- More operators (#976)
Documentation
- Trace setup instructions (#1040)
Kernel Improvements
- Paged decode attention kernel (#934, #944, #982, #985, #1010, #1022, #1027, #1062)
- Speculative decode kernel (#1034)
- Symbol Separation for Decode Logic (#879): Split K2 into SPLIT_ITER (used for masking/reduction) and KV_LENS (actual tensor size); improved handling of KV_START_IDX for proper masking.
- Tile Masking for Sub-Vector Sizes (#906): Added support for tile_size < vector_size in WG/Wave tiles by masking out-of-bounds elements during shared memory access.
- Dynamic Dimension Support in Attention Kernel (#995): Made B, H, M, and K2 dynamic while keeping N static in bhsd_attention kernel for better flexibility.
- In-thread transpose (iree-org/wave#10)
Compiler Optimizations
- Scaled Indexing Support (#937): Compiler now handles "scaled" dimensions to support packed sub-byte matmuls (e.g., F4/I8 or F16/I32 formats).
- Buffer Access Correctness Fixes (#916): Replaced generic buffer ops with stride-aware vector loads/stores using fat_raw_buffer_cast.
- Indexing & Multibuffering Fixes (#936): Corrected unrolled op indexing and improved input/output mapping consistency in multi-buffered pipelines.
- Mask/Bounds Refactoring (#921): Moved bounds/masking logic to a dedicated pass for better maintainability (no functional change).
- Remove Redundant Subgraphs Post-Pipelining (#939): Cleaned up unused subgraphs left behind by reduction pipelining for cleaner traces.
- Generic Torch Backend for iree_turbine (#953): Added torch.compile backend integration for launching Launchable kernels via jit_compile.
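With #953, Launchable kernels can be driven directly from torch.compile. A minimal usage sketch follows, with the caveat that the backend registration name shown here ("iree_turbine") is an assumption, not confirmed by these notes:

```python
# Hedged sketch: "iree_turbine" is a hypothetical backend name; consult the
# package for the string actually registered with torch dynamo.
import torch

def f(x, y):
    return torch.relu(x @ y)

# torch.compile routes the traced graph through the named backend, which per
# #953 would jit_compile and launch the corresponding Launchable kernel.
compiled = torch.compile(f, backend="iree_turbine")  # assumed backend name
out = compiled(torch.randn(64, 64), torch.randn(64, 64))
```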
General Improvements
- Added ability to export and import schedules (#847)
- Make cmake execute_process errors fatal (#958)
- Option to tweak schedules (#896)
- Search over schedules for a given kernel (#1019)
- Stack Tracing in Wave Kernels (#923): Enabled full stack trace capture during Wave op construction for better debugging, especially within PyTorch integration.
- ScanOp Enhancements (#947): Enabled local scan before global scan in all ScanOp cases; updated E2E tests.
- Proof-of-life Link to C++ Opt Tool (#961): Connected Wave kernels to external C++ tools for out-of-bounds access checks using dataflow analysis.
- Workgroup Reordering Constraint (#1000): Introduced ReorderingConstraint for customizable workgroup layouts; includes GEMM kernel template and tests.
- Bug fixes, cleanups, and test coverage (#1009, #1028, #1017, #1045, #1002, #1052, #1057, #978, #1023, #962, #821)
Integration
- Wave kernel integration with sharktank (#787)
IREE-Turbine Core
- Introduced Graph-Based Fusion API (#967)
- Reduce CPU overhead from boo convolution launches (#900, #946)
- Layer normalization kernels support in boo (#979)
- Allow inlining IR for layout customizable convolutions (#1069)
- boo fixes and updates (#910, #950, #971, #992)
- Integrate boo fusion as a torch.compile backend (#1046); see the sketch after this list
- Enable make_single_dispatch specification in FusionSchema (#1079)
- Filtering based on functions provided to the fusion schema (#1080)
- Optional tensor arguments for CustomOp (#1044)
- Replace aten.convolution with boo convolution (#1076)
- Provide default handling of channels-last formats for graph ops (#1054)
- Test coverage (#1064, #1063)
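Together, #1046 and #1076 point to a flow where a compiled graph's aten.convolution calls are swapped for boo convolutions. A minimal sketch, assuming a hypothetical backend name "iree_boo" (the registered name is not given in these notes):

```python
# Hedged sketch: "iree_boo" is a hypothetical backend name. The notes only
# state that boo fusion is exposed as a torch.compile backend (#1046) and
# that aten.convolution is replaced with boo convolution (#1076).
import torch

conv = torch.nn.Conv2d(8, 16, kernel_size=3, padding=1)

# Under such a backend, aten.convolution in the traced graph would be
# rewritten to launch boo convolution kernels instead.
compiled = torch.compile(conv, backend="iree_boo")  # assumed backend name
y = compiled(torch.randn(1, 8, 32, 32))
```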
Change History
What's Changed
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #907
- [BOO] Reduce CPU overhead by @zjgarvey in #900
- [Wave] Add atomic min op lowering by @nithinsubbiah in #724
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #911
- [Wave] Add aplp to gitignore by @tgymnich in #913
- Revert "[Wave] Add aplp to gitignore" by @Hardcode84 in #914
- [BOO] Add some boo_driver improvements by @zjgarvey in #910
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #918
- Bump version to 3.6.0 after 3.5.0 release. by @ScottTodd in #919
- [Wave] Add support for WG/Wave tiles smaller than vector sizes by @Hardcode84 in #906
- [TKW] Update decode by @Hardcode84 in #879
- [boo] Make `boo_driver` friendlier for shared machines by @rkayaith in #920
- Add stack tracing capability to Wave by @ftynse in #923
- [BOO] Remove xfail mark on passing test by @zjgarvey in #929
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #926
- [Wave] Mask/Bounds generation refactoring by @Hardcode84 in #921
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #935
- [Wave] Implement BitcastOp by @raikonenfnu in #933
- [boo] Even more tuning configurations by @rkayaith in #928
- Initial plumbing for Wave kernel integration with sharktank by @aviator19941 in #787
- [Wave] Add ability to export and import schedules by @harsh-nod in #847
- [Wave] Do not print IR on failure by default by @Hardcode84 in #924
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #938
- [Wave] Add more sglang shapes and remove divisibility check by @Hardcode84 in #925
- [Wave] Paged decode `output_dtype` support by @Hardcode84 in #934
- [Wave] Add missing Paged decode logit cap by @Hardcode84 in #944
- [Wave] Teach compiler to handle "scaled" dimensions/indexing by @raikonenfnu in #937
- [Wave] Add aplp to gitignore by @tgymnich in https://github.com/iree...
Release v3.5.0
Release notes
IREE-Turbine Core
- Enabled stream sharing between IREE and iree-turbine #883: Leveraged IREE's external HIP stream functionality to reduce launch latency when interoperating between PyTorch and IREE, improving performance for mixed-framework workflows.
- Introduced initial async execution support in the runtime #719: Added support for asynchronous task execution in the IREE runtime, enabling better parallelism and future integration with asynchronous frontends and runtimes.
IREE-Wave
New Operators & Kernels
- Added soft-sign kernel to replace tanh approximation kernel #829
- Added GEMM example without transpose B #881
Documentation
- Added documentation on several optimization passes #833, #835
- Added Jupyter notebook with GEMM example #784
- Added tutorial to implement fused softmax using wave #891
Kernel Improvements
- Various bug fixes and performance improvements for decode attention and speculative decoding kernels #838, #839, #854, #858, #878, #890, #888
Compiler Optimizations
- Improved GEMM performance with ping-pong optimizations #846, #832
- Better shared memory allocation management #822, #831
General Improvements
New Contributors
- @AWoloszyn made their first contribution in #719
- @fabianmcg made their first contribution in #838
- @SourishW made their first contribution in #866
- @badgerbroch made their first contribution in #867
- @bodhisaha made their first contribution in #869
- @efric made their first contribution in #881
- @Groverkss made their first contribution in #887
Full changelog
- First pass of async execution in the runtime. by @AWoloszyn in #719
- [Wave] Add scaling to reference kernel in cache test by @raikonenfnu in #781
- [Wave] Implement block wide reduction by @raikonenfnu in #765
- [WAVE] Support wave-wise cumsum scan op::more than one elem per thread by @xintin in #753
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #782
- Enable benchmarking in performance ci by @harsh-nod in #164
- [TKW] Paged decode MHA using `GenericDot` by @Hardcode84 in #779
- [TKW] Update paged decode API by @Hardcode84 in #786
- [TKW] Silence expansion warning by @Hardcode84 in #788
- [Wave] Fix broken lit tests by @harsh-nod in #789
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #785
- [BOO] enable tuning specs for boo_convs by @zjgarvey in #783
- Bump version to 3.5.0 after 3.4.0 release. by @ScottTodd in #793
- [BOO] Add a timeout for out-of-process compilation by @zjgarvey in #796
- [Wave] add excalidraw visualization of wave compilation pipeline by @martin-luecke in #777
- [Wave] support for partial unrolling of iterate by @martin-luecke in #776
- [BOO] Include tuning specs in pip package by @zjgarvey in #800
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #799
- Bump actions/create-github-app-token from 2.0.2 to 2.0.6 in the github-actions group by @dependabot in #795
- Add support for additional type interop by @rsuderman in #803
- [TKW] Avoid cache folder race conditions when running tests in parallel by @Hardcode84 in #808
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #806
- [TKW] Paged decode: move scaling into the kernel by @Hardcode84 in #807
- [runtime] Enable async for `CustomOp` eager execution by @zjgarvey in #802
- [boo_driver] Align `-t` option name with miopen driver by @rkayaith in #810
- [Wave] Add jupyter notebook showing how to do a GEMM by @harsh-nod in #784
- [TKW] Paged Decode dynamic symbols by @Hardcode84 in #809
- [TKW] Tweak Paged Decode kernel waves/tile sizes by @Hardcode84 in #794
- [Wave] Add mlir style printing by @harsh-nod in #775
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #812
- [BOO] Fix a bug in `boo_driver.py` resulting from arg rename by @zjgarvey in #818
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #814
- [boo] Add more tuning configurations by @rkayaith in #819
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #823
- [BOO] remove `generate.py` script by @zjgarvey in #824
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #825
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #826
- [Wave] Minimize shared allocs by @harsh-nod in #822
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #830
- [BOO] Enable bad kernel launch fix + kwarg filtered replacement by @zjgarvey in #828
- [Wave] Add additional check when minimizing allocs by @harsh-nod in #831
- [BOO] Add autocast functionality to `boo_conv` by @zjgarvey in #827
- [Wave] Add scheduling/synchronization related ops by @raikonenfnu in #832
- [Wave] Add documentation by @harsh-nod in #833
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #834
- [Wave] Add install requirement for mermaid diagrams by @harsh-nod in #835
- [TKW] BF16 support in paged decode kernel by @Hardcode84 in #836
- [TKW] Option to print pass timings by @Hardcode84 in #813
- [runtime] Rename dltensor capsule to avoid memory leak by @zjgarvey in #843
- [Wave] Generalize Barriers to work across iterations by @raikonenfnu in #839
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #842
- [TKW] Improve wave runtime by @Hardcode84 in #844
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #848
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #849
- [Wave] Two PP cluster GEMM by @raikonenfnu in #846
- [Wave] Add ability to compare against flash_attn_func by @harsh-nod in #798
- [BOO] Enable `torch.compile` for models with boo convs...
Release v3.4.0
Release notes
New Kernels
- Speculative sampling kernel
- MHA kernel
- Added wave.nn.Linear and wave.nn.QuantLinear, which can be drop-in replacements for nn.Linear
- Added wave_sdpa and wave_sdpa_fp8, which can be drop-in replacements for torch.sdpa (see the sketch after this list)
- Support for more attention variants
- Modified the extend attention API to match the flashinfer API
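A sketch of the drop-in usage described above. The module paths and exact signatures are assumptions inferred from "drop-in replacement"; only the names wave.nn.Linear and wave_sdpa come from these notes:

```python
# Hedged sketch: import paths are assumptions; "drop-in" implies the call
# conventions mirror torch.nn.Linear and scaled_dot_product_attention.
import torch
import iree.turbine.kernel.wave.nn as wave_nn  # assumed module path

# Same (in_features, out_features) contract as torch.nn.Linear.
layer = wave_nn.Linear(1024, 4096, device="cuda", dtype=torch.float16)
y = layer(torch.randn(2, 1024, device="cuda", dtype=torch.float16))

# Assumed to mirror torch's sdpa signature (q, k, v).
q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
attn = wave_nn.wave_sdpa(q, k, v)
```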
New language features
- Block reduce, cumulative sum, and roundeven ops
- tkw.iterate now supports iterating until a condition is met
- Generic Dot support
- Atomic Min support
- Support for scalar kernel arguments
- Elements per thread is now optional
Compiler optimizations
- 2-stage prefetching for GEMMs
- Tanh approximation implemented
- Added support for fast math flags
- Multi-buffering support for GEMMs
Other notable changes
- Introduced the wave_compile API for compiling kernels (see the sketch after this list)
- tkw.reduction renamed to tkw.iterate
- Bug fixes in kernel caching
- Added validation and error messages
- Added location tracking support
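A sketch of the wave_compile entry point named above; the import path, the WaveCompileOptions fields, and the gemm_kernel placeholder are all assumptions rather than verified API:

```python
# Hedged sketch: import path and option fields are assumptions; gemm_kernel
# stands in for a kernel authored with the tkw language features above.
from iree.turbine.kernel.wave.compile import WaveCompileOptions, wave_compile

from my_project.kernels import gemm_kernel, hyperparams  # hypothetical module

options = WaveCompileOptions(subs=hyperparams)  # bind symbolic shapes/tiles
gemm = wave_compile(options, gemm_kernel)       # compile once, launch many times
# gemm(a, b, c) would then launch the compiled kernel on torch tensors.
```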
New Contributors
- @rkayaith made their first contribution in #636
- @Giuseppe5 made their first contribution in #679
- @yichiche made their first contribution in #699
- @jerryyin made their first contribution in #732
Full changelog
- Added roundeven op by @xintin in #600
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #605
- [runtime] Tier out internal Launchable caching and get hal device from uuid by @zjgarvey in #610
- [Wave] Compute grid before kernel launch by @harsh-nod in #606
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #615
- Bump version to 3.4.0 after 3.3.0 release. by @ScottTodd in #616
- [BOO] Adds a function for getting a Launchable from a ConvSignature by @zjgarvey in #608
- [TKW] Keep disambiguating number in expanded node name by @GMNGeoffrey in #437
- [Wave] Introduce wave compile and partition utils by @harsh-nod in #612
- [TKW] Flash Attention 2 Backward Pass by @GMNGeoffrey in #553
- [Wave] Add 2-stage prefetch scheduling strategy for GEMM by @raikonenfnu in #614
- Bump the github-actions group with 2 updates by @dependabot in #617
- [Wave] temporarily turn off test for non default paths. by @raikonenfnu in #624
- [Wave] Implement Linear Layer for Wave by @raikonenfnu in #619
- Add a basic jinja custom op for `conv_2d_nhwc_fhwc` by @zjgarvey in #622
- [Wave] Update extend attention kernel signature by @nithinsubbiah in #621
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #618
- [BOO] enable using `conv_2d_nhwc_fhwc` in conv exports by @zjgarvey in #627
- [Wave] Build wave runtime using torch pip package by @harsh-nod in #625
- [Wave] Restore benchmarking code by @harsh-nod in #628
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #638
- [BOO] Add driver script for convolutions by @rkayaith in #636
- [BOO] Add a cache prepopulator by @zjgarvey in #642
- [Wave] Implement FP8 Quantized attention by @raikonenfnu in #637
- [Wave][Cache] Add caching support for nested functions by @raikonenfnu in #649
- [Wave] Add missing header by @harsh-nod in #645
- Update docs with new wave_compile API. by @harsh-nod in #646
- [TKW] Make elements-per-thread optional by @Hardcode84 in #650
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #660
- [TKW] Infer `elements_per_thread` for elementwise ops by @Hardcode84 in #658
- Inject cache directory into the cache manager by @GMNGeoffrey in #590
- [TKW] API and annotation fix by @martin-luecke in #664
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #663
- [Wave] Implement fused bias-add on Wave kernel Linear Layer by @raikonenfnu in #669
- [TKW] Disable ceildiv emulation by @Hardcode84 in #668
- [TKW] Propagate index from reduce nodes by @Hardcode84 in #644
- [Wave] Use reciprocal to compute softcap logits by @harsh-nod in #674
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #676
- Custom mask bshd attention variant by @xintin in #665
- [Wave] Add support for sliding window attention by @harsh-nod in #626
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #682
- [TKW] Add missing validation and error messages by @GMNGeoffrey in #432
- [TKW] Use affine apply by @Hardcode84 in #666
- Feat: better scaling for fp8 quant by @Giuseppe5 in #679
- [TKW] Skip slow tests by @Hardcode84 in #671
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #687
- [Wave] Add fast math option by @harsh-nod in #686
- [Wave] Add GQA/MQA Vanilla template by @harsh-nod in #688
- [boo_driver] Add `--csv` and `--splat-input-value` options by @rkayaith in #684
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #689
- [Wave] Add quantized linear layer kernel by @nithinsubbiah in #681
- [Wave] Initial support for Multibuffering by @martin-luecke in #633
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #694
- Bump the github-actions group with 2 updates by @dependabot in #662
- [BOO] add an autograd function for conv launchables by @zjgarvey in #670
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #698
- [Wave] Fix issues in bshd gqa kernel by @harsh-nod in #692
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #700
- [BOO] Adds generic convolution custom ops for BOO convs by @zjgarvey in #701
- [BOO] adds a non-jit-compile `Launchable` and a numerics tool for convs by @zjgarvey in #696
- [Wave] Add custom causal mask for Extend Attention by @nithinsubbiah in #639
- [BOO] temporarily xfail bias grad-check test by @zjgarvey in #705
- [BOO] Small fixes and a README update by @zjgarvey in #707
- [Wave] Improve WaveCacher to handle changing freevar in nested fn. by @raikonenfnu in #706
- Capture location information from Python by @ftynse in #675
- [Wave] Implement efficient and user friendly FP8 SDPA Op by @raikonenfnu in #708
- [Wave] Refactor layers to nn directory to mirror torch. by @raikonenfnu in #710
- [Wave] Add gqa decode kernel by @harsh-nod in #703
- [Wave] Turn on wave runtime for extend attention test by @nithinsubbiah in #712
- [Wave] Add is_causal test and plumb quant_dtype into API. by @raikonenfnu in #711
- [WAVE] Tanh approximation flashinfer implementation by @yichiche in #699
- Added support for scalar codegen by @xintin in https:/...
Release v3.3.0
Release Notes
Kernels
- Support for ALiBi bias and RPE in attention
- Support for softcap/logit_cap in attention
- Performance improvements to extend attention
- Rewrite and performance improvements for decode attention
Compiler & Runtime
- Support for buffer ops
- Shared memory optimizations
- Unaligned attention support
- Gather optimizations
- Improvements to dlpack
- Faster kernel launching using wave runtime
QoL
- New documentation: https://iree-turbine.readthedocs.io/en/latest/kernel/runtime.html
- Improvements to testing infrastructure
- Several bug fixes
New contributors
- @hubertlu-tw made their first contribution in #474
- @ftynse made their first contribution in #467
- @nicolasvasilache made their first contribution in #497
- @Eliasj42 made their first contribution in #564
- @newling made their first contribution in #577
- @nithinsubbiah made their first contribution in #582
Full changelog
- Include numbered version for release candidates by @sogartar in #458
- [Wave] Refactor attention lit tests by @harsh-nod in #459
- [TKW] Use buffer ops for masked load/stores by @Hardcode84 in #436
- [Wave] Add causal option to extend attention by @harsh-nod in #461
- [TKW] Update extend attention tile size by @Hardcode84 in #462
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #460
- [Wave] Make extend sequence dimension dynamic by @harsh-nod in #457
- [TKW] Add pass to merge shared mem allocs by @Hardcode84 in #468
- Add functions that prepare tensors as input args for IREE tools by @sogartar in #418
- [Wave] Teach compiler to handle unaligned attention by @raikonenfnu in #465
- [Wave] Add softcap/logit cap into extend_attention by @raikonenfnu in #469
- [Wave] Add physical layout to cache key by @harsh-nod in #472
- Revert "Include numbered version for release candidates" by @ScottTodd in #471
- Fixes for SGLang E2E testing by @hubertlu-tw in #474
- Attention with Linear Biases by @ftynse in #467
- [Wave] Get runperf working for extend attention by @harsh-nod in #470
- Revert "Attention with Linear Biases" by @harsh-nod in #477
- [Wave] Add free variable as key to WaveCacher by @raikonenfnu in #485
- [Wave] Fixes to extend attention by @harsh-nod in #480
- [Wave] Make num_seq to dynamic for Extend Attention by @raikonenfnu in #476
- [TKW] Reuse allocs fix by @Hardcode84 in #483
- Bump version to 3.3.0 after releasing 3.2.0. by @ScottTodd in #489
- [Wave] Make maximum extend sequence length dynamic by @harsh-nod in #490
- [Wave] Get extend working with more MFMA intrinsics by @harsh-nod in #491
- Reland Attention with Linear Biases by @ftynse in #486
- [Wave] Teach compiler to handle batched MMA for VMFMAOps by @raikonenfnu in #495
- Add RPE variant by @nicolasvasilache in #497
- [Wave] Refactor style and fix test for RPE by @raikonenfnu in #499
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #493
- Fix doc for functions that prepare IREE tools' input args by @sogartar in #501
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #505
- [TKW] RPE fixes by @Hardcode84 in #506
- [TKW] Work on buffer ops by @Hardcode84 in #492
- Added readme to the aot_mlp example by @xintin in #266
- Global to shared gathers by @harsh-nod in #496
- [TKW] Improve `gen_sympy_index` broadcasting by @Hardcode84 in #508
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #509
- Bump the github-actions group across 1 directory with 2 updates by @dependabot in #512
- [TKW] Fix RPE 32x32x8 MMA by @Hardcode84 in #511
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #513
- [Wave] minimize global load for dynamics by @raikonenfnu in #515
- [TKW] Add MinOp as an Op for Quantized LLM and GenAI workload by @NoumanAmir657 in #507
- Bump jinja2 from 3.1.3 to 3.1.4 by @dependabot in #503
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #516
- [Wave] Remove artifact by @raikonenfnu in #518
- [Wave] Remove polynomial approx with iree compile flag by @raikonenfnu in #519
- [TKW] Fix memory leak by @Hardcode84 in #522
- [Wave] protect skipping polynomial approx behind a flag by @raikonenfnu in #523
- [Wave] Various extend attention optimization by @raikonenfnu in #524
- [TKW] Less conservative partition by @Hardcode84 in #525
- Extend attention + RPE by @nicolasvasilache in #504
- Bump the github-actions group with 3 updates by @dependabot in #527
- [TKW] Use buffer ops for splatted masked load/stores by @Hardcode84 in #526
- Use LRU cache when retrieving kernel hash by @harsh-nod in #528
- [TKW] Extend attention RPE `F32_32x32x8_F16` by @Hardcode84 in #530
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #533
- Improvements to caching by @harsh-nod in #531
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #535
- [NFC] Refactor to allow more composable usage of LaunchableWave towar… by @nicolasvasilache in #532
- NFC - Add type to custom_string implementation by @nicolasvasilache in #536
- [TKW] Allow to override MLIR code, for debugging by @Hardcode84 in #534
- [TKW] Unroll gathers/scatters by @Hardcode84 in #541
- [NFC] Move WaveConstraint related convention into the WaveConstraint … by @nicolasvasilache in #537
- Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #543
- Remove more windows newlines by @GMNGeoffrey in #544
- [TKW] Warn if IREE versions is too low by @Hardcode84 in #549
- Bump actions/create-github-app-token from 1.11.5 to 1.11.6 in the github-actions group by @dependabot in #552
- [NFC] Better HardwareConstraint.apply APIs by @nicolasvasilache in #538
- Replace deprecated assert_allclose by @GMNGeoffrey in #556
- [Wave] Add ability to capture traces while benchmarking by @harsh-nod in #550
- [TKW] Fix device in extend attention test by @Hardcode84 in #558
- Skip failing `tests/dynamo/tensor_test.py::TensorTest::test_nn_ML`. by @ScottTodd in #561
- [TKW] Ignore `_warn_iree_is_too_old` errors by @Hardcode84 in #562
- Fix several typo-type bugs in the interpreter tool by @GMNGeoffrey in #545
- [TKW] Fix method signature for reshape op by @GMNGeoffrey in...
Release v3.2.0
Release highlights
iree-turbine core
- #434: iree-turbine has a new website: https://iree-turbine.readthedocs.io/. See the `docs/` folder for contributing instructions.
- #373: The deprecated `shark_turbine` namespace has been fully deleted; users should migrate to `iree.turbine`.
- #418: There are new utility APIs for preparing tensors as input arguments for IREE tools:

  ```python
  # iree.turbine.support.conversions
  torch_dtyped_shape_to_iree_format(...)

  # iree.turbine.support.tools
  iree_tool_format_cli_input_arg(...)
  iree_tool_prepare_input_args(...)
  ```
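The helpers above could be exercised along these lines; the argument and return conventions shown are assumptions, since the notes only name the functions:

```python
# Hedged sketch: only the function names come from #418; how arguments are
# passed and what is returned are assumptions.
import torch
from iree.turbine.support.tools import iree_tool_prepare_input_args

tensors = [torch.randn(2, 3), torch.arange(4)]
# Presumably produces values usable as --input= flags for tools such as
# iree-run-module.
args = iree_tool_prepare_input_args(tensors)
for a in args:
    print(a)
```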
TKW
Improved support and performance for attention kernel variants:
- #387: Added a new paged decode attention kernel.
- #412: Added a new implementation of prefill attention.
- #452: Added self_index, predicate, and select ops to implement causal attention.
- #424: Reordering of shared load-write to minimize LDS barriers improved performance for some attention shapes by up to 10%.
Other optimizations:
- #394: A memory layout attribute for the MemoryType now allows users to specify a physical shape that differs from the logical shape. This is useful in scenarios like kv-caches, where certain dimensions are physically quite large but map to fixed logical dimensions.
- #436: Use buffer ops for masked load/stores.
Development quality of life features:
- #406: Tests parameterized by shapes now have better names.
- #423: Wave pass pipelines now feature printing options.
Changelog
Full list of changes: v3.1.0...v3.2.0
Release v3.1.0
Release highlights
iree-turbine core
The core of `iree-turbine` remains fairly stable, with a few quality of life and infrastructure changes this release cycle.
- PyTorch is no longer a direct requirement that gets automatically installed alongside the `iree-turbine` package, as of #343. Instead, users should install the `torch` package variant of their choice before imports from `iree.turbine` will succeed. For example:

  ```bash
  pip install torch --index-url https://download.pytorch.org/whl/test/cpu
  pip install iree-turbine
  ```
- Pre-release versions of the `iree-turbine` package are now published nightly. Install with:

  ```bash
  pip install \
    --find-links https://iree.dev/pip-release-links.html \
    --upgrade --pre \
    iree-turbine
  ```
- Project tests are now running continuously across Python versions 3.10-3.12: #326.
- A new `turbine_generate` rule is available for the `iree.build` API. This can be used to programmatically export and compile pipelines of programs using `iree-turbine`: #249.
TKW
Activity in the TKW (`turbine/kernel/wave/`) subproject has been accelerating.
- Attention decode kernels are in development: #308, #340.
- TKW now supports a compiled kernel cache that helps with both online and offline uses: #329, #351.
- An evoformer example was added at `iree/turbine/kernel/wave/templates/evoformer.py` in #307.
- A number of optimizations have been added across the project: #291, #296, #292, #304, #318, #335, and more.
Full changelog
dev-wheels
Automatic snapshot release of iree-turbine Python wheels.
Release v3.0.0
Full changelog: v2.9.0...v3.0.0
Release v2.9.0
Release highlights
- Bumped IREE to the latest stable release, using the new `iree-base-compiler` and `iree-base-runtime` package names and the matching version `2.9.0`.
Full Changelog: v2.5.0...v2.9.0
Release v2.5.0
Release highlights
- Bumped IREE to the latest stable release: `20241104.1068`. We expect the next release to use a new version scheme and new package names. See iree-org/iree#18938 for full details.
- PyTorch versions 2.5+ are now supported thanks to 97e0517.
- The migration from the `shark-turbine` package namespace to `iree.turbine` is structurally complete: 40016ad. Downstream projects should switch to the new namespace and `iree-turbine` package name.
Full Changelog: v2.3.0...v2.5.0