
Releases: iree-org/iree-turbine

Release v3.6.0

21 Jul 22:13
356fa39

IREE-Turbine Release Notes

The iree-turbine project has been migrated to the new Wave repository (iree-org/wave).

IREE-Wave

New Operators & Kernels

  • Atomic min op lowering, with support for integer data types (#724)
  • BitcastOp to reinterpret or pack data of the same total bitwidth (#933)
  • Cube root (cbrt) unary op from the math dialect (#964)
  • ScaledMMA op (#815)
  • Linear-layer BMM support for MXFP4 (#996)
  • More operators (#976)

Documentation

  • Trace setup instructions (#1040)

Kernel Improvements

  • Paged decode attention kernel (#934, #944, #982, #985, #1010, #1022, #1027, #1062)
  • Speculative decode kernel (#1034)
  • Symbol Separation for Decode Logic (#879): Split K2 into SPLIT_ITER (used for masking/reduction) and KV_LENS (actual tensor size); improved handling of KV_START_IDX for proper masking.
  • Tile Masking for Sub-Vector Sizes (#906): Added support for tile_size < vector_size in WG/Wave tiles by masking out-of-bounds elements during shared memory access.
  • Dynamic Dimension Support in Attention Kernel (#995): Made B, H, M, and K2 dynamic while keeping N static in bhsd_attention kernel for better flexibility.
  • In-thread transpose (iree-org/wave#10)

Compiler Optimizations

  • Scaled Indexing Support (#937): Compiler now handles "scaled" dimensions to support packed sub-byte matmuls (e.g., F4/I8 or F16/I32 formats).
  • Buffer Access Correctness Fixes (#916): Replaced generic buffer ops with stride-aware vector loads/stores using fat_raw_buffer_cast.
  • Indexing & Multibuffering Fixes (#936): Corrected unrolled op indexing and improved input/output mapping consistency in multi-buffered pipelines.
  • Mask/Bounds Refactoring (#921): Moved bounds/masking logic to a dedicated pass for better maintainability (no functional change).
  • Remove Redundant Subgraphs Post-Pipelining (#939): Cleaned up unused subgraphs left behind by reduction pipelining for cleaner traces.
  • Generic Torch Backend for iree_turbine (#953): Added torch.compile backend integration for launching Launchable kernels via jit_compile.
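
As a rough illustration of how a torch.compile backend like this is driven from user code (a sketch only: the backend registration string "iree_turbine" below is an assumption for illustration, not confirmed by the PR):

    import torch

    class MatMul(torch.nn.Module):
        def forward(self, a, b):
            return a @ b

    # "iree_turbine" is an assumed backend registration name used here
    # for illustration; per the note above, the PR routes compilation
    # of Launchable kernels through jit_compile.
    compiled = torch.compile(MatMul(), backend="iree_turbine")
    out = compiled(torch.randn(4, 8), torch.randn(8, 4))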

General Improvements

  • Added ability to export and import schedules (#847)
  • Make cmake execute_process errors fatal (#958)
  • Option to tweak schedules (#896)
  • Search over schedules for a given kernel (#1019)
  • Stack Tracing in Wave Kernels (#923): Enabled full stack trace capture during Wave op construction for better debugging, especially within PyTorch integration.
  • ScanOp Enhancements (#947): Enabled local scan before global scan in all ScanOp cases; updated E2E tests.
  • Proof-of-life Link to C++ Opt Tool (#961): Connected Wave kernels to external C++ tools for out-of-bounds access checks using dataflow analysis.
  • Workgroup Reordering Constraint (#1000): Introduced ReorderingConstraint for customizable workgroup layouts; includes GEMM kernel template and tests.
  • Bug fixes, cleanups and test coverage (#1009, #1028, #1017, #1045, #1002, #1052, #1057, #978, #1023, #962, #821)

Integration

  • Wave kernel integration with sharktank (#787)

IREE-Turbine Core

  • Introduced Graph-Based Fusion API (#967)
  • Reduce CPU overhead from boo convolution launches (#900, #946)
  • Layer normalization kernels support in boo (#979)
  • Allow inlining IR for layout customizable convolutions (#1069)
  • boo fixes and updates (#910, #950, #971, #992)
  • Integrate boo fusion as a torch.compile backend (#1046)
  • Enable make_single_dispatch specification in FusionSchema (#1079)
  • Filtering based on functions provided to the fusion schema (#1080)
  • Optional tensor arguments for CustomOp (#1044)
  • Replace aten.convolution with boo convolution (#1076)
  • Provide default handling of channels-last formats for graph ops (#1054)
  • Test coverage (#1064, #1063)

Change History


What's Changed

  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #907
  • [BOO] Reduce CPU overhead by @zjgarvey in #900
  • [Wave] Add atomic min op lowering by @nithinsubbiah in #724
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #911
  • [Wave] Add aplp to gitignore by @tgymnich in #913
  • Revert "[Wave] Add aplp to gitignore" by @Hardcode84 in #914
  • [BOO] Add some boo_driver improvements by @zjgarvey in #910
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #918
  • Bump version to 3.6.0 after 3.5.0 release. by @ScottTodd in #919
  • [Wave] Add support for WG/Wave tiles smaller than vector sizes by @Hardcode84 in #906
  • [TKW] Update decode by @Hardcode84 in #879
  • [boo] Make boo_driver friendlier for shared machines by @rkayaith in #920
  • Add stack tracing capability to Wave by @ftynse in #923
  • [BOO] Remove xfail mark on passing test by @zjgarvey in #929
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #926
  • [Wave] Mask/Bounds generation refactoring by @Hardcode84 in #921
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #935
  • [Wave] Implement BitcastOp by @raikonenfnu in #933
  • [boo] Even more tuning configurations by @rkayaith in #928
  • Initial plumbing for Wave kernel integration with sharktank by @aviator19941 in #787
  • [Wave] Add ability to export and import schedules by @harsh-nod in #847
  • [Wave] Do not print IR on failure by default by @Hardcode84 in #924
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator[bot] in #938
  • [Wave] Add more sglang shapes and remove divisibility check by @Hardcode84 in #925
  • [Wave] Paged decode output_dtype support by @Hardcode84 in #934
  • [Wave] Add missing Paged decode logit cap by @Hardcode84 in #944
  • [Wave] Teach compiler to handle "scaled" dimensions/indexing by @raikonenfnu in #937
  • [Wave] Add aplp to gitignore by @tgymnich in https://github.com/iree...

Release v3.5.0

11 Jun 15:33
b99b395

Release notes

IREE-Turbine Core

  • Enabled stream sharing between IREE and iree-turbine #883: Leveraged IREE's external HIP stream functionality to reduce launch latency when interoperating between PyTorch and IREE, improving performance for mixed-framework workflows.
  • Introduced initial async execution support in the runtime #719: Added support for asynchronous task execution in the IREE runtime, enabling better parallelism and future integration with asynchronous frontends and runtimes.

IREE-Wave

New Operators & Kernels

  • Added soft-sign kernel to replace tanh approximation kernel #829
  • Added GEMM example without transpose B #881

Documentation

  • Added documentation on several optimization passes #833, #835
  • Added Jupyter notebook with GEMM example #784
  • Added a tutorial on implementing fused softmax using Wave #891

Kernel Improvements

  • Various bug fixes and performance improvements for decode attention and speculative decoding kernels #838, #839, #854, #858, #878, #890, #888

Compiler Optimizations

  • Improved GEMM performance with ping-pong optimizations #846, #832
  • Better shared memory allocation management #822, #831

General Improvements

  • Option to print compiler pass timings #813
  • Ability to print fx.graphs in MLIR style #775


Full changelog


  • First pass of async execution in the runtime. by @AWoloszyn in #719
  • [Wave] Add scaling to reference kernel in cache test by @raikonenfnu in #781
  • [Wave] Implement block wide reduction by @raikonenfnu in #765
  • [WAVE] Support wave-wise cumsum scan op::more than one elem per thread by @xintin in #753
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #782
  • Enable benchmarking in performance ci by @harsh-nod in #164
  • [TKW] Paged decode MHA using GenericDot by @Hardcode84 in #779
  • [TKW] Update paged decode API by @Hardcode84 in #786
  • [TKW] Silence expansion warning by @Hardcode84 in #788
  • [Wave] Fix broken lit tests by @harsh-nod in #789
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #785
  • [BOO] enable tuning specs for boo_convs by @zjgarvey in #783
  • Bump version to 3.5.0 after 3.4.0 release. by @ScottTodd in #793
  • [BOO] Add a timeout for out-of-process compilation by @zjgarvey in #796
  • [Wave] add excalidraw visualization of wave compilation pipeline by @martin-luecke in #777
  • [Wave] support for partial unrolling of iterate by @martin-luecke in #776
  • [BOO] Include tuning specs in pip package by @zjgarvey in #800
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #799
  • Bump actions/create-github-app-token from 2.0.2 to 2.0.6 in the github-actions group by @dependabot in #795
  • Add support for additional type interop by @rsuderman in #803
  • [TKW] Avoid cache folder race conditions when running tests in parallel by @Hardcode84 in #808
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #806
  • [TKW] Paged decode: move scaling into the kernel by @Hardcode84 in #807
  • [runtime] Enable async for CustomOp eager execution by @zjgarvey in #802
  • [boo_driver] Align -t option name with miopen driver by @rkayaith in #810
  • [Wave] Add jupyter notebook showing how to do a GEMM by @harsh-nod in #784
  • [TKW] Paged Decode dynamic symbols by @Hardcode84 in #809
  • [TKW] Tweak Paged Decode kernel waves/tile sizes by @Hardcode84 in #794
  • [Wave] Add mlir style printing by @harsh-nod in #775
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #812
  • [BOO] Fix a bug in boo_driver.py resulting from arg rename by @zjgarvey in #818
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #814
  • [boo] Add more tuning configurations by @rkayaith in #819
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #823
  • [BOO] remove generate.py script by @zjgarvey in #824
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #825
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #826
  • [Wave] Minimize shared allocs by @harsh-nod in #822
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #830
  • [BOO] Enable bad kernel launch fix + kwarg filtered replacement by @zjgarvey in #828
  • [Wave] Add additional check when minimizing allocs by @harsh-nod in #831
  • [BOO] Add autocast functionality to boo_conv by @zjgarvey in #827
  • [Wave] Add scheduling/synchronization related ops by @raikonenfnu in #832
  • [Wave] Add documentation by @harsh-nod in #833
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #834
  • [Wave] Add install requirement for mermaid diagrams by @harsh-nod in #835
  • [TKW] BF16 support in paged decode kernel by @Hardcode84 in #836
  • [TKW] Option to print pass timings by @Hardcode84 in #813
  • [runtime] Rename dltensor capsule to avoid memory leak by @zjgarvey in #843
  • [Wave] Generalize Barriers to work across iterations by @raikonenfnu in #839
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #842
  • [TKW] Improve wave runtime by @Hardcode84 in #844
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #848
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #849
  • [Wave] Two PP cluster GEMM by @raikonenfnu in #846
  • [Wave] Add ability to compare against flash_attn_func by @harsh-nod in #798
  • [BOO] Enable torch.compile for models with boo convs...

Release v3.4.0

05 May 16:51
83b2de2

Release notes

New Kernels

  • Speculative sampling kernel
  • MHA kernel
  • Added wave.nn.Linear and wave.nn.QuantLinear as drop-in replacements for nn.Linear (see the usage sketch after this list)
  • Added wave_sdpa and wave_sdpa_fp8 as drop-in replacements for torch.sdpa
  • Support for more attention variants
  • Extend-attention API updated to match the flashinfer API
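
Since these layers are billed as drop-in replacements, usage should look roughly like the sketch below; the import path wave.nn and the torch.nn.Linear-style constructor are assumptions based on the wording above, not a confirmed API:

    import torch
    import wave.nn  # assumed import path, taken verbatim from the note above

    # As a drop-in replacement, the constructor is assumed to mirror
    # torch.nn.Linear(in_features, out_features).
    layer = wave.nn.Linear(1024, 4096)
    y = layer(torch.randn(2, 1024))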

New language features

  • Block reduce, cumulative sum, and roundeven ops
  • tkw.iterate now supports iterating until a condition is met
  • Generic Dot support
  • Atomic Min support
  • Support for scalar kernel arguments
  • Elements per thread now optional

Compiler optimizations

  • 2-stage prefetching for GEMMs
  • Tanh approximation implemented
  • Added support for fast math flags
  • Multi-buffering support for GEMMs

Other notable changes

  • Introduced the wave_compile API for compiling kernels (see the sketch after this list)
  • tkw.reduction renamed to tkw.iterate
  • Bug fixes in kernel caching
  • Added validation and error messages
  • Added location tracking support
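
A minimal sketch of the new entry point, assuming wave_compile pairs a compile-options object with a Wave kernel (see the documentation update in #646); the import path, the subs option name, and the my_wave_kernel placeholder below are all hypothetical:

    # Assumed import path; see the wave_compile docs update in #646.
    from iree.turbine.kernel.wave.compile import wave_compile, WaveCompileOptions

    # my_wave_kernel is a hypothetical placeholder for a kernel authored in
    # the Wave DSL; the subs option name is likewise an assumption, standing
    # in for whatever mechanism binds concrete values to kernel symbols.
    options = WaveCompileOptions(subs={"M": 1024, "N": 1024})
    compiled = wave_compile(options, my_wave_kernel)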


Full changelog


  • Added roundeven op by @xintin in #600
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #605
  • [runtime] Tier out internal Launchable caching and get hal device from uuid by @zjgarvey in #610
  • [Wave] Compute grid before kernel launch by @harsh-nod in #606
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #615
  • Bump version to 3.4.0 after 3.3.0 release. by @ScottTodd in #616
  • [BOO] Adds a function for getting a Launchable from a ConvSignature by @zjgarvey in #608
  • [TKW] Keep disambiguating number in expanded node name by @GMNGeoffrey in #437
  • [Wave] Introduce wave compile and partition utils by @harsh-nod in #612
  • [TKW] Flash Attention 2 Backward Pass by @GMNGeoffrey in #553
  • [Wave] Add 2-stage prefetch scheduling strategy for GEMM by @raikonenfnu in #614
  • Bump the github-actions group with 2 updates by @dependabot in #617
  • [Wave] temporarily turn off test for non default paths. by @raikonenfnu in #624
  • [Wave] Implement Linear Layer for Wave by @raikonenfnu in #619
  • Add a basic jinja custom op for conv_2d_nhwc_fhwc by @zjgarvey in #622
  • [Wave] Update extend attention kernel signature by @nithinsubbiah in #621
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #618
  • [BOO] enable using conv_2d_nhwc_fhwc in conv exports by @zjgarvey in #627
  • [Wave] Build wave runtime using torch pip package by @harsh-nod in #625
  • [Wave] Restore benchmarking code by @harsh-nod in #628
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #638
  • [BOO] Add driver script for convolutions by @rkayaith in #636
  • [BOO] Add a cache prepopulator by @zjgarvey in #642
  • [Wave] Implement FP8 Quantized attention by @raikonenfnu in #637
  • [Wave][Cache] Add caching support for nested functions by @raikonenfnu in #649
  • [Wave] Add missing header by @harsh-nod in #645
  • Update docs with new wave_compile API. by @harsh-nod in #646
  • [TKW] Make elements-per-thread optional by @Hardcode84 in #650
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #660
  • [TKW] Infer elements_per_thread for elementwise ops by @Hardcode84 in #658
  • Inject cache directory into the cache manager by @GMNGeoffrey in #590
  • [TKW] API and annotation fix by @martin-luecke in #664
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #663
  • [Wave] Implement fused bias-add on Wave kernel Linear Layer by @raikonenfnu in #669
  • [TKW] Disable ceildiv emulation by @Hardcode84 in #668
  • [TKW] Propagate index from reduce nodes by @Hardcode84 in #644
  • [Wave] Use reciprocal to compute softcap logits by @harsh-nod in #674
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #676
  • Custom mask bshd attention variant by @xintin in #665
  • [Wave] Add support for sliding window attention by @harsh-nod in #626
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #682
  • [TKW] Add missing validation and error messages by @GMNGeoffrey in #432
  • [TKW] Use affine apply by @Hardcode84 in #666
  • Feat: better scaling for fp8 quant by @Giuseppe5 in #679
  • [TKW] Skip slow tests by @Hardcode84 in #671
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #687
  • [Wave] Add fast math option by @harsh-nod in #686
  • [Wave] Add GQA/MQA Vanilla template by @harsh-nod in #688
  • [boo_driver] Add --csv and --splat-input-value options by @rkayaith in #684
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #689
  • [Wave] Add quantized linear layer kernel by @nithinsubbiah in #681
  • [Wave] Initial support for Multibuffering by @martin-luecke in #633
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #694
  • Bump the github-actions group with 2 updates by @dependabot in #662
  • [BOO] add an autograd function for conv launchables by @zjgarvey in #670
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #698
  • [Wave] Fix issues in bshd gqa kernel by @harsh-nod in #692
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #700
  • [BOO] Adds generic convolution custom ops for BOO convs by @zjgarvey in #701
  • [BOO] adds a non-jit-compile Launchable and a numerics tool for convs by @zjgarvey in #696
  • [Wave] Add custom causal mask for Extend Attention by @nithinsubbiah in #639
  • [BOO] temporarily xfail bias grad-check test by @zjgarvey in #705
  • [BOO] Small fixes and a README update by @zjgarvey in #707
  • [Wave] Improve WaveCacher to handle changing freevar in nested fn. by @raikonenfnu in #706
  • Capture location information from Python by @ftynse in #675
  • [Wave] Implement efficient and user friendly FP8 SDPA Op by @raikonenfnu in #708
  • [Wave] Refactor layers to nn directory to mirror torch. by @raikonenfnu in #710
  • [Wave] Add gqa decode kernel by @harsh-nod in #703
  • [Wave] Turn on wave runtime for extend attention test by @nithinsubbiah in #712
  • [Wave] Add is_causal test and plumb quant_dtype into API. by @raikonenfnu in #711
  • [WAVE] Tanh approximation flashinfer implementation by @yichiche in #699
  • Added support for scalar codegen by @xintin in https:/...

Release v3.3.0

24 Mar 18:12
eefa599

Release Notes

Kernels

  • Support for ALiBi bias and RPE in attention
  • Support for softcap/logit_cap in attention
  • Performance improvements to extend attention
  • Rewrite of and performance improvements to decode attention

Compiler & Runtime

  • Support for buffer ops
  • Shared memory optimizations
  • Unaligned attention support
  • Gather optimizations
  • Improvements to DLPack handling
  • Faster kernel launches using the Wave runtime


Release v3.2.0

10 Feb 19:56
7038127

Release highlights

iree-turbine core

  • #434: iree-turbine has a new website: https://iree-turbine.readthedocs.io/. See the docs/ folder for contributing instructions.

  • #373: The deprecated shark_turbine namespace has been fully deleted; users should migrate to iree.turbine.

  • #418: There are new utility APIs for preparing tensors as input arguments for IREE tools:

    # iree.turbine.support.conversions
    torch_dtyped_shape_to_iree_format(...)
    
    # iree.turbine.support.tools
    iree_tool_format_cli_input_arg(...)
    iree_tool_prepare_input_args(...)
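
A hedged sketch of how these helpers are presumably meant to be used when driving IREE CLI tools; only the function names above come from the release note, while the signature and return type below are assumptions:

    import torch
    from iree.turbine.support.tools import iree_tool_prepare_input_args

    # Assumed behavior: convert tensors into the "SHAPExDTYPE=..." strings
    # that IREE CLI tools accept through their --input= flags.
    args = iree_tool_prepare_input_args([torch.randn(2, 3)])
    for arg in args:
        print(f"--input={arg}")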

TKW

Improved support and performance for attention kernel variants:

  • #387: Added a new paged decode attention kernel.
  • #412: Added a new implementation of prefill attention.
  • #452: Add self_index, predicate, and selectOp to implement causal attention.
  • #424: Reordering shared-memory loads and writes to minimize LDS barriers improved performance for some attention shapes by up to 10%.

Other optimizations:

  • #394: A memory layout attribute for the MemoryType now allows users to specify a physical shape that differs from the logical shape. This is useful in scenarios like kv-caches, where certain dimensions are physically quite large but map to fixed logical dimensions.
  • #436: Use buffer ops for masked loads/stores.

Development quality of life features:

  • #406: Tests parameterized by shapes now have better names.
  • #423: Wave pass pipelines now feature printing options.

Changelog

Full list of changes: v3.1.0...v3.2.0

Release v3.1.0

08 Jan 21:56
3dbb4e5

Release highlights

iree-turbine core

The core of iree-turbine remains fairly stable, with a few quality of life and infrastructure changes this release cycle.

  • PyTorch is no longer a direct requirement that gets automatically installed alongside the iree-turbine package as of #343. Instead, users should install the torch package variant of their choice before importing from iree.turbine. For example:

    pip install torch --index-url https://download.pytorch.org/whl/test/cpu
    pip install iree-turbine
  • Pre-release versions of the iree-turbine package are now published nightly. Install them with:

    pip install \
      --find-links https://iree.dev/pip-release-links.html \
      --upgrade --pre \
      iree-turbine
    
  • Project tests are now running continuously across Python versions 3.10-3.12: #326.

  • A new turbine_generate rule is available for the iree.build API. This can be used to programmatically export and compile pipelines of programs using iree-turbine: #249.

TKW

Activity in the TKW (turbine/kernel/wave/) subproject has been accelerating.

Full changelog

v3.0.0...v3.1.0

dev-wheels

05 Dec 21:49
727137a

Automatic snapshot release of iree-turbine python wheels.

Release v3.0.0

18 Nov 19:19
583004f

Release v2.9.0

09 Nov 00:40
4a3f8c9

Release highlights

  • Bumped IREE to the latest stable release, adopting the new iree-base-compiler and iree-base-runtime package names and the matching version 2.9.0.

Full Changelog: v2.5.0...v2.9.0

Release v2.5.0

06 Nov 21:29
9b3fa06

Release highlights

  • Bumped IREE to the latest stable release: 20241104.1068. We expect the next release to use a new version scheme and new package names. See iree-org/iree#18938 for full details.
  • PyTorch versions 2.5+ are now supported thanks to 97e0517.
  • The migration from the shark-turbine package namespace to iree.turbine is structurally complete: 40016ad. Downstream projects should switch to the new namespace and the iree-turbine package name.

Full Changelog: v2.3.0...v2.5.0