Releases: iree-org/iree
iree candidate iree-3.10.0rc20251128
Automatic candidate release of iree.
iree candidate iree-3.10.0rc20251127
Automatic candidate release of iree.
iree candidate iree-3.10.0rc20251126
Automatic candidate release of iree.
Release v3.9.0
IREE Release v3.9.0
1. Compiler
1.1 Data Tiling & GEMM Improvements
- `iree-opt-data-tiling` promoted to an umbrella flag with suggested config. (#22295)
- Default path switched to the DispatchCreation phase; use `--iree-global-opt-data-tiling` for legacy behavior. See docs. (#21441)
- Implemented `subgroups_k` in data-tiled MMA layouts. (#22519)
- Added per-operand M/N/K interleaving control. (#22626)
- Added layout transfer support in MaterializeEncoding. (#22582)
- Strict `inner_tiled` verifier with `distributed`/`opaque` params. (#22369)
- Unified encoding materialization passes. (#22472)
- Encoding op fusion with multi-use producers at `-O3`. (#22444)
- Intentional padding for non-K-major layouts (~2.7% GEMM improvement). (#22486)
- Better heuristics for extremely large GEMMs. (#22636)
- Refactored narrow matmul tile size selection. (#22177)
- Split reduction for large-K GEMMs. (#22357)
- Updated ukernel data layout. (#22350)
- Fixed large f16 ukernel bounds. (#22481)
- Added LLaMA 8B FP8 benchmark tests on gfx942. (#22387)
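The data-tiling flag change above can be exercised from the command line; the following is a hedged sketch only (the module name, output paths, and target backend are placeholder assumptions, while the two data-tiling flags come from the notes above):

```shell
# Default path: data tiling now runs during the DispatchCreation phase
# when the umbrella flag is enabled.
iree-compile model.mlir \
  --iree-hal-target-backends=llvm-cpu \
  --iree-opt-data-tiling \
  -o model_default.vmfb

# Legacy behavior: perform data tiling in the GlobalOptimization phase.
iree-compile model.mlir \
  --iree-hal-target-backends=llvm-cpu \
  --iree-global-opt-data-tiling \
  -o model_legacy.vmfb
```

The legacy flag exists for comparison and migration; per the notes, the DispatchCreation path is now the default.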
1.2 Dispatch Creation
- Added split-reduction support for arg_compare, preventing shared-memory overflow and fixing LLaMA 8B FP16 compilation failures. (#22466)
- Added aggressive multi-use fusion for encoding ops (enabled at `-O3`), significantly improving fusion patterns seen in SDXL. (#22444)
- Enabled consumer fusion for GPUApplyTilingLevel on scf.forall loops, enhancing padding-level fusion. (#22522)
1.3 GPU Codegen
- Added barrier insertion before first shared-memory write for AMD GPUs, fixing non-deterministic strided conv results (13% -> 0% failure rate). (#22669)
- Rewrote loop prefetcher with a stage-based backward slicing model for better maintainability (no functional change). (#22605)
- Implemented vector size inference for `UKernelGenericOp`, enabling downstream ops (e.g., unpack) to correctly vectorize instead of falling back to scalar code. (#22440)
- Improved f16 medium ukernel bounds on ROCm for better matmul throughput. (#22393)
- Added mmt4d ukernel support for RISC-V zvfh/zvfhmin, enabling f16xf16->f16/f32 kernels with runtime hardware probing. (#22231)
- Generalized GPU lowering for linalg.reduce ops, converting illegal i1 reductions to generic form to unblock split-reduction pipelines. (#22490)
1.4 Others
- Interfaces, Layouts & IR Improvements (#22467, #22390, #22368)
- Various correctness and quality improvements across codegen, layout propagation, and GPU lowering. (#22636, #22490, #22466, #22669, #22522, #22605, #22486, #22519, #22444, #22393, #22231, #22467, #22390, #22368, #22440, #22598)
- Exposed C and Python bindings for IGEMM convolution details (#22598)
2. Runtime
- Implemented the first end-to-end support for external transients, enabling early but functional handling of control flow and cross-dispatch transient values.
- Current limitations: no function calls and no data-dependent values; simple control flow is supported and aligns with future dispatch specialization work. (#22625)
- Added timeline-aware async execution across module boundaries, introducing foundational interfaces for precise cross-module scheduling. (#22381)
- Improved support for `iree_codegen.extract_strided_metadata`, ensuring information-preserving lowering:
  - Now normalizes into `iree_codegen` earlier, avoiding loss of stride/offset/alignment information that occurred when prematurely converting to `memref`. (#22606)
- Added new Stream canonicalizations and improved `RefineUsage` to reduce unnecessary copies and fix correctness bugs. (#22610)
- Added `--gen-dialect-json` to `iree-tblgen`, generating JSON databases of dialect definitions using tablegen metadata. (#22603)
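A hedged usage sketch of the new tblgen backend; the `.td` path, include directory, and output filename below are placeholder assumptions, while the `--gen-dialect-json` option itself comes from the note above:

```shell
# Generate a JSON database describing a dialect's tablegen definitions.
# Paths are illustrative; point -I at your MLIR include roots and pass
# the dialect's .td entry file.
iree-tblgen --gen-dialect-json \
  -I third_party/llvm-project/mlir/include \
  path/to/DialectOps.td \
  -o dialect.json
```

The resulting JSON can then be consumed by external tooling that wants dialect metadata without linking against the compiler.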
Change Log
Git History
What's Changed
- [LinalgExt] Don't vectorize map_scatter in non-contiguous sub-byte access by @jtuyls in #22242
- [python] Set up binding for preprocessing transform ops by @bangtianliu in #22227
- Re-enable lds_barrier on RDNA4 by @krzysz00 in #21922
- [CI][iree-test-suites] Try to make torch_models benchmarks more stable by @Groverkss in #22271
- Reapply "[GPU] Allow multi result and indexing compute generic ops in TilleAndFuse pipeline (#22205)" by @nirvedhmeshram in #22223
- Reapply "[Dispatch Creation] Rework dispatch formation logic (#21854)" by @IanWood1 in #22065
- [debugging][gpu] Add --iree-hip-emit-debug-info flag by @willghatch in #22216
- [Codegen] Update the td spec using the contraction matcher op by @bangtianliu in #22249
- [Codegen] Update the td spec using the attention matcher op by @bangtianliu in #22266
- Revert "Re-enable lds_barrier on RDNA4" by @kuhar in #22278
- Integrate llvm/llvm-project@b92483c by @newling in #22274
- Support skinny scaled matmul in kernel config by @jtuyls in #22042
- Use llvm wrappers for accumulate. NFC. by @kuhar in #22279
- [NFC][GPU] Move reduction configuration to gpu utilities by @Groverkss in #22286
- [GPU] Move convolution check out of unrelated function by @Groverkss in #22287
- [GPU] Support iree_tensor_ext.dispatch.tensor.store for broadcast producer by @nirvedhmeshram in #22291
- [Docs] Read from first line of `rocm_agent_enumerator` output by @sjain-stanford in #22283
- [Codegen] Adding an optional `dma_sizes` field in GPU attributes by @lialan in #22281
- Bump LLVM to llvm/llvm-project@5a636c6 by @MaheshRavishankar in #22290
- Let MLIR ukernels provide their matching and data-tiled-layout info. by @bjacob in #22254
- [LLVMCPU] Propagate target features and CPU name to individual LLVMFuncOp by @mshockwave in #22036
- [CI][TorchModels] Update flags used for LLaMa 8b f8/fp16. by @MaheshRavishankar in #22297
- Promote iree-opt-data-tiling to pipeline options. by @hanhanW in #22295
- Bump version to 3.9.0 after 3.8.0 release. by @sa-faizal in #22308
- [GPU] Enabling Gather-like ops to go through GPUTileAndFuse pipeline by @Abhishek-Varma in #22251
- [python] Set up python binding for matcher convolution and attention op by @bangtianliu in #22311
- [DT][NFC] Trim IRs in encoding materialization tests for GPU and RISCV backends. by @hanhanW in #22313
- [GPU] Update K Tile size picking for multiple K dims by @Muzammiluddin-Syed-ECE in #22310
- [codegen][gpu] Make transfer_write conditional when not fully distributed by @newling in #22198
- [Stream] Replicate globals per affinity before Stream conversion. by @hanhanW in #22117
- Fix non-deterministic hoisting by @IanWood1 in #22319
- Drop revert of llvm/llvm-project#159083 by @MaheshRavishankar in #22298
- [Codegen] Allow pre-padding other dims of a conv except the input channel by @yzhang93 in #22296
- [CI][Torch] Update dispatch counts after non-determinism fix by @Groverkss in #22333
- [Codegen] Use llvm accumulate wrappers. NFC. by @kuhar in #22331
- [Codegen] Tile memref.copy when vectorizing for dynamic dims by @jtuyls in #22168
- Reapply "Re-enable lds_barrier on RDNA4" (#22278) by @krzysz00 in #22326
- [Codegen] Handle multiple dyn dims in tensor load pattern by @IanWood1 in #22328
- [DT][NFC] Add test files for materializing IREE ops with encodings. by @hanhanW in #22322
- [DT][NFC] Trim IRs for materialize_encoding_aarch64.mlir test. by @hanhanW in #22327
- [DT][NFC] Trim unnecessary IRs for materialize_encoding_vmvx.mlir test. by @hanhanW in #22330
- [DT][NFC] Trim unnecessary IRs for materialize_encoding_x86_64.mlir test. by @hanhanW in #22332
- [DispatchCreation] Add split reduction for weight backward convs by @yzhang93 in #22275
- [I...
iree candidate iree-3.9.0rc20251125
Automatic candidate release of iree.
iree candidate iree-3.9.0rc20251124
Automatic candidate release of iree.
iree candidate iree-3.9.0rc20251123
Automatic candidate release of iree.
iree candidate iree-3.9.0rc20251122
Automatic candidate release of iree.
iree candidate iree-3.9.0rc20251121
Automatic candidate release of iree.
iree candidate iree-3.9.0rc20251120
Automatic candidate release of iree.