
Release v3.5.0

Released by @ScottTodd on 11 Jun at 15:33
Tag v3.5.0, commit b99b395

Release notes

IREE-Turbine Core

  • Enabled stream sharing between IREE and iree-turbine #883: uses IREE's external HIP stream support so that PyTorch and IREE can share a stream, reducing launch latency when the two interoperate and improving performance in mixed-framework workflows.
  • Introduced initial async execution support in the runtime #719: added a first pass of asynchronous task execution, enabling better parallelism and laying groundwork for integration with asynchronous frontends and runtimes.

IREE-Wave

New Operators & Kernels

  • Added a softsign kernel to replace the tanh approximation (tanh_approx) kernel #829
  • Added a GEMM example (C = A @ B) that does not transpose B #881
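For readers comparing the two activations, here is a plain-Python reference of the math involved; it is not the Wave kernel code or API. softsign(x) = x / (1 + |x|) is, like tanh, an odd function bounded in (-1, 1), but it needs only an absolute value, an add, and a divide rather than exponentials. The `matmul` helper illustrates the non-transposed-B layout of a C = A @ B GEMM: B is read as B[k][n] (shape K x N) instead of being stored as N x K. Both function names are hypothetical illustrations.

```python
import math


def softsign(x: float) -> float:
    """softsign(x) = x / (1 + |x|): odd and bounded in (-1, 1), with no exp()."""
    return x / (1.0 + abs(x))


def matmul(a, b):
    """Reference C = A @ B with A of shape (M, K) and B of shape (K, N).

    B is NOT transposed: element (p, j) is read as b[p][j].
    """
    m, k = len(a), len(a[0])
    assert len(b) == k, "inner dimensions must match"
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Walk down column j of B (row-major K x N layout).
            c[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
    return c


if __name__ == "__main__":
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"x={x:+.1f}  softsign={softsign(x):+.4f}  tanh={math.tanh(x):+.4f}")
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Running it prints softsign and tanh side by side. Softsign approaches ±1 much more slowly than tanh (softsign(4) = 0.8, while tanh(4) is about 0.9993), so the two are not numerically interchangeable.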

Documentation

  • Added documentation for several optimization passes #833, #835
  • Added a Jupyter notebook with a GEMM example #784
  • Added a tutorial on implementing fused softmax with Wave #891
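As background for the fused softmax tutorial, the sketch below is a plain-Python reference of numerically stable softmax (subtract the row maximum before exponentiating). It is written as the three per-row steps that a fused kernel typically performs in a single launch: find the max, sum the exponentials, and normalize. It is not Wave code; `softmax_row` and `softmax` are hypothetical names, and the fusion itself (keeping each row on-chip across the three steps instead of writing intermediates back to memory) is only described in the comments.

```python
import math


def softmax_row(row):
    """Numerically stable softmax of one row: exp(x - max) / sum(exp(x - max))."""
    # Step 1: row maximum. A fused kernel would keep the row on-chip from here on.
    m = max(row)
    # Step 2: exponentiate shifted values and accumulate the normalizer.
    # Subtracting m keeps every exponent <= 0, so exp() cannot overflow.
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    # Step 3: normalize; only this final result would be written back.
    return [e / s for e in exps]


def softmax(matrix):
    """Apply softmax independently to each row of a 2-D list."""
    return [softmax_row(r) for r in matrix]


if __name__ == "__main__":
    print(softmax([[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]]))
```

The second row shows why the max subtraction matters: computing `math.exp(1000.0)` directly raises `OverflowError`, while the shifted version returns equal probabilities.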

Kernel Improvements

  • Bug fixes and performance improvements for the decode attention and speculative decoding kernels #838, #839, #854, #858, #878, #890, #888

Compiler Optimizations

  • Improved GEMM performance with ping-pong optimizations #846, #832
  • Reduced the number of shared memory allocations #822, #831

General Improvements

  • Added an option to print compiler pass timings #813
  • Added the ability to print FX graphs in MLIR style #775

New Contributors

Full changelog

List of changes

  • First pass of async execution in the runtime. by @AWoloszyn in #719
  • [Wave] Add scaling to reference kernel in cache test by @raikonenfnu in #781
  • [Wave] Implement block wide reduction by @raikonenfnu in #765
  • [WAVE] Support wave-wise cumsum scan op::more than one elem per thread by @xintin in #753
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #782
  • Enable benchmarking in performance ci by @harsh-nod in #164
  • [TKW] Paged decode MHA using GenericDot by @Hardcode84 in #779
  • [TKW] Update paged decode API by @Hardcode84 in #786
  • [TKW] Silence expansion warning by @Hardcode84 in #788
  • [Wave] Fix broken lit tests by @harsh-nod in #789
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #785
  • [BOO] enable tuning specs for boo_convs by @zjgarvey in #783
  • Bump version to 3.5.0 after 3.4.0 release. by @ScottTodd in #793
  • [BOO] Add a timeout for out-of-process compilation by @zjgarvey in #796
  • [Wave] add excalidraw visualization of wave compilation pipeline by @martin-luecke in #777
  • [Wave] support for partial unrolling of iterate by @martin-luecke in #776
  • [BOO] Include tuning specs in pip package by @zjgarvey in #800
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #799
  • Bump actions/create-github-app-token from 2.0.2 to 2.0.6 in the github-actions group by @dependabot in #795
  • Add support for additional type interop by @rsuderman in #803
  • [TKW] Avoid cache folder race conditions when running tests in parallel by @Hardcode84 in #808
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #806
  • [TKW] Paged decode: move scaling into the kernel by @Hardcode84 in #807
  • [runtime] Enable async for CustomOp eager execution by @zjgarvey in #802
  • [boo_driver] Align -t option name with miopen driver by @rkayaith in #810
  • [Wave] Add jupyter notebook showing how to do a GEMM by @harsh-nod in #784
  • [TKW] Paged Decode dynamic symbols by @Hardcode84 in #809
  • [TKW] Tweak Paged Decode kernel waves/tile sizes by @Hardcode84 in #794
  • [Wave] Add mlir style printing by @harsh-nod in #775
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #812
  • [BOO] Fix a bug in boo_driver.py resulting from arg rename by @zjgarvey in #818
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #814
  • [boo] Add more tuning configurations by @rkayaith in #819
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #823
  • [BOO] remove generate.py script by @zjgarvey in #824
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #825
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #826
  • [Wave] Minimize shared allocs by @harsh-nod in #822
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #830
  • [BOO] Enable bad kernel launch fix + kwarg filtered replacement by @zjgarvey in #828
  • [Wave] Add additional check when minimizing allocs by @harsh-nod in #831
  • [BOO] Add autocast functionality to boo_conv by @zjgarvey in #827
  • [Wave] Add scheduling/synchronization related ops by @raikonenfnu in #832
  • [Wave] Add documentation by @harsh-nod in #833
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #834
  • [Wave] Add install requirement for mermaid diagrams by @harsh-nod in #835
  • [TKW] BF16 support in paged decode kernel by @Hardcode84 in #836
  • [TKW] Option to print pass timings by @Hardcode84 in #813
  • [runtime] Rename dltensor capsule to avoid memory leak by @zjgarvey in #843
  • [Wave] Generalize Barriers to work across iterations by @raikonenfnu in #839
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #842
  • [TKW] Improve wave runtime by @Hardcode84 in #844
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #848
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #849
  • [Wave] Two PP cluster GEMM by @raikonenfnu in #846
  • [Wave] Add ability to compare against flash_attn_func by @harsh-nod in #798
  • [BOO] Enable torch.compile for models with boo convs by @zjgarvey in #850
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #852
  • [wave] Fix bug in Permute.transform_index by @fabianmcg in #838
  • Pin IREE version to 3.5.0rc20250516 by @Hardcode84 in #853
  • [TKW] Large shape and wave_runtime for paged decode test by @Hardcode84 in #855
  • [Wave] Make APLP faster by @harsh-nod in #840
  • [Wave] Fix failing jupyter test by @harsh-nod in #854
  • [Wave] Fix IREE reference failing with wrong signature by @raikonenfnu in #858
  • Use turbine.runtime for generate_iree_ref by @Hardcode84 in #861
  • [boo] More tuning configurations by @rkayaith in #864
  • [BOO] Add an env variable for toggling backward boo convolutions by @zjgarvey in #865
  • [Wave] Introduce softsign kernel to replace tanh_approx by @yichiche in #829
  • [Wave] Align variable names in speculative decode kernel by @xintin in #872
  • [Wave] Added sin op by @SourishW in #866
  • [Wave] Added atan2 operation by @badgerbroch in #867
  • [Wave] added cos op by @bodhisaha in #869
  • [TKW] Drop inplace flag by @Hardcode84 in #863
  • Move scaled_dot_product_attention_bhsd under iree.turbine by @aviator19941 in #870
  • [Wave] Implement C = A @ B GEMM example by @efric in #881
  • [BOO] add a top-level readme showcasing basic usage by @zjgarvey in #882
  • Update IREE deps by @Hardcode84 in #880
  • [TKW] Add patterns to sink vector.extract/splat ops by @Hardcode84 in #851
  • Bump IREE requirement pins to their latest versions. by @iree-pr-automator in #886
  • Revert "[runtime] Rename dltensor capsule to avoid memory leak" by @Groverkss in #887
  • [WAVE] Correcting predicts dim in the first speculative decode kernel by @xintin in #878
  • [Wave] Speculative decode - default predict token to last token in vocab by @nithinsubbiah in #890
  • Allow stream sharing between iree and iree-turbine. by @AWoloszyn in #883
  • Update IREE version, remove dlpack hack and add tensor leak test by @Hardcode84 in #893
  • [Wave] Hide APLP warning by @Hardcode84 in #898
  • [BOO] Add Conv1d and Conv3d replacement capability by @zjgarvey in #885
  • [BOO] don't permute None in pytorch convolution backward by @zjgarvey in #884
  • [WAVE] Added Fused Softmax and Vector Addition Kernels/Tests Closes #874 by @bodhisaha in #891
  • [WAVE] Update ci tk yaml to have updated rocm installation by @xintin in #902
  • [WAVE] Update speculative decode first kernel's dims from 3D to 1D by @xintin in #888
  • Add a python3.11 mi300 unit tests and type checking job by @zjgarvey in #903
  • Reduce CPU overhead in lookup_device_from_torch by @zjgarvey in #905

Commit history: v3.4.0...v3.5.0