@jtuyls jtuyls commented Oct 27, 2025

To enable tensor-based ukernels by default (#22318), we will also need --iree-hip-specialize-dispatches to be set by default to get the best overall performance. Trying that here to avoid issues that have been coming up because this flag wasn't set in some tests or local runs. For example: #22421.

@jtuyls jtuyls requested a review from kuhar as a code owner October 27, 2025 17:23
@jtuyls jtuyls requested review from MaheshRavishankar, kuhar and qedawkins and removed request for kuhar October 27, 2025 17:23
sebvince added a commit that referenced this pull request Oct 29, 2025
…d e2e tests (#22446)

The primary purpose of this PR is to add missing e2e tests to cover all
combinations of ukernels. We have different ukernels for static and
dynamic shapes, so this PR introduces an option to
generate_e2e_matmul_tests.py that allows specifying the dynamicity of m,
n, and k.
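
As a rough illustration of what "all combinations" means here, the sketch below enumerates every static/dynamic assignment for m, n, and k. It is not the actual option added to generate_e2e_matmul_tests.py; the names `DIMS` and `dynamicity_combinations` are hypothetical and exist only for this example.

```python
from itertools import product

# Hypothetical names, for illustration only; the real script's option is not shown in this PR text.
DIMS = ("m", "n", "k")


def dynamicity_combinations():
    """Return every static/dynamic assignment for m, n, k as a list of dicts."""
    return [
        dict(zip(DIMS, flags))
        for flags in product(("static", "dynamic"), repeat=len(DIMS))
    ]


combos = dynamicity_combinations()
for combo in combos:
    print(combo)
```

With three dimensions and two choices each, this yields 2^3 = 8 cases, which is the coverage space the e2e tests would have to span.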

Since ukernels require specialization to be enabled ([see PR
#22425](#22425)), this PR also adds
the following missing specializations:
- bf16 on gfx942
- f16, bf16, f8 on gfx950.

This improves performance on dynamic-shape matmuls that lack
compile-time bounds information. For example:

```
!A_size = tensor<?x4096xbf16>
!B_size = tensor<4096x4096xbf16>
!C_size = tensor<?x4096xf32>

func.func @Matmul(%A : !A_size, %B : !B_size) -> !C_size {
  %c0 = arith.constant 0 : index
  %cst = arith.constant 0.000000e+00 : f32
  %m = tensor.dim %A, %c0 : tensor<?x4096xbf16>
  %empty = tensor.empty(%m) : !C_size
  %C = linalg.fill ins(%cst : f32) outs(%empty : !C_size) -> !C_size
  %0 = linalg.matmul
         indexing_maps = [affine_map<(m, n, k) -> (m, k)>,
                          affine_map<(m, n, k) -> (n, k)>, // transpose
                          affine_map<(m, n, k) -> (m, n)>]
         ins(%A, %B : !A_size, !B_size)
         outs(%C : !C_size) -> !C_size
  return %0 : !C_size
}
```
Before PR: Time (ms): 30.831274
After PR: Time (ms): 0.284123