@sebvince sebvince commented Oct 28, 2025

The primary purpose of this PR is to add missing e2e tests to cover all combinations of ukernels. We have different ukernels for static and dynamic shapes, so this PR introduces an option to generate_e2e_matmul_tests.py that allows specifying the dynamicity of m, n, and k.
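As a rough illustration (the option name and generator internals here are assumptions for the sketch, not taken from this PR), the new dynamicity setting essentially has to enumerate the eight static/dynamic combinations of m, n, and k, each of which maps to a different MLIR tensor type:

```python
from itertools import product

def tensor_type(dims, dtype):
    # Dynamic dimensions render as '?' in MLIR tensor types.
    return f"tensor<{'x'.join(dims)}x{dtype}>"

# Hypothetical concrete sizes for the static case.
m, n, k = "512", "4096", "4096"

cases = []
for dyn_m, dyn_n, dyn_k in product([False, True], repeat=3):
    a = tensor_type(["?" if dyn_m else m, "?" if dyn_k else k], "bf16")
    b = tensor_type(["?" if dyn_k else k, "?" if dyn_n else n], "bf16")
    c = tensor_type(["?" if dyn_m else m, "?" if dyn_n else n], "f32")
    cases.append((a, b, c))

print(len(cases))   # 8 dynamicity combinations to cover
print(cases[0])     # fully static A, B, C types
```

Each of the eight cases exercises a different ukernel selection path, which is why covering all combinations matters for the e2e suite.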

Since ukernels require specialization to be enabled (see PR #22425), this PR also adds the following missing specializations:

  • bf16 on gfx942
  • f16, bf16, f8 on gfx950.

This will positively affect performance on dynamic-shape matmuls without compile-time bounds information. For example:

```mlir
!A_size = tensor<?x4096xbf16>
!B_size = tensor<4096x4096xbf16>
!C_size = tensor<?x4096xf32>

func.func @matmul(%A : !A_size, %B : !B_size) -> !C_size {
  %c0 = arith.constant 0 : index
  %cst = arith.constant 0.000000e+00 : f32
  %m = tensor.dim %A, %c0 : tensor<?x4096xbf16>
  %empty = tensor.empty(%m) : !C_size
  %C = linalg.fill ins(%cst : f32) outs(%empty : !C_size) -> !C_size
  %0 = linalg.matmul
         indexing_maps = [affine_map<(m, n, k) -> (m, k)>,
                          affine_map<(m, n, k) -> (n, k)>, // B is transposed
                          affine_map<(m, n, k) -> (m, n)>]
         ins(%A, %B : !A_size, !B_size)
         outs(%C : !C_size) -> !C_size
  return %0 : !C_size
}
```

Before PR: Time (ms): 30.831274
After PR: Time (ms): 0.284123

@sebvince sebvince changed the title [ukernels] Add e2e tests to cover all combinaison of ukernels [ukernels] Add e2e tests to cover all combinations of ukernels Oct 28, 2025
@sebvince sebvince marked this pull request as ready for review October 28, 2025 17:11
@sebvince sebvince requested a review from kuhar as a code owner October 28, 2025 17:11
@sebvince sebvince requested review from bjacob and jtuyls October 28, 2025 17:11
@bjacob bjacob left a comment
There is a discrepancy between the PR title which suggests it's a test-only PR, and the PR contents that add new specialization patterns. I'm not competent to review those, so please get another reviewer for that.

@sebvince sebvince changed the title [ukernels] Add e2e tests to cover all combinations of ukernels [ukernels] Add missing specializations on gfx942/gfx950 and associated e2e tests Oct 29, 2025
@jtuyls jtuyls left a comment

LGTM. The specialization patterns make sense to me as they are the same as the existing ones, just for different types. If you have any, it would be useful to list experimental results with and without the new gfx950 specialization patterns in the PR description.

@sebvince (Contributor Author)

> LGTM. The specialization patterns make sense to me as they are the same as the existing ones, just for different types. If you have any, it would be useful to list experimental results with and without the new gfx950 specialization patterns in the PR description.

After some testing, I found that some f16 and bf16 configurations of pingpong are slower than the default path (independently of this PR). This is because the current pingpong implementation uses MFMA_F32_16x16x16_F16 instead of MFMA_F32_16x16x32_F16. Something to keep in mind if we want to enable ukernels by default.
I've created a separate task to fix this: #22457
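As a back-of-envelope check (tile shapes are read off the intrinsic names only; real throughput also depends on issue rate, latency hiding, and register pressure), the 16x16x32 variant performs twice the multiply-accumulates per instruction, which is consistent with the smaller intrinsic leaving performance on the table:

```python
def mfma_macs(m, n, k):
    # One multiply-accumulate per (m, n, k) triple in the MFMA tile.
    return m * n * k

old = mfma_macs(16, 16, 16)  # MFMA_F32_16x16x16_F16
new = mfma_macs(16, 16, 32)  # MFMA_F32_16x16x32_F16
print(old, new, new / old)   # 4096 8192 2.0
```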

@sebvince sebvince requested a review from bjacob October 29, 2025 14:09
@sebvince sebvince merged commit b572d6f into iree-org:main Oct 29, 2025
35 of 46 checks passed
MaheshRavishankar pushed a commit that referenced this pull request Oct 29, 2025
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Oct 30, 2025
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Oct 30, 2025