@sebvince sebvince commented Oct 28, 2025

The primary purpose of this PR is to add missing e2e tests to cover all combinations of ukernels. We have different ukernels for static and dynamic shapes, so this PR introduces an option to generate_e2e_matmul_tests.py that allows specifying the dynamicity of m, n, and k.
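As a rough illustration (the option name and generator internals here are assumptions for the sketch, not taken from this PR), the new dynamicity setting essentially has to enumerate the eight static/dynamic combinations of m, n, and k, each of which maps to a different MLIR tensor type:

```python
from itertools import product

def tensor_type(dims, dtype):
    # Dynamic dimensions render as '?' in MLIR tensor types.
    return f"tensor<{'x'.join(dims)}x{dtype}>"

# Hypothetical concrete sizes for the static case.
m, n, k = "512", "4096", "4096"

cases = []
for dyn_m, dyn_n, dyn_k in product([False, True], repeat=3):
    a = tensor_type(["?" if dyn_m else m, "?" if dyn_k else k], "bf16")
    b = tensor_type(["?" if dyn_k else k, "?" if dyn_n else n], "bf16")
    c = tensor_type(["?" if dyn_m else m, "?" if dyn_n else n], "f32")
    cases.append((a, b, c))

print(len(cases))   # 8 dynamicity combinations to cover
print(cases[0])     # fully static A, B, C types
```

Each of the eight cases exercises a different ukernel selection path, which is why covering all combinations matters for the e2e suite.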

Since ukernels require specialization to be enabled (see PR #22425), this PR also adds the following missing specializations:

  • bf16 on gfx942
  • f16, bf16, f8 on gfx950.

This will positively affect performance on dynamic-shape matmuls without compile-time bounds information. For example:

```mlir
!A_size = tensor<?x4096xbf16>
!B_size = tensor<4096x4096xbf16>
!C_size = tensor<?x4096xf32>

func.func @matmul(%A : !A_size, %B : !B_size) -> !C_size {
  %c0 = arith.constant 0 : index
  %cst = arith.constant 0.000000e+00 : f32
  %m = tensor.dim %A, %c0 : tensor<?x4096xbf16>
  %empty = tensor.empty(%m) : !C_size
  %C = linalg.fill ins(%cst : f32) outs(%empty : !C_size) -> !C_size
  %0 = linalg.matmul
         indexing_maps = [affine_map<(m, n, k) -> (m, k)>,
                          affine_map<(m, n, k) -> (n, k)>, // B is transposed
                          affine_map<(m, n, k) -> (m, n)>]
         ins(%A, %B : !A_size, !B_size)
         outs(%C : !C_size) -> !C_size
  return %0 : !C_size
}
```

Before PR: Time (ms): 30.831274
After PR: Time (ms): 0.284123

@sebvince sebvince changed the title [ukernels] Add e2e tests to cover all combinaison of ukernels [ukernels] Add e2e tests to cover all combinations of ukernels Oct 28, 2025
@sebvince sebvince marked this pull request as ready for review October 28, 2025 17:11
@sebvince sebvince requested a review from kuhar as a code owner October 28, 2025 17:11
@sebvince sebvince requested review from bjacob and jtuyls October 28, 2025 17:11
@bjacob bjacob left a comment
There is a discrepancy between the PR title which suggests it's a test-only PR, and the PR contents that add new specialization patterns. I'm not competent to review those, so please get another reviewer for that.

@sebvince sebvince changed the title [ukernels] Add e2e tests to cover all combinations of ukernels [ukernels] Add missing specializations on gfx942/gfx950 and associated e2e tests Oct 29, 2025
@jtuyls jtuyls left a comment

LGTM. The specialization patterns make sense to me as they are the same as the existing ones, just for different types. If you have any, it would be useful to list experimental results with and without the new gfx950 specialization patterns in the PR description.

@sebvince (Contributor Author)

> LGTM. The specialization patterns make sense to me as they are the same as the existing ones, just for different types. If you have any, it would be useful to list experimental results with and without the new gfx950 specialization patterns in the PR description.

After some testing, I found that some f16 and bf16 configurations of pingpong are slower than the default path (independently of this PR). This is because the current pingpong implementation uses MFMA_F32_16x16x16_F16 instead of MFMA_F32_16x16x32_F16. Something to keep in mind if we want to enable ukernels by default.
I've created a separate task to fix this: #22457
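As a back-of-envelope check (tile shapes are read off the intrinsic names only; real throughput also depends on issue rate, latency hiding, and register pressure), the 16x16x32 variant performs twice the multiply-accumulates per instruction, which is consistent with the smaller intrinsic leaving performance on the table:

```python
def mfma_macs(m, n, k):
    # One multiply-accumulate per (m, n, k) triple in the MFMA tile.
    return m * n * k

old = mfma_macs(16, 16, 16)  # MFMA_F32_16x16x16_F16
new = mfma_macs(16, 16, 32)  # MFMA_F32_16x16x32_F16
print(old, new, new / old)   # 4096 8192 2.0
```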

@sebvince sebvince requested a review from bjacob October 29, 2025 14:09
@sebvince sebvince merged commit b572d6f into iree-org:main Oct 29, 2025
35 of 46 checks passed
MaheshRavishankar pushed a commit that referenced this pull request Oct 29, 2025
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Oct 30, 2025
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Oct 30, 2025