[ukernels] Add missing specializations on gfx942/gfx950 and associated e2e tests #22446
Conversation
There is a discrepancy between the PR title, which suggests this is a test-only PR, and the PR contents, which add new specialization patterns. I'm not competent to review those, so please get another reviewer for that.
LGTM. The specialization patterns make sense to me as they are the same as the existing ones, just for different types. If you have any, it would be useful to list experimental results with and without the new gfx950 specialization patterns in the PR description.
After some testing, I found that some f16 & bf16 configurations of pingpong are slower than the default path (independent of this PR). This is because the current pingpong uses MFMA_F32_16x16x16_F16 instead of MFMA_F32_16x16x32_F16. Something to keep in mind if we want to enable ukernels by default.
#22446 regressed building those tests, which are gfx950-only. https://discord.com/channels/689900678990135345/689957613152239638/1433170968620175411 Signed-off-by: Benoit Jacob <[email protected]>
The primary purpose of this PR is to add missing e2e tests to cover all combinations of ukernels. We have different ukernels for static and dynamic shapes, so this PR introduces an option to generate_e2e_matmul_tests.py that allows specifying the dynamicity of m, n, and k.

Since ukernels require specialization to be enabled (see PR #22425), this PR also adds the following missing specializations:

- bf16 on gfx942
- f16, bf16, f8 on gfx950

This will positively affect performance on dynamic-shape matmuls without compile-time bounds information. For example:

```mlir
!A_size = tensor<?x4096xbf16>
!B_size = tensor<4096x4096xbf16>
!C_size = tensor<?x4096xf32>

func.func @Matmul(%A : !A_size, %B : !B_size) -> !C_size {
  %c0 = arith.constant 0 : index
  %cst = arith.constant 0.000000e+00 : f32
  %m = tensor.dim %A, %c0 : tensor<?x4096xbf16>
  %empty = tensor.empty(%m) : !C_size
  %C = linalg.fill ins(%cst : f32) outs(%empty : !C_size) -> !C_size
  %0 = linalg.matmul
      indexing_maps = [affine_map<(m, n, k) -> (m, k)>,
                       affine_map<(m, n, k) -> (n, k)>, // transpose
                       affine_map<(m, n, k) -> (m, n)>]
      ins(%A, %B : !A_size, !B_size)
      outs(%C : !C_size) -> !C_size
  return %0 : !C_size
}
```

Before PR: Time (ms): 30.831274
After PR: Time (ms): 0.284123
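For illustration only, here is a minimal sketch of how a dynamicity option could be surfaced on a test-generator script like generate_e2e_matmul_tests.py. The flag name `--dynamicity`, its accepted values, and the `shape_str` helper are assumptions made for this sketch, not the actual interface added by the PR.

```python
import argparse


def parse_args():
    # Hypothetical flag: which of m, n, k should be dynamic in the generated tests.
    # The real option added by the PR may be named and shaped differently.
    parser = argparse.ArgumentParser(description="Generate e2e matmul tests (sketch).")
    parser.add_argument(
        "--dynamicity",
        default="",
        help="Comma-separated subset of 'm,n,k' to treat as dynamic (e.g. 'm' or 'm,k').",
    )
    return parser.parse_args()


def shape_str(dim_name, size, dynamic_dims):
    # Emit '?' for dynamic dimensions so the generated MLIR uses tensor<?x...> types.
    return "?" if dim_name in dynamic_dims else str(size)


if __name__ == "__main__":
    args = parse_args()
    dynamic_dims = {d.strip() for d in args.dynamicity.split(",") if d.strip()}
    m, n, k = 123, 4096, 4096
    lhs = f"tensor<{shape_str('m', m, dynamic_dims)}x{shape_str('k', k, dynamic_dims)}xbf16>"
    rhs = f"tensor<{shape_str('k', k, dynamic_dims)}x{shape_str('n', n, dynamic_dims)}xbf16>"
    print(f"LHS type: {lhs}\nRHS type: {rhs}")
```

Invoked as, say, `python generate_e2e_matmul_tests.py --dynamicity=m`, this sketch would emit a `tensor<?x4096xbf16>` LHS, matching the dynamic-M example above (again, assuming such a flag exists in that form).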