[TRITON] Support gfx1201 for triton gemm_a8w8_blockscale #1829

big-yellow-duck · 2026-01-13T16:30:08Z

Motivation

This adds preliminary support for gfx1201 in the Triton gemm_a8w8_blockscale kernel, which is used by Qwen/Qwen3-0.6B-FP8.

Moving forward, more Triton kernels can be tuned to improve performance on gfx1201.

Technical Details

Added a base tuning script that can be adapted to other operations.
Added a tuning script that tunes the Triton kernel parameters for gemm_a8w8_blockscale.
The tuning script benchmarks different kernel parameters, such as num_warps and waves_per_eu, and selects the configuration with the lowest execution time for a set of operations.
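
As an illustration of the kind of search described above, here is a minimal sketch of an exhaustive sweep over kernel parameters. It is not the tuning script from this PR: the names candidate_configs, time_config, and tune are hypothetical, and the search space values are placeholders. The wall-clock timer is only a stand-in; for a real GPU kernel a device-aware timer (for example triton.testing.do_bench) would likely be needed, since kernel launches are asynchronous.

```python
import itertools
import time
from typing import Callable, Dict, Iterable, List, Tuple

Config = Dict[str, int]


def candidate_configs(search_space: Dict[str, List[int]]) -> Iterable[Config]:
    """Yield every combination of parameter values in the search space."""
    keys = list(search_space)
    for values in itertools.product(*(search_space[k] for k in keys)):
        yield dict(zip(keys, values))


def time_config(run: Callable[[Config], None], config: Config,
                warmup: int = 2, reps: int = 5) -> float:
    """Return the median wall-clock time (seconds) of run(config).

    Assumption: run() blocks until the work is finished. A GPU kernel
    launch does not, so a real tuner would need a device-aware timer.
    """
    for _ in range(warmup):
        run(config)
    samples = []
    for _ in range(reps):
        start = time.perf_counter()
        run(config)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


def tune(run: Callable[[Config], None],
         search_space: Dict[str, List[int]],
         measure: Callable[[Callable[[Config], None], Config], float] = time_config,
         ) -> Tuple[Config, float]:
    """Benchmark every candidate and return the fastest (config, time)."""
    best_config, best_time = None, float("inf")
    for config in candidate_configs(search_space):
        try:
            elapsed = measure(run, config)
        except Exception:
            # Skip configurations that fail to run (e.g. invalid for the device).
            continue
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    if best_config is None:
        raise RuntimeError("no configuration ran successfully")
    return best_config, best_time


# Hypothetical search space; the values are placeholders, not tuned results.
SEARCH_SPACE = {"num_warps": [2, 4, 8], "waves_per_eu": [0, 2, 4]}
```

Passing the measurement function as a parameter keeps the sweep logic independent of how a kernel is timed, so the same loop could be reused for other operations.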

Test Plan

Test the tuned configs with aiter/op_tests/triton_tests/gemm/basic/test_gemm_a8w8_blockscale.py:

pytest op_tests/triton_tests/gemm/basic/test_gemm_a8w8_blockscale.py

Test Result

126 tests passed.
2 tests skipped because N or K did not meet the preshuffle kernel constraints (N must be a multiple of 16, K must be a multiple of 32).
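
The skip condition above can be expressed as a small shape check. This is a sketch only: the function name meets_preshuffle_constraints and the example shapes are hypothetical and are not taken from the test file.

```python
def meets_preshuffle_constraints(n: int, k: int) -> bool:
    """True if (N, K) satisfies the constraints stated in the test result:
    N must be a multiple of 16 and K must be a multiple of 32."""
    return n % 16 == 0 and k % 32 == 0


# Hypothetical (N, K) shapes, for illustration only.
shapes = [(1024, 1024), (8, 1024), (1024, 48)]
runnable = [s for s in shapes if meets_preshuffle_constraints(*s)]
skipped = [s for s in shapes if not meets_preshuffle_constraints(*s)]
```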

Submission Checklist

[✅] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.


big-yellow-duck and others added 14 commits January 5, 2026 08:07

added tuned gemms for r9700

e0c5114

Merge branch 'ROCm:main' into main

e532f3a

Merge branch 'ROCm:main' into main

1a286e8

Added gemm_a8w8_blockscale support for gfx1201 with tuning script

bdab40d

Co-authored-by: NAME Amir Balwel [email protected]

Merge branch 'main' into support_gfx1201_min

c7664b8

added gfx1201 to types.py

c162331

Co-authoured-by: Amir Balwel [email protected]

Merge branch 'support_gfx1201_min' of https://github.com/EmbeddedLLM/…

fd925f4

…aiter into support_gfx1201_min

Merge branch 'ROCm:main' into support_gfx1201_min

2afe833

Merge branch 'ROCm:main' into support_gfx1201_min

a9f329a

Co-authored-by: Amir Balwel <[email protected]>

Merge branch 'support_gfx1201_min' of https://github.com/EmbeddedLLM/…

0ef32a3

…aiter into support_gfx1201_min

Add readme file and rename base to utils

897fd62

Co-authored-by: Jeff Aw <[email protected]> Signed-off-by: Amir Balwel <[email protected]>

add fp8 dtype

622fd33

Co-authored-by: Jeff Aw <[email protected]> Signed-off-by: Amir Balwel <[email protected]>

added gemm_a8w8_blocscale_shuffle

ab93b43

Co-authored-by: Amir Balwel <[email protected]>

Merge branch 'support_gfx1201_min' of https://github.com/EmbeddedLLM/…

5ff3029

…aiter into support_gfx1201_min

big-yellow-duck changed the title ~~Support gfx1201 min~~ Support gfx1201 for triton gemm_a8w8_blockscale Jan 16, 2026

big-yellow-duck and others added 10 commits January 16, 2026 10:30

Merge branch 'main' into support_gfx1201_min

7f09f13

update tuned gemm_a8w8_blockscale

aea5797

Merge branch 'support_gfx1201_min' of https://github.com/EmbeddedLLM/…

60ee427

…aiter into support_gfx1201_min Co-authored-by: Amir Balwel [email protected]

Merge branch 'main' into support_gfx1201_min

1d53884

Add readme for tuning Co-authored-by: Jeff Aw <[email protected]>

879c2c5

Add readme for tuning Co-authored-by: Jeff Aw <[email protected]>

c285687

update tuning readme

05f9ea7

Revert "Add readme for tuning Co-authored-by: Jeff Aw <jeffaw99@hotma…

971dcd8

…il.com>" This reverts commit 879c2c5.

Merge branch 'ROCm:main' into main

b645bcb

rebase and revert the submodule changes

47bba80

Signed-off-by: tjtanaa <[email protected]>

big-yellow-duck marked this pull request as ready for review January 23, 2026 02:50

big-yellow-duck requested a review from a team January 23, 2026 02:50

azaidy changed the title ~~Support gfx1201 for triton gemm_a8w8_blockscale~~ [TRITON] Support gfx1201 for triton gemm_a8w8_blockscale Jan 23, 2026

azaidy requested review from azaidy and vgokhale January 23, 2026 03:29

4 participants
