[LLVMGPU][ROCm] Add MFMA_F32_16x16x4_F32 instruction #17847

pashu123 · 2024-07-10T12:30:44Z

No description provided.

tests/e2e/matmul/generate_e2e_matmul_tests.py

compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUAttrs.cpp

kuhar · 2024-07-11T22:44:51Z

compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUAttrs.cpp

+    // not supported by amdgpu.mfma op.
+    Value zeroIdx = builder.create<arith::ConstantIndexOp>(loc, 0);
+    lhs = builder.create<vector::ExtractElementOp>(loc, lhs, zeroIdx);
+    rhs = builder.create<vector::ExtractElementOp>(loc, rhs, zeroIdx);


Use `vector.extract` with an attribute index instead.

Updated! Any reason why?

`vector.extractelement` allows the index to be dynamic and, in this application, requires you to materialize the index attribute as a constant op. We don't need either of the two.
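For context on why a single element is extracted from each operand: going by the intrinsic name, the tile is 16x16x4 (M x N x K). Assuming a 64-lane subgroup and an even split of each operand tile across lanes (both are assumptions, not stated in this thread), each lane ends up holding one LHS element and one RHS element. A minimal sketch of that arithmetic, where `elements_per_lane` is a hypothetical helper and not part of IREE:

```python
def elements_per_lane(m, n, k, subgroup_size=64):
    """Return (lhs, rhs, acc) element counts held by each lane for an MxNxK tile.

    Assumes every operand tile is split evenly across the subgroup.
    """
    counts = []
    for tile_elems in (m * k, k * n, m * n):
        if tile_elems % subgroup_size != 0:
            raise ValueError("tile does not divide evenly across the subgroup")
        counts.append(tile_elems // subgroup_size)
    return tuple(counts)


# 16x16x4 tile: a 16x4 LHS, a 4x16 RHS, and a 16x16 accumulator.
print(elements_per_lane(16, 16, 4))  # prints (1, 1, 4)
```

Under those assumptions the LHS and RHS operands are single-element vectors per lane, which is consistent with the diff extracting element 0 from each.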

kuhar · 2024-07-12T16:38:01Z

compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUAttrs.cpp

+    lhs = builder.create<vector::ExtractOp>(loc, lhs, SmallVector<int64_t>{0});
+    rhs = builder.create<vector::ExtractOp>(loc, rhs, SmallVector<int64_t>{0});


I think `ArrayRef{int64_t{0}}` will also work

kuhar · 2024-07-12T16:58:19Z

@pashu123 the DCO check is failing; you need to sign off all your commits and force-push

ScottTodd · 2024-07-12T20:44:02Z

I think this broke SDXL compilation with ~~default flags~~ benchmark flags for rocm: https://github.com/iree-org/iree/actions/runs/9913663701/job/27391531909#step:16:47. Presubmit showed that too: https://github.com/iree-org/iree/actions/runs/9911235296/job/27383808262#step:16:48

Sorry, CI results have been very noisy, particularly this morning. I also filed nod-ai/SHARK-TestSuite#286 for the poor failure mode in that workflow a few days ago (was hoping that would be noticed here during review and resolved, but I should have commented too).

ScottTodd · 2024-07-12T20:48:19Z

BTW please add PR descriptions in the future. I'm not sure from the logs and code if this was expected to change lowering paths and affect existing models, for example.

This reverts commit d65c6d4.

ScottTodd · 2024-07-12T21:24:56Z

Oooh, compilation succeeded but the iree-benchmark-module command itself failed? We only include stderr output right now, but there may have been messages on stdout.
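The failure mode described here (a failed command whose report includes only stderr) could be avoided by capturing both streams and the exit code. The following is a minimal sketch using only the Python standard library. `run_and_report` is a hypothetical helper, not the actual code in `benchmark_sdxl_rocm.py`.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return its stdout; on failure, raise with both streams attached."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"Command failed with exit code {proc.returncode}\n"
            f"stdout:\n{proc.stdout}\n"
            f"stderr:\n{proc.stderr}"
        )
    return proc.stdout


if __name__ == "__main__":
    # A child process that reports its problem on stdout only, then exits non-zero.
    failing = [sys.executable, "-c", "import sys; print('oops on stdout'); sys.exit(3)"]
    try:
        run_and_report(failing)
    except RuntimeError as err:
        print(err)
```

With this shape, a tool that writes its diagnostics to stdout still leaves a usable message in the test log, and the exit code is always reported even when both streams are empty.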

Reverts #17847.

This broke SDXL rocm pipeline tests on mi300, see #17847 (comment). The tests aren't showing error messages (`root:benchmark_sdxl_rocm.py:31 Command failed with error: b''`) so I can't easily tell what the issue is; nod-ai/SHARK-TestSuite#286 is filed to improve the situation there.

Signed-off-by: Lubo Litchev <[email protected]>

pashu123 force-pushed the wmma_ab_f32_c_f32 branch 2 times, most recently from ddacba7 to 9412a5e Compare July 10, 2024 12:44

ScottTodd mentioned this pull request Jul 10, 2024

Missing error handling in benchmark_sdxl_rocm.py nod-ai/SHARK-TestSuite#286

Open

pashu123 force-pushed the wmma_ab_f32_c_f32 branch 3 times, most recently from 8227633 to a34e030 Compare July 11, 2024 14:09

pashu123 changed the title ~~[LLVMGPU][ROCm] Add MFMA_F32_16x16x4_F32 instruction~~ [LLVMGPU][ROCm] Add MFMA_F32_16x16x8_F32 instruction Jul 11, 2024

pashu123 force-pushed the wmma_ab_f32_c_f32 branch 2 times, most recently from 73f0863 to a2ddc34 Compare July 11, 2024 16:41

pashu123 changed the title ~~[LLVMGPU][ROCm] Add MFMA_F32_16x16x8_F32 instruction~~ [LLVMGPU][ROCm] Add MFMA_F32_16x16x4_F32 instruction Jul 11, 2024

pashu123 force-pushed the wmma_ab_f32_c_f32 branch from a2ddc34 to 49e0bea Compare July 11, 2024 16:49

pashu123 marked this pull request as ready for review July 11, 2024 16:56

pashu123 requested review from kuhar, antiagainst and qedawkins as code owners July 11, 2024 16:56

kuhar reviewed Jul 11, 2024

tests/e2e/matmul/generate_e2e_matmul_tests.py

pashu123 commented Jul 11, 2024

compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUAttrs.cpp

pashu123 force-pushed the wmma_ab_f32_c_f32 branch 2 times, most recently from 73bcdaa to 4ca5488 Compare July 11, 2024 20:39

kuhar reviewed Jul 11, 2024

pashu123 force-pushed the wmma_ab_f32_c_f32 branch 3 times, most recently from f9953e0 to 435b878 Compare July 12, 2024 14:26

pashu123 requested a review from kuhar July 12, 2024 14:29

pashu123 force-pushed the wmma_ab_f32_c_f32 branch 2 times, most recently from 8988ba7 to 24f1f84 Compare July 12, 2024 16:10

kuhar approved these changes Jul 12, 2024

kuhar reviewed Jul 12, 2024

[LLVMGPU][ROCm] Add MFMA_F32_16x16x4_F32 instruction

da1cd74

pashu123 force-pushed the wmma_ab_f32_c_f32 branch from 24f1f84 to da1cd74 Compare July 12, 2024 16:40

pashu123 enabled auto-merge (squash) July 12, 2024 17:09

pashu123 merged commit d65c6d4 into iree-org:main Jul 12, 2024
49 of 52 checks passed

ScottTodd added a commit that referenced this pull request Jul 12, 2024

Revert "[LLVMGPU][ROCm] Add MFMA_F32_16x16x4_F32 instruction (#17847)"

7877652

This reverts commit d65c6d4.

ScottTodd mentioned this pull request Jul 12, 2024

Revert "[LLVMGPU][ROCm] Add MFMA_F32_16x16x4_F32 instruction" #17894

Merged

LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024

[LLVMGPU][ROCm] Add MFMA_F32_16x16x4_F32 instruction (iree-org#17847)

f322f9b

Signed-off-by: Lubo Litchev <[email protected]>
