Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVMGPU][ROCm] Add MFMA_F32_16x16x4_F32 instruction #17847

Merged
merged 1 commit into from
Jul 12, 2024

Conversation

pashu123
Copy link
Contributor

No description provided.

@pashu123 pashu123 force-pushed the wmma_ab_f32_c_f32 branch 2 times, most recently from ddacba7 to 9412a5e Compare July 10, 2024 12:44
@pashu123 pashu123 force-pushed the wmma_ab_f32_c_f32 branch 3 times, most recently from 8227633 to a34e030 Compare July 11, 2024 14:09
@pashu123 pashu123 changed the title [LLVMGPU][ROCm] Add MFMA_F32_16x16x4_F32 instruction [LLVMGPU][ROCm] Add MFMA_F32_16x16x8_F32 instruction Jul 11, 2024
@pashu123 pashu123 force-pushed the wmma_ab_f32_c_f32 branch 2 times, most recently from 73f0863 to a2ddc34 Compare July 11, 2024 16:41
@pashu123 pashu123 changed the title [LLVMGPU][ROCm] Add MFMA_F32_16x16x8_F32 instruction [LLVMGPU][ROCm] Add MFMA_F32_16x16x4_F32 instruction Jul 11, 2024
@pashu123 pashu123 marked this pull request as ready for review July 11, 2024 16:56
@pashu123 pashu123 force-pushed the wmma_ab_f32_c_f32 branch 2 times, most recently from 73bcdaa to 4ca5488 Compare July 11, 2024 20:39
// not supoorted by amgpu.mfma op.
Value zeroIdx = builder.create<arith::ConstantIndexOp>(loc, 0);
lhs = builder.create<vector::ExtractElementOp>(loc, lhs, zeroIdx);
rhs = builder.create<vector::ExtractElementOp>(loc, rhs, zeroIdx);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use vector.extract with an attribute index instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated! Any reasons why?

Copy link
Member

@kuhar kuhar Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract element allows for dynamism over the index and, in this application, requires you to materialize the index attribute as a constant. we don't need either of the two

@pashu123 pashu123 force-pushed the wmma_ab_f32_c_f32 branch 3 times, most recently from f9953e0 to 435b878 Compare July 12, 2024 14:26
@pashu123 pashu123 requested a review from kuhar July 12, 2024 14:29
@pashu123 pashu123 force-pushed the wmma_ab_f32_c_f32 branch 2 times, most recently from 8988ba7 to 24f1f84 Compare July 12, 2024 16:10
Comment on lines 615 to 616
lhs = builder.create<vector::ExtractOp>(loc, lhs, SmallVector<int64_t>{0});
rhs = builder.create<vector::ExtractOp>(loc, rhs, SmallVector<int64_t>{0});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ArrayRef{int64_t{0}} will also work

@kuhar
Copy link
Member

kuhar commented Jul 12, 2024

@pashu123 DCO is failing, you need to sign all your commits and force-push

@pashu123 pashu123 enabled auto-merge (squash) July 12, 2024 17:09
@pashu123 pashu123 merged commit d65c6d4 into iree-org:main Jul 12, 2024
49 of 52 checks passed
@ScottTodd
Copy link
Member

ScottTodd commented Jul 12, 2024

I think this broke SDXL compilation with default flags benchmark flags for rocm: https://github.com/iree-org/iree/actions/runs/9913663701/job/27391531909#step:16:47. Presubmit showed that too: https://github.com/iree-org/iree/actions/runs/9911235296/job/27383808262#step:16:48

Sorry, CI results have been very noisy, particularly this morning. I also filed nod-ai/SHARK-TestSuite#286 for the poor failure mode in that workflow a few days ago (was hoping that would be noticed here during review and resolved, but I should have commented too).

@ScottTodd
Copy link
Member

BTW please add PR descriptions in the future. I'm not sure from the logs and code if this was expected to change lowering paths and affect existing models, for example.

@ScottTodd
Copy link
Member

Oooh, compilation succeeded but the iree-benchmark-module command itself failed? We only include stderr output right now, but there may have been messages on stdout.

ScottTodd added a commit that referenced this pull request Jul 12, 2024
Reverts #17847

This broke SDXL rocm pipeline tests on mi300, see
#17847 (comment). The
tests aren't showing error messages (`root:benchmark_sdxl_rocm.py:31
Command failed with error: b''`) so I can't easily tell what the issue
is, nod-ai/SHARK-TestSuite#286 is filed to
improve the situation there.
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024
…rg#17894)

Reverts iree-org#17847

This broke SDXL rocm pipeline tests on mi300, see
iree-org#17847 (comment). The
tests aren't showing error messages (`root:benchmark_sdxl_rocm.py:31
Command failed with error: b''`) so I can't easily tell what the issue
is, nod-ai/SHARK-TestSuite#286 is filed to
improve the situation there.

Signed-off-by: Lubo Litchev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants