[WIP] Fix overflow when shape >= 32Kx32Kx32K, buffer overflow by xintin · Pull Request #1061 · iree-org/wave

xintin · 2026-03-06T03:54:34Z

Buffer offset calculations used signed 32-bit limits (2^31 - 1), capping addressable memory at ~2 GB. This patch switches to unsigned 32-bit limits (2^32 - 1) to support up to ~4 GB, and drops the nsw (no-signed-wrap) overflow flag on offset arithmetic so the compiler doesn't misoptimize offsets above the signed range.
Updated constants: the i8 buffer size (2147483646 to 4294967294), the OOB index dense constant (2147483647 to 4294967295), and the validBytes constant (2147483646 to 4294967294). The f16, f32, and i32 buffer sizes are updated accordingly.
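
To illustrate why the signed limit matters, here is a minimal Python sketch (not code from this patch) of how a 32-bit byte offset above 2^31 - 1 looks when it is reinterpreted as a signed integer. The helper name `as_signed_i32` and the 3 GiB example offset are assumptions introduced only for illustration.

```python
INT32_MAX = 2**31 - 1   # largest offset under the signed 32-bit limit
UINT32_MAX = 2**32 - 1  # largest offset under the unsigned 32-bit limit


def as_signed_i32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


offset = 3 * 1024**3  # a 3 GiB byte offset: fits in 32 bits, exceeds INT32_MAX
print(offset <= UINT32_MAX)   # True
print(as_signed_i32(offset))  # -1073741824
```

An offset in this range is valid as an unsigned value but wraps negative as a signed one, which is the situation that arithmetic flagged `nsw` promises never occurs.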

ftynse

The size of the output memref looks completely off, before and after this patch. You may want to find the root cause of that rather than trying to increase the size for it to fit.

ftynse · 2026-03-06T10:39:20Z

tests/kernel/wave_gemm_mxfp_test.py

            "s_waitcnt vmcnt(0)",
            "s_waitcnt vmcnt(0) lgkmcnt(0)",
            "s_waitcnt vmcnt(0)",
+            "s_waitcnt lgkmcnt(14)",


This is... a lot.

That's what the error message stated.
If the error message is wrong, then that should be corrected or guarded too.

What error message? This is a test looking for the presence of exact strings. It likely told you that a new string is present. But we need to understand what that means; in particular, we are adding a lot of waits here, which will decrease performance.
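
As a rough sketch of the kind of check being discussed (an assumption about the general technique, not the actual code in `wave_gemm_mxfp_test.py`), a test that looks for exact strings in order, plus a simple count of wait instructions, could look like this. The helper names `find_in_order` and `count_waits` and the sample listing are hypothetical.

```python
def find_in_order(asm: str, expected: list[str]) -> bool:
    """Return True if every expected string occurs in `asm`, in the given order."""
    pos = 0
    for needle in expected:
        idx = asm.find(needle, pos)
        if idx < 0:
            return False
        pos = idx + len(needle)
    return True


def count_waits(asm: str) -> int:
    """Count lines that begin with an s_waitcnt instruction."""
    return sum(1 for line in asm.splitlines() if line.strip().startswith("s_waitcnt"))


# Hypothetical listings: the second one has the extra wait added to the test.
before = "s_waitcnt vmcnt(0)\ns_waitcnt vmcnt(0) lgkmcnt(0)\ns_waitcnt vmcnt(0)\n"
after = before + "s_waitcnt lgkmcnt(14)\n"

expected = [
    "s_waitcnt vmcnt(0)",
    "s_waitcnt vmcnt(0) lgkmcnt(0)",
    "s_waitcnt vmcnt(0)",
    "s_waitcnt lgkmcnt(14)",
]
print(find_in_order(after, expected), count_waits(before), count_waits(after))
```

A check like this only reports that the strings are present; it says nothing about whether the additional wait is needed, which is the question raised above.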

ftynse · 2026-03-06T10:39:52Z

lit_tests/kernel/wave/codegen.py

+    # CHECK:            memref.reinterpret_cast %[[D1]] to offset: [0], sizes: [2147483646], strides: [1] : memref<f16> to memref<2147483646xf16, strided<[1]>>
+    # CHECK:            vector.store %[[V]], {{.*}}[{{.*}}] : memref<2147483646xf16, strided<[1]>>, vector<16xf16>


Fly-by: where does this number come from? This is a roughly 4 GB buffer (2147483646 f16 elements at 2 bytes each), whereas it looks like we have M, N = 16, 16, meaning I'd expect to see 256 here.

See the diff: it is updated from 1073741822 (f32) to 2147483646 (f16),
that is, ((2^32 - 1) // 2) - 1.

Yes, but why do we use this number? Memref sizes are meaningful for MLIR optimization, you must not have a wrong size.
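
The arithmetic in this thread can be checked with a short Python sketch. It applies the quoted formula, (byte limit // element width) - 1, and compares the result with the M * N element count the reviewer expects. The helper `max_elements` and the `ELEM_BYTES` table are hypothetical names. That the same formula applies to f32 and i32 is an assumption, since only the i8 and f16 values are stated explicitly.

```python
UINT32_MAX = 2**32 - 1
INT32_MAX = 2**31 - 1

# Element widths in bytes for the types named in the PR description.
ELEM_BYTES = {"i8": 1, "f16": 2, "f32": 4, "i32": 4}


def max_elements(byte_limit: int, dtype: str) -> int:
    """Element count following the quoted formula: (byte_limit // width) - 1."""
    return byte_limit // ELEM_BYTES[dtype] - 1


declared = max_elements(UINT32_MAX, "f16")  # size that appears in the CHECK line
expected = 16 * 16                          # M * N for M, N = 16, 16

print(declared)                        # 2147483646
print(max_elements(INT32_MAX, "f16"))  # 1073741822
print(expected)                        # 256
print(declared * ELEM_BYTES["f16"])    # 4294967292 bytes
```

The declared size is derived only from the addressing limit and does not depend on M or N, which is the mismatch the reviewer is pointing at.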

Until now, we have only been verifying the absence of a second non-unit step in index expressions of read and write operations. Do so for every operation in the trait that attaches the attribute. This is not super-efficient, as it requires looking up the attribute on the same parent from all operations, but it guarantees that the check happens, unlike the attribute verifier, which will not kick in in the absence of the hyperparameters attribute even if we can see a problem. A better, longer-term solution is to introduce a top-level wave kernel operation where hyperparameters are mandatory. We can also go for a normal form that will perform a top-down verification, collecting the attributes on the way. Closes #1013.
---------
Signed-off-by: Alex Zinenko <git@ozinenko.com>
Signed-off-by: xintin <gaurav.verma@amd.com>

This builds on PRs #1061, #1063, and #1067 to get the block size 256x224x256 working for the list of shapes we were looking at today. Signed-off-by: William G Hatch <william@hatch.uno>

With PR #1061, this gets the block size 256x224x256 working for the list of shapes we were looking at today. Without #1061 it passes all shapes but one; when I try to run that one right now, I get an error that HIP doesn't have enough memory. I'll re-run it later when the machine hopefully has less usage. Signed-off-by: William G Hatch <william@hatch.uno>

xintin force-pushed the xintin/fix_overflow_and_dynamic_pipeline_remainder_loop_start branch 2 times, most recently from affcf72 to 01bb2aa, March 6, 2026 03:58

xintin requested a review from harsh-nod March 6, 2026 03:59

xintin force-pushed the xintin/fix_overflow_and_dynamic_pipeline_remainder_loop_start branch 2 times, most recently from 04b34b5 to b26d3b3, March 6, 2026 05:19

xintin marked this pull request as draft March 6, 2026 05:29

xintin marked this pull request as ready for review March 6, 2026 05:36

xintin force-pushed the xintin/fix_overflow_and_dynamic_pipeline_remainder_loop_start branch 2 times, most recently from 53d89ff to 7660835, March 6, 2026 06:25

ftynse reviewed Mar 6, 2026

ftynse mentioned this pull request Mar 6, 2026

output memref sizes are bogus and excessively large #1064

Open

xintin changed the title ~~Fix overflow and dynamic pipeline remainder loop start~~ [WIP] Fix overflow and dynamic pipeline remainder loop start Mar 6, 2026

xintin changed the title ~~[WIP] Fix overflow and dynamic pipeline remainder loop start~~ [WIP] Fix overflow when shape >= 32Kx32Kx32K, buffer overflow Mar 6, 2026

xintin force-pushed the xintin/fix_overflow_and_dynamic_pipeline_remainder_loop_start branch from c94f3cf to 9688fb2, March 6, 2026 17:56

harsh-nod and others added 6 commits March 6, 2026 17:57

[compiler] Fix Rational codegen and pipeline unroll bug for dynamic s… (

ba99de6

#1057) Signed-off-by: xintin <gaurav.verma@amd.com>

fix more dyn shapes

e3f601f

Signed-off-by: xintin <gaurav.verma@amd.com>

fix lit tests: buffer size

503ed39

Signed-off-by: xintin <gaurav.verma@amd.com>

rebase

9d4629f

Signed-off-by: xintin <gaurav.verma@amd.com>

remove remainder loop start fix (moved to separate PR)

7a21b68

The schedule.py changes are now in xintin/fix_dynamic_pipeline_remainder_loop_start. Signed-off-by: xintin <gaurav.verma@amd.com> Made-with: Cursor Signed-off-by: xintin <gaurav.verma@amd.com>

xintin force-pushed the xintin/fix_overflow_and_dynamic_pipeline_remainder_loop_start branch from 7ed44c8 to 7a21b68, March 6, 2026 17:58

Merge branch 'main' into xintin/fix_overflow_and_dynamic_pipeline_rem…

ac1e9f0

…ainder_loop_start

willghatch mentioned this pull request Mar 6, 2026

Fix bounds expressions to respect workgroup reordering #1072

Merged

xintin wants to merge 7 commits into main from xintin/fix_overflow_and_dynamic_pipeline_remainder_loop_start

3 participants
