[Triton] Remove mod N in ptr offsets for preshuffle gemms #1831

k50112113 · 2026-01-13T19:45:02Z

This PR removes the % (N // 16) from the pointer offsets in the weight-shuffled (preshuffle) kernels. Because the weight has to be padded to a multiple of 16 for FP8 blockscale GEMMs and to a multiple of 32 for FP4 GEMMs, the mod in the pointer offset is no longer required along the N dim. The same applies to the shuffled weight scales in the FP4 GEMM, but not to the FP8 blockscale GEMM, whose weight scale is not shuffled.
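As a rough illustration of why the mod can become a no-op, here is a minimal pure-Python sketch of the row-offset arithmetic. It is not the kernel code: `row_offsets` and `mod_is_noop` are hypothetical helpers, the weight is assumed to be viewed as N // 16 rows, and the equality is only claimed under the additional assumption that the padded N is covered by whole BLOCK_SIZE_N tiles.

```python
def row_offsets(pid_n, block_size_n, n, use_mod):
    """Row indices into a preshuffled weight viewed as (n // 16) rows.

    Hypothetical helper: mirrors the usual `pid * BLOCK + arange` pattern,
    optionally wrapped with `% (n // 16)`.
    """
    rows_per_tile = block_size_n // 16
    rows_total = n // 16
    offs = [pid_n * rows_per_tile + i for i in range(rows_per_tile)]
    if use_mod:
        offs = [o % rows_total for o in offs]
    return offs


def mod_is_noop(n, block_size_n):
    """True if wrapped and unwrapped offsets agree for every tile along N."""
    num_tiles = -(-n // block_size_n)  # ceiling division
    return all(
        row_offsets(p, block_size_n, n, True)
        == row_offsets(p, block_size_n, n, False)
        for p in range(num_tiles)
    )


# Assumption: N is padded so that whole tiles cover it (256 = 4 * 64).
print(mod_is_noop(256, 64))
```

When N is not covered by whole tiles, the two variants differ for the last tile; the sketch makes no claim about how the actual kernels handle that case.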

This PR also includes:

- Some extra assertions in the helper function.
- Removal of _gemm_afp4wfp4_kernel_preshuffle_scales, which has been deprecated for a long time.

remove mod N in ptr offsets for preshuffle and clean up code

85c7a36

k50112113 requested review from a team and azaidy January 13, 2026 19:45

apply same changes for fused gemm kernels

71e74a5
