

@k50112113 commented Jan 13, 2026

This PR removes the % (N // 16) from the pointer offsets in the weight-shuffled kernels. The weight has to be padded to a multiple of 16 for FP8 blockscale GEMMs and to a multiple of 32 for FP4 GEMMs, so the mod in the pointer offset along the N dim is no longer required. The same applies to the shuffled weight scales in the FP4 GEMM, but not to the FP8 blockscale GEMM, because its weight scale is not shuffled.
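For reviewers, here is a minimal plain-Python sketch (not the Triton kernel code) of why the mod becomes a no-op once N is padded. The helper names `pad_to_multiple` and `group_offsets`, the tile size of 32, and the assumption that the tile size along N divides the padded N are all illustrative and not taken from this PR.

```python
from __future__ import annotations


def pad_to_multiple(n: int, multiple: int) -> int:
    """Round n up to the next multiple (hypothetical helper, not from this PR)."""
    return -(-n // multiple) * multiple


def group_offsets(pid_n: int, block_n: int, n: int,
                  group: int = 16, use_mod: bool = True) -> list[int]:
    """Indices of the 16-row groups along N touched by tile pid_n.

    Hypothetical model of the offset computation; the real kernels compute
    this with Triton ranges rather than Python lists.
    """
    groups_per_block = block_n // group
    total_groups = n // group
    start = pid_n * groups_per_block
    offs = [start + i for i in range(groups_per_block)]
    if use_mod:
        # The wrap-around that this PR removes.
        offs = [o % total_groups for o in offs]
    return offs


# Assumed example: an unpadded N of 100, padded to 32 as for the FP4 GEMM.
n_padded = pad_to_multiple(100, 32)
block_n = 32  # assumed tile size along N; it divides n_padded

for pid_n in range(n_padded // block_n):
    with_mod = group_offsets(pid_n, block_n, n_padded, use_mod=True)
    without_mod = group_offsets(pid_n, block_n, n_padded, use_mod=False)
    # Every offset is already below N // 16, so the mod changes nothing.
    assert with_mod == without_mod
```

Under these assumptions every group index already lies in [0, N // 16), which is the case the mod was guarding against. If a tile could run past the padded N, the wrap would change the indices and the sketch would no longer apply.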

This PR also:

  1. adds some extra assertions in the helper function.
  2. removes _gemm_afp4wfp4_kernel_preshuffle_scales, which has been deprecated for a long time.

@k50112113 requested review from a team and azaidy January 13, 2026 19:45
