
@xiaobochen-amd

Summary

Add ROCm MI350 (gfx950) support for MXFP8 quantization kernel.

Changes

  • Implement mxfp8_quantize for ROCm in mxfp8_extension.cpp and mxfp8_rocm.hip
  • Support colwise quantization with column-major output layout (matching CUDA API)
  • Support both FLOOR and RCEIL scaling modes
  • Add MI350 to test conditions in test_kernels.py
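The two scaling modes and the colwise blocking can be illustrated with a small pure-Python sketch. It is not the kernel code from this PR: it assumes the usual MXFP8 conventions (32-element blocks, a power-of-two E8M0 scale per block, float8_e4m3fn elements with a maximum magnitude of 448), and the helper names `scale_exponent` and `colwise_scale_exponents` are hypothetical. The handling of all-zero blocks is also an assumption of the sketch.

```python
import math

F8_E4M3_MAX = 448.0             # largest finite float8_e4m3fn magnitude
F8_E4M3_EMAX = 8                # floor(log2(448))
E8M0_MIN, E8M0_MAX = -127, 127  # unbiased exponent range of an E8M0 scale


def scale_exponent(amax: float, mode: str) -> int:
    """Return the power-of-two scale exponent for one block (hypothetical helper)."""
    if amax == 0.0:
        return E8M0_MIN  # assumption: all-zero blocks take the smallest scale
    if mode == "floor":
        # floor(log2(amax)) via frexp: amax = m * 2**e with 0.5 <= m < 1
        exp = (math.frexp(amax)[1] - 1) - F8_E4M3_EMAX
    elif mode == "rceil":
        # ceil(log2(amax / 448)), so amax / 2**exp never exceeds the fp8 maximum
        m, e = math.frexp(amax / F8_E4M3_MAX)
        exp = e - 1 if m == 0.5 else e
    else:
        raise ValueError(f"unknown mode: {mode}")
    return max(E8M0_MIN, min(E8M0_MAX, exp))


def colwise_scale_exponents(matrix, mode, block_size=32):
    """One exponent per (column, row-block), listed column by column."""
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for c in range(cols):
        col_exps = []
        for r0 in range(0, rows, block_size):
            r1 = min(r0 + block_size, rows)
            amax = max(abs(matrix[r][c]) for r in range(r0, r1))
            col_exps.append(scale_exponent(amax, mode))
        out.append(col_exps)
    return out
```

For a block whose largest magnitude is 500, the FLOOR sketch picks exponent 0, so 500 exceeds 448 and would saturate when cast, while the RCEIL sketch picks exponent 1, so the scaled value 250 fits. For a maximum of exactly 448 both modes agree on exponent 0.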

Testing

  • Validated against CUDA reference implementation on MI350
  • All test_cuda_mx_dim1_numerics tests pass for FLOOR and RCEIL modes
  • Environment: Docker image rocm/primus:v25.10 with torch==2.11.0.dev20251221+rocm7.1

@pytorch-bot

pytorch-bot bot commented Dec 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3544

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure

As of commit 2636ce6 with merge base 57432bd:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Dec 25, 2025

Labels

CLA Signed, module: rocm
