@ChuanLi1101

Summary

Integrate AITER's new `fused_attn_output_rmsnorm` kernel into the GPT-OSS model.

Changes

  • Add `ATOM_ENABLE_FUSED_ATTN_OUTPUT_RMSNORM` environment variable (default: disabled)
  • Update `TransformerBlock` to use the fused kernel when enabled
  • Support `x_pad_to_multiple` for MoE compatibility

Usage

```bash
export ATOM_ENABLE_FUSED_ATTN_OUTPUT_RMSNORM=1
```
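
For reviewers who want to picture the opt-in gate, here is a minimal sketch of how a flag like this could be read in Python. The helper names `fused_attn_output_rmsnorm_enabled` and `select_path` and the set of accepted truthy strings are assumptions for illustration; the PR does not show how ATOM actually parses the variable.

```python
import os

# Hypothetical set of values treated as "enabled"; ATOM's real parsing may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def fused_attn_output_rmsnorm_enabled(environ=None):
    """Return True when the opt-in flag is set to a truthy value.

    Defaults to disabled when the variable is unset, matching the PR text.
    """
    env = os.environ if environ is None else environ
    value = env.get("ATOM_ENABLE_FUSED_ATTN_OUTPUT_RMSNORM", "0")
    return value.strip().lower() in _TRUTHY


def select_path(environ=None):
    """Pick which code path a block would take (illustrative only)."""
    return "fused" if fused_attn_output_rmsnorm_enabled(environ) else "unfused"
```

Passing a plain dict instead of `os.environ` keeps the check easy to unit test without touching the process environment.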

Dependencies

Requires: AITER PR ROCm/aiter#1863 to be merged first

Performance Benefits

  • Reduces kernel launch overhead (3 kernels -> 1)
  • Saves memory bandwidth by not writing intermediate results out between separate kernels
  • Expected ~5-8% E2E improvement for GPT-OSS prefill
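
The description does not say which three kernels are merged. As a reference for checking numerics, the sketch below assumes they are a residual add, an RMSNorm, and zero-padding of the hidden dimension to a multiple of `x_pad_to_multiple`. The function name `add_rmsnorm_pad_reference`, its signature, and the `eps` default are hypothetical and are not the AITER API; this is plain Python on a single token vector, not a GPU kernel.

```python
import math


def add_rmsnorm_pad_reference(attn_out, residual, weight,
                              eps=1e-5, x_pad_to_multiple=0):
    """Unfused reference for one token (assumed semantics, not AITER's API).

    Returns (normed_and_padded, new_residual).
    """
    # Step 1 (assumed): add the attention output to the residual stream.
    hidden = [a + r for a, r in zip(attn_out, residual)]

    # Step 2 (assumed): RMSNorm over the hidden dimension with a learned weight.
    mean_sq = sum(h * h for h in hidden) / len(hidden)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    normed = [h * inv_rms * w for h, w in zip(hidden, weight)]

    # Step 3 (assumed): zero-pad so the length is a multiple of x_pad_to_multiple.
    if x_pad_to_multiple > 0:
        pad = (-len(normed)) % x_pad_to_multiple
        normed = normed + [0.0] * pad

    return normed, hidden
```

A fused kernel would perform these steps in a single pass; comparing its output against a reference of this kind is one way to validate the flag before enabling it by default.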

Integrate AITER's new fused_attn_output_rmsnorm kernel into GPT-OSS model.

Changes:
- Add ATOM_ENABLE_FUSED_ATTN_OUTPUT_RMSNORM env variable (default: disabled)
- Update TransformerBlock to use fused kernel when enabled
- Supports x_pad_to_multiple for MoE compatibility

Usage:
  export ATOM_ENABLE_FUSED_ATTN_OUTPUT_RMSNORM=1

Requires: AITER with fused_attn_output_rmsnorm kernel