Conversation

Copilot AI commented Jan 15, 2026

Responded to reviewer question about page attention unit test coverage for the new paged_attention_common API.

Analysis Provided

  • Confirmed op_tests/test_pa.py includes comprehensive tests for the new API via run_aiter_common()
  • Tests validate correctness against golden outputs for both unquantized and quantized KV cache paths
  • Tests cover per-tensor and per-token quantization with proper scale tensor handling for both HIP and ASM backends
  • Noted tests require ROCm/AMD GPU hardware (MI325/MI355) and run via CI pipeline
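The difference between per-tensor and per-token quantization mentioned above can be sketched in plain NumPy. This is an illustration only; the helper name `quantize_kv` and its shapes are hypothetical and do not reflect aiter's actual test utilities, which run on ROCm hardware.

```python
import numpy as np

def quantize_kv(kv: np.ndarray, mode: str = "per_tensor"):
    """Quantize a [num_tokens, head_dim] KV slice to int8.

    Hypothetical helper for illustration; returns (quantized, scales).
    per_tensor -> one scale for the whole tensor (scales shape (1,)),
    per_token  -> one scale per token row (scales shape (num_tokens, 1)).
    """
    if mode == "per_tensor":
        scale = np.maximum(np.abs(kv).max() / 127.0, 1e-8)
        scales = np.array([scale], dtype=np.float32)
    elif mode == "per_token":
        scale = np.maximum(np.abs(kv).max(axis=1, keepdims=True) / 127.0, 1e-8)
        scales = scale.astype(np.float32)
    else:
        raise ValueError(f"unknown mode: {mode}")
    q = np.clip(np.round(kv / scale), -128, 127).astype(np.int8)
    return q, scales

kv = np.random.default_rng(0).standard_normal((4, 8)).astype(np.float32)
q_t, s_t = quantize_kv(kv, "per_tensor")
q_k, s_k = quantize_kv(kv, "per_token")
```

A golden-output test would then compare the dequantized values (`q * scales`) against the float reference within the quantization error bound.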

No Code Changes

This PR contains no code modifications; it only clarifies that existing test coverage validates the paged attention implementation introduced in the original commits.


Sergey Solo and others added 12 commits January 12, 2026 16:17
Inference engines should now call paged_attention_common with the shuffled KV cache layout; aiter will internally decide between the ASM and HIP kernels. HIP is more performant at lower concurrencies (< 128). A unit test has also been updated to cover the new interface.

Note that the HIP kernel does not support shuffled scales, so requests are always redirected to the ASM kernel when the KV cache is in int8 or fp8 format.
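The selection policy described in this commit message can be sketched as a small dispatch heuristic. The names below (`KVCacheDtype`, `select_pa_kernel`, `HIP_CONCURRENCY_LIMIT`) are illustrative assumptions, not aiter's actual API; the real decision happens inside paged_attention_common.

```python
# Hypothetical sketch of the kernel-selection heuristic described above.
from enum import Enum

class KVCacheDtype(Enum):
    FP16 = "fp16"
    INT8 = "int8"
    FP8 = "fp8"

# Per the commit note, HIP outperforms ASM at lower concurrencies.
HIP_CONCURRENCY_LIMIT = 128

def select_pa_kernel(batch_size: int, kv_dtype: KVCacheDtype) -> str:
    # Quantized (int8/fp8) KV caches use shuffled scales, which the HIP
    # kernel does not support, so those always take the ASM path.
    if kv_dtype in (KVCacheDtype.INT8, KVCacheDtype.FP8):
        return "asm"
    # For unquantized caches, prefer HIP at low concurrency.
    return "hip" if batch_size < HIP_CONCURRENCY_LIMIT else "asm"
```

Callers never pick a kernel directly; they pass the shuffled KV cache to the common entry point and the library applies a policy like this one.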
Copilot AI changed the title [WIP] Implement API for switching between ASM and HIP kernels Address PR review comments: confirm unit test coverage Jan 15, 2026
Copilot AI requested a review from fsx950223 January 15, 2026 04:17
Base automatically changed from common_hip_asm_pa_inerface to main January 16, 2026 02:53
