mla ps support page64 and add paged 3buffer layout for ds3.2 #1875
base: main
Conversation
Pull request overview
This pull request adds support for a paged 3-buffer layout to the MLA (Multi-head Latent Attention) persistent kernel implementation for DeepSeek v3.2. The 3-buffer layout uses separate buffers for FP8-quantized nope data, FP32 scale factors, and BF16 rope data, all packed together as uint8 bytes.
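As a concrete illustration of that packing, here is a minimal Python sketch; the helper name, per-page granularity, and buffer order are assumptions for illustration and are not taken from the PR:

```python
import torch

def pack_kv_page(nope_fp8: torch.Tensor, scale_fp32: torch.Tensor,
                 rope_bf16: torch.Tensor) -> torch.Tensor:
    """Pack one KV-cache page's three buffers into a flat uint8 tensor.

    Hypothetical sketch: the actual field order and alignment used by the
    kernel may differ.
    """
    parts = [
        nope_fp8.contiguous().view(torch.uint8).flatten(),   # FP8-quantized nope data
        scale_fp32.contiguous().view(torch.uint8).flatten(), # FP32 scale factors
        rope_bf16.contiguous().view(torch.uint8).flatten(),  # BF16 rope data
    ]
    return torch.cat(parts)
```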
Changes:
- Adds a new 3BUFFER paged layout option with scale_dim parameter for managing quantized KV cache data
- Updates copyright years across multiple files to 2024-2026
- Adds kernel selection logic for uint8 dtype handling in the CUDA backend
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| op_tests/test_mla_persistent.py | Adds test infrastructure for 3-buffer layout including data packing/unpacking logic, new test function, and command-line arguments |
| hsa/gfx942/mla/mla.co | New compiled kernel binary for 3-buffer layout support |
| csrc/py_itfs_cu/asm_mla.cu | Updates kernel selection logic to handle uint8 dtype and adds new kernel path for 3-buffer layout |
| aiter/mla.py | Updates shape extraction logic to handle uint8 dtype differently from standard dtypes |
```python
err_ref_fp8 = checkAllclose(
    out_ref,
    out_ref_fp8,
    msg="mla_decode-absorb_fp8 [golden fp8 vs golden]:......",
)
```
Copilot AI (Jan 20, 2026):
The variable name 'err_ref_fp8' is assigned but never used. Consider removing it or using it in the test assertions.
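One way to act on this, assuming checkAllclose returns the measured error (the threshold below is illustrative, not from the PR):

```python
err_ref_fp8 = checkAllclose(
    out_ref,
    out_ref_fp8,
    msg="mla_decode-absorb_fp8 [golden fp8 vs golden]:......",
)
# Fail the test if the fp8 reference diverges too far (threshold is illustrative).
assert err_ref_fp8 < 1e-2, f"fp8 reference error too large: {err_ref_fp8}"
```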
```diff
     type=int,
-    default=128,
+    default=512,
     help="""qk nope head dim.
```
Copilot AI (Jan 20, 2026):
The default value for qk_nope_head_dim changed from 128 to 512, but the help text still shows the old default value (128) in the example. Update the help text to reflect the new default or explain why this change was made.
| help="""qk nope head dim. | |
| help="""qk nope head dim. | |
| Default is 512 (updated from 128 to match the current MLA kernel configuration). |
```cpp
{
    if(persistent)
    {
        // todo:add support condition
```
Copilot AI (Jan 20, 2026):
The TODO comment "todo:add support condition" is vague and does not specify what conditions need to be added or when this should be addressed. Either implement the missing conditions or provide a more specific TODO with details about what needs to be done.
Suggested change:
```diff
-        // todo:add support condition
+        // TODO: define and enforce the exact support conditions for the persistent Byte/Char KV path.
+        // This placeholder kernel selection should only be used once:
+        // - the correct assembly kernel symbol and .co file for Byte/Char KV are available, and
+        // - guards on shapes and configuration (e.g., gqa_ratio, sequence lengths, scaling tensors)
+        //   are added here to match the constraints of that kernel.
```
```diff
-    default=1,
+    default=64,
     help="""Block size.
     e.g.: -blk 1""",
```
Copilot AI (Jan 20, 2026):
The default value for block_size changed from 1 to 64, but the help text still shows the old default value (1) in the example. Update the help text to reflect the new default or explain why this change was made.
Suggested change:
```diff
-    e.g.: -blk 1""",
+    e.g.: -blk 64""",
```
Force-pushed from 236523a to 51d8feb.
Force-pushed from 51d8feb to b568d85.
Force-pushed from ba51370 to b78b537.
```python
        --varlen # True""",
    )

    parser.add_argument(
```
Please add descriptions for the "LEGACY" and "3BUFFER" layout options, for example:
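A sketch of the requested help text; the flag name `--kv_layout` and the exact wording are hypothetical, only the layout names come from the PR:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag; wording of the descriptions is illustrative.
parser.add_argument(
    "--kv_layout",
    type=str,
    choices=["LEGACY", "3BUFFER"],
    default="LEGACY",
    help="""Paged KV cache layout.
LEGACY: single buffer holding the standard KV cache tensor.
3BUFFER: FP8-quantized nope data, FP32 scale factors, and BF16 rope data
packed together as uint8 bytes.""",
)
```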
op_tests/test_mla_persistent.py (outdated)
```diff
         reduce_partial_map,
-        kv_granularity=max(page_size, 16),
+        page_size=page_size,
+        kv_granularity=2,
```
Is 2 an optimized setting for the page_size=64 case? I thought kv_granularity should be set to max(1, 16/page_size), but that formula doesn't match the value here. If the formula doesn't hold, I think we need a function that computes the granularity, with max(1, 16/page_size) as the default, as in the sketch below.
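A minimal sketch of the helper proposed in this comment, using max(1, 16 / page_size) as the default; the function name is hypothetical:

```python
def get_kv_granularity(page_size: int) -> int:
    """Return the KV granularity for a given page size.

    Default formula taken from the review comment: max(1, 16 / page_size).
    """
    return max(1, 16 // page_size)

# e.g.: page_size=64 -> 1, page_size=8 -> 2, page_size=1 -> 16
```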
```cpp
// expected payload handled by each cu part.
const int32_t payload = ck_tile::integer_divide_ceil(sum_blocks, params.num_splits) +
                        Traits::kFixedOverheadNumBlocks;
```
I think we need to adjust the fixed coefficient kFixedOverheadNumBlocks based on page size. Since it is not a divisor, we could make it a runtime parameter, as sketched below.
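For illustration, a Python sketch of the payload computation with the overhead passed in at runtime; the page-size scaling rule in the trailing comment is hypothetical:

```python
import math

def expected_payload(sum_blocks: int, num_splits: int,
                     overhead_num_blocks: int) -> int:
    """Expected number of blocks each CU partition handles.

    Mirrors the C++ expression above, but takes the overhead as a runtime
    parameter instead of the compile-time constant kFixedOverheadNumBlocks.
    """
    return math.ceil(sum_blocks / num_splits) + overhead_num_blocks

# Hypothetical page-size-dependent overhead (scaling rule is an assumption):
# overhead_num_blocks = max(1, 16 // page_size) * base_overhead
```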
I'm fine with doing this in a follow-up PR.
Force-pushed from 3c51e2c to ddb11de.
Motivation
Add support for page size 64 and a paged 3-buffer layout to the MLA persistent kernel for DeepSeek v3.2.
Technical Details
Test Plan
Test Result
Submission Checklist