[CK_TILE] Add irregular tail vectorloads pipeline #3606

jakpiase · 2026-01-19T15:27:25Z

Proposed changes

This PR adds a new pipeline that supports vector loads larger than 1 when the last dimension of A or B is not a multiple of the vector load size. Previously such shapes caused out-of-bounds reads; this pipeline uses scalar reads in the last loop iteration, which fixes the problem.

Copilot

Pull request overview

This PR introduces a new GEMM pipeline (COMPUTE_V7) that enables vectorized loads greater than 1 when the last dimensions of matrices A and B are not multiples of the vector load size. The pipeline handles irregular tail cases by falling back to scalar reads in the final loop iteration, preventing out-of-bounds accesses.

Changes:

Added COMPUTE_V7 pipeline variant with support for irregular tail vectorloads
Implemented load_tile_with_elementwise_vectorload1 for scalar fallback reads
Modified kernel validation logic to skip vector size divisibility checks for COMPUTE_V7
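
To illustrate the idea behind the irregular tail handling, here is a minimal host-side sketch. It is not CK_TILE code: `load_row_with_scalar_tail`, `KPerBlock` and `VecSize` as template parameters, and the zero padding are all assumptions made for the example. It only models the behavior described above, where every iteration except the last uses vector-wide copies and the last iteration reads one element at a time with a bounds check.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-side model of the technique; none of these names exist in CK_TILE.
// Copies a row of arbitrary length into a buffer padded to a whole number of K tiles.
template <std::size_t KPerBlock, std::size_t VecSize>
std::vector<float> load_row_with_scalar_tail(const std::vector<float>& row)
{
    static_assert(KPerBlock % VecSize == 0, "a tile must hold whole vectors");

    const std::size_t k         = row.size();
    const std::size_t num_tiles = (k + KPerBlock - 1) / KPerBlock;
    std::vector<float> out(num_tiles * KPerBlock, 0.0f); // assumed zero padding

    for(std::size_t t = 0; t < num_tiles; ++t)
    {
        const std::size_t base = t * KPerBlock;
        if(t + 1 < num_tiles)
        {
            // Not the last iteration: every vector of this tile lies fully inside
            // the row, so vector-wide copies cannot go out of bounds.
            for(std::size_t v = 0; v < KPerBlock; v += VecSize)
                std::memcpy(&out[base + v], &row[base + v], VecSize * sizeof(float));
        }
        else
        {
            // Last iteration: scalar reads, each guarded by a bounds check.
            for(std::size_t e = 0; e < KPerBlock; ++e)
                if(base + e < k)
                    out[base + e] = row[base + e];
        }
    }
    return out;
}
```

With `KPerBlock = 8` and `VecSize = 4`, a row of 10 elements is copied as one vectorized tile followed by one scalar tile in which only the first two elements are read.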

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.


File	Description
include/ck_tile/ops/gemm/pipeline/gemm_pipelines.hpp	Added COMPUTE_V7 enum entry
include/ck_tile/ops/gemm/pipeline/gemm_pipeline_ag_bg_cr_comp_v7.hpp	New pipeline implementation with irregular tail support
include/ck_tile/ops/gemm/kernel/universal_gemm_kernel.hpp	Added validation for COMPUTE_V7 padding requirements and relaxed vector size checks
include/ck_tile/ops/gemm.hpp	Added include for new pipeline header
include/ck_tile/core/tensor/tile_window_base.hpp	Added TraitsVectorload1 structure for scalar load traits
include/ck_tile/core/tensor/tile_window.hpp	Implemented load_vectorload1 methods for scalar fallback
include/ck_tile/core/tensor/load_tile.hpp	Added load_tile_with_elementwise_vectorload1 function
example/ck_tile/03_gemm/gemm_utils.hpp	Added configuration and pipeline traits for COMPUTE_V7


Copilot · 2026-01-20T22:12:16Z

include/ck_tile/core/tensor/load_tile.hpp

+                                       bool_constant<oob_conditional_check> = {})
+{
+    // TODO: Tile windows should works with unknow number of params
+    // Load element_wise API works only when the input typle is a tuple-type


Corrected spelling of 'typle' to 'tuple'.

Suggested change

-    // Load element_wise API works only when the input typle is a tuple-type
+    // Load element_wise API works only when the input tuple is a tuple-type

Copilot · 2026-01-20T22:12:17Z

include/ck_tile/core/tensor/load_tile.hpp

+                                       number<i_access>                     = {},
+                                       bool_constant<oob_conditional_check> = {})
+{
+    // TODO: Tile windows should works with unknow number of params


Corrected spelling of 'unknow' to 'unknown' and grammar 'works' to 'work'.

Copilot · 2026-01-20T22:12:17Z

include/ck_tile/ops/gemm/pipeline/gemm_pipeline_ag_bg_cr_comp_v7.hpp

+            auto first_a_load_tile_with_elementwise = [&]() {
+                if constexpr(HasHotLoop)
+                    return load_tile_with_elementwise(a_copy_dram_window, a_element_func);
+                else
+                    return load_tile_with_elementwise_vectorload1(a_copy_dram_window,
+                                                                  a_element_func);
+            }();
+
+            auto first_b_load_tile_with_elementwise = [&]() {
+                if constexpr(HasHotLoop)
+                    return load_tile_with_elementwise(b_copy_dram_window, b_element_func);
+                else
+                    return load_tile_with_elementwise_vectorload1(b_copy_dram_window,
+                                                                  b_element_func);
+            }();
+


The conditional logic determining which load function to use (load_tile_with_elementwise vs load_tile_with_elementwise_vectorload1) is duplicated for both A and B tiles (lines 444-450 and 452-458). Consider extracting this into a helper lambda or template function to reduce code duplication.

Suggested change

-            auto first_a_load_tile_with_elementwise = [&]() {
-                if constexpr(HasHotLoop)
-                    return load_tile_with_elementwise(a_copy_dram_window, a_element_func);
-                else
-                    return load_tile_with_elementwise_vectorload1(a_copy_dram_window,
-                                                                  a_element_func);
-            }();
-
-            auto first_b_load_tile_with_elementwise = [&]() {
-                if constexpr(HasHotLoop)
-                    return load_tile_with_elementwise(b_copy_dram_window, b_element_func);
-                else
-                    return load_tile_with_elementwise_vectorload1(b_copy_dram_window,
-                                                                  b_element_func);
-            }();
+            auto make_first_load_tile_with_elementwise =
+                [&](auto& copy_dram_window, auto& element_func) {
+                    if constexpr(HasHotLoop)
+                        return load_tile_with_elementwise(copy_dram_window, element_func);
+                    else
+                        return load_tile_with_elementwise_vectorload1(copy_dram_window,
+                                                                      element_func);
+                };
+
+            auto first_a_load_tile_with_elementwise =
+                make_first_load_tile_with_elementwise(a_copy_dram_window, a_element_func);
+            auto first_b_load_tile_with_elementwise =
+                make_first_load_tile_with_elementwise(b_copy_dram_window, b_element_func);

Copilot · 2026-01-20T22:12:18Z

include/ck_tile/ops/gemm/kernel/universal_gemm_kernel.hpp

                    AsTensorIsValid = false;
                }
-                if(kargs.K % vectorSizeA != 0)
+                if(kargs.K % vectorSizeA != 0 && GemmPipeline::GetPipelineName() != "COMPUTE_V7")


The pipeline name check GemmPipeline::GetPipelineName() != "COMPUTE_V7" is repeated in multiple validation conditions (lines 455, 487, 527, 560-561). Consider extracting this into a boolean constant at the beginning of the validation function to improve maintainability and reduce magic string comparisons.
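
A rough sketch of what that extraction could look like, written as a standalone example rather than a patch. The mock pipeline types, the `std::string` return type of `GetPipelineName()`, and the helper `IsKValidForVectorLoad` are assumptions for illustration only; the real validation function and the pipeline's actual return type may differ.

```cpp
#include <string>

// Hypothetical stand-ins for real pipeline types; only GetPipelineName() is modeled,
// and its return type here is an assumption.
struct MockPipelineV7
{
    static std::string GetPipelineName() { return "COMPUTE_V7"; }
};
struct MockPipelineV3
{
    static std::string GetPipelineName() { return "COMPUTE_V3"; }
};

// Hypothetical helper mirroring one of the divisibility checks.
template <typename GemmPipeline>
bool IsKValidForVectorLoad(int K, int vectorSize)
{
    // Evaluate the name comparison once and reuse the named flag in every check.
    const bool allows_irregular_tail = GemmPipeline::GetPipelineName() == "COMPUTE_V7";

    // Equivalent to rejecting when (K % vectorSize != 0 && name != "COMPUTE_V7").
    return allows_irregular_tail || (K % vectorSize == 0);
}
```

Each validation condition would then read the flag instead of repeating the string comparison, so a rename of the pipeline touches a single line.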

add irregular tail vectorloads

6e7f79d

jakpiase requested review from Snektron, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent, vidyasagar-amd and vpietila-amd as code owners January 19, 2026 15:27

jakpiase changed the title ~~[CK_TILE Add irregular tail vectorloads pipeline~~ [CK_TILE] Add irregular tail vectorloads pipeline Jan 19, 2026

afagaj requested a review from Copilot January 20, 2026 22:11

Copilot AI reviewed Jan 20, 2026


illsilin assigned aosewski Jan 23, 2026


3 participants
