@jakpiase
Contributor

Proposed changes

This PR adds a new pipeline that supports vector loads wider than one element when the last dimension of A or B is not a multiple of the vector load size. Previously, such cases would read out of bounds. This pipeline avoids the problem by switching to scalar reads in the last loop iteration.
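A minimal CPU-side sketch of the idea, assuming a single row of length `K` processed in tiles of `KPerBlock` elements with a vector width `V`. The function and parameter names are hypothetical and are not the ck_tile API. Every tile except the last lies fully inside the row and is copied `V` elements at a time; the last tile is read one element at a time with a per-element bounds check, and the remainder is zero-padded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model (not ck_tile code): copy one row of length K into a
// zero-padded buffer, processing KPerBlock elements per loop iteration.
template <std::size_t V, std::size_t KPerBlock>
std::vector<float> load_row_with_scalar_tail(const std::vector<float>& row)
{
    static_assert(V > 0 && KPerBlock % V == 0, "a tile must hold whole vectors");
    const std::size_t K = row.size();
    assert(K > 0);
    const std::size_t num_iters = (K + KPerBlock - 1) / KPerBlock;
    std::vector<float> out(num_iters * KPerBlock, 0.0f);

    for (std::size_t it = 0; it < num_iters; ++it)
    {
        const std::size_t base = it * KPerBlock;
        if (it + 1 < num_iters)
        {
            // "Hot loop" iteration: the whole tile lies inside the row,
            // so V-wide chunks can be copied without any bounds check.
            for (std::size_t k = 0; k < KPerBlock; k += V)
                for (std::size_t v = 0; v < V; ++v)
                    out[base + k + v] = row[base + k + v];
        }
        else
        {
            // Last iteration: scalar reads, each checked against K, so no
            // access runs past the end of the row even when K % V != 0.
            for (std::size_t k = 0; k < KPerBlock && base + k < K; ++k)
                out[base + k] = row[base + k];
        }
    }
    return out;
}
```

For example, with `K = 10`, `V = 4` and `KPerBlock = 8`, the first iteration copies two 4-wide chunks. In the second iteration a 4-wide load starting at index 8 would touch indices 10 and 11, past the end of the row, so the scalar path reads only indices 8 and 9.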

@jakpiase changed the title from "[CK_TILE Add irregular tail vectorloads pipeline" to "[CK_TILE] Add irregular tail vectorloads pipeline" on Jan 19, 2026
@afagaj requested a review from Copilot on January 20, 2026 22:11
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a new GEMM pipeline (COMPUTE_V7) that enables vector loads wider than one element even when the last dimension of A or B is not a multiple of the vector load size. The pipeline handles such irregular tails by falling back to scalar reads in the final loop iteration, which prevents out-of-bounds accesses.

Changes:

  • Added COMPUTE_V7 pipeline variant with support for irregular tail vectorloads
  • Implemented load_tile_with_elementwise_vectorload1 for scalar fallback reads
  • Modified kernel validation logic to skip vector size divisibility checks for COMPUTE_V7

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| include/ck_tile/ops/gemm/pipeline/gemm_pipelines.hpp | Added COMPUTE_V7 enum entry |
| include/ck_tile/ops/gemm/pipeline/gemm_pipeline_ag_bg_cr_comp_v7.hpp | New pipeline implementation with irregular tail support |
| include/ck_tile/ops/gemm/kernel/universal_gemm_kernel.hpp | Added validation for COMPUTE_V7 padding requirements and relaxed vector size checks |
| include/ck_tile/ops/gemm.hpp | Added include for new pipeline header |
| include/ck_tile/core/tensor/tile_window_base.hpp | Added TraitsVectorload1 structure for scalar load traits |
| include/ck_tile/core/tensor/tile_window.hpp | Implemented load_vectorload1 methods for scalar fallback |
| include/ck_tile/core/tensor/load_tile.hpp | Added load_tile_with_elementwise_vectorload1 function |
| example/ck_tile/03_gemm/gemm_utils.hpp | Added configuration and pipeline traits for COMPUTE_V7 |
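The file summary mentions a `TraitsVectorload1` structure for scalar load traits. A hypothetical sketch of that kind of traits override follows; `DefaultLoadTraits`, `TraitsVectorload1Sketch`, `ScalarPerVector` and `copy_with_traits` are illustrative names, not the actual ck_tile definitions. A load routine templated on a traits type reads `ScalarPerVector` elements per access, and a derived traits type inherits everything from its base but forces that width to 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical base traits (illustrative, not ck_tile's): normal vector width.
struct DefaultLoadTraits
{
    static constexpr std::size_t ScalarPerVector = 4;
};

// Hypothetical scalar variant: inherits every other trait from Base but
// overrides (hides) the vector width with 1.
template <typename Base>
struct TraitsVectorload1Sketch : Base
{
    static constexpr std::size_t ScalarPerVector = 1;
};

// Copies src into dst in chunks of Traits::ScalarPerVector elements and
// returns the number of chunked accesses issued.
template <typename Traits>
std::size_t copy_with_traits(const std::vector<float>& src, std::vector<float>& dst)
{
    constexpr std::size_t W = Traits::ScalarPerVector;
    // The caller guarantees whole chunks and a large enough destination.
    assert(src.size() % W == 0 && dst.size() >= src.size());
    std::size_t accesses = 0;
    for (std::size_t i = 0; i < src.size(); i += W, ++accesses)
        for (std::size_t v = 0; v < W; ++v)
            dst[i + v] = src[i + v];
    return accesses;
}
```

Copying 8 elements takes 2 accesses with the default traits and 8 with the scalar variant; the same load routine serves both, and only the traits type changes.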


bool_constant<oob_conditional_check> = {})
{
// TODO: Tile windows should works with unknow number of params
// Load element_wise API works only when the input typle is a tuple-type

Copilot AI Jan 20, 2026


Corrected spelling of 'typle' to 'tuple'.

Suggested change
- // Load element_wise API works only when the input typle is a tuple-type
+ // Load element_wise API works only when the input tuple is a tuple-type

Copilot uses AI. Check for mistakes.
number<i_access> = {},
bool_constant<oob_conditional_check> = {})
{
// TODO: Tile windows should works with unknow number of params

Copilot AI Jan 20, 2026


Corrected spelling of 'unknow' to 'unknown' and grammar 'works' to 'work'.

Comment on lines +444 to +459
auto first_a_load_tile_with_elementwise = [&]() {
    if constexpr(HasHotLoop)
        return load_tile_with_elementwise(a_copy_dram_window, a_element_func);
    else
        return load_tile_with_elementwise_vectorload1(a_copy_dram_window,
                                                      a_element_func);
}();

auto first_b_load_tile_with_elementwise = [&]() {
    if constexpr(HasHotLoop)
        return load_tile_with_elementwise(b_copy_dram_window, b_element_func);
    else
        return load_tile_with_elementwise_vectorload1(b_copy_dram_window,
                                                      b_element_func);
}();


Copilot AI Jan 20, 2026


The conditional logic determining which load function to use (load_tile_with_elementwise vs load_tile_with_elementwise_vectorload1) is duplicated for both A and B tiles (lines 444-450 and 452-458). Consider extracting this into a helper lambda or template function to reduce code duplication.

Suggested change
- auto first_a_load_tile_with_elementwise = [&]() {
-     if constexpr(HasHotLoop)
-         return load_tile_with_elementwise(a_copy_dram_window, a_element_func);
-     else
-         return load_tile_with_elementwise_vectorload1(a_copy_dram_window,
-                                                       a_element_func);
- }();
- auto first_b_load_tile_with_elementwise = [&]() {
-     if constexpr(HasHotLoop)
-         return load_tile_with_elementwise(b_copy_dram_window, b_element_func);
-     else
-         return load_tile_with_elementwise_vectorload1(b_copy_dram_window,
-                                                       b_element_func);
- }();
+ auto make_first_load_tile_with_elementwise =
+     [&](auto& copy_dram_window, auto& element_func) {
+         if constexpr(HasHotLoop)
+             return load_tile_with_elementwise(copy_dram_window, element_func);
+         else
+             return load_tile_with_elementwise_vectorload1(copy_dram_window,
+                                                           element_func);
+     };
+ auto first_a_load_tile_with_elementwise =
+     make_first_load_tile_with_elementwise(a_copy_dram_window, a_element_func);
+ auto first_b_load_tile_with_elementwise =
+     make_first_load_tile_with_elementwise(b_copy_dram_window, b_element_func);
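The suggested helper relies on `if constexpr` inside a generic lambda: only the branch selected by `HasHotLoop` is instantiated, and return statements in the discarded branch do not take part in return-type deduction, so the two loaders may even return different types. A self-contained sketch of that pattern, where `vector_load`, `scalar_load` and `first_loads` are hypothetical stand-ins for the ck_tile load functions and the pipeline code:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for load_tile_with_elementwise (vector path) and
// load_tile_with_elementwise_vectorload1 (scalar path). They deliberately
// return different types.
template <typename Window, typename Func>
int vector_load(const Window& window, const Func& func)
{
    return func(window);
}

template <typename Window, typename Func>
std::string scalar_load(const Window& window, const Func& func)
{
    return std::to_string(func(window)) + " (scalar)";
}

template <bool HasHotLoop>
auto first_loads(int a_window, int b_window)
{
    auto a_element_func = [](int x) { return x + 1; };
    auto b_element_func = [](int x) { return x * 2; };

    // Single helper instead of two duplicated immediately-invoked lambdas;
    // only the branch selected by HasHotLoop is instantiated.
    auto make_first_load = [](const auto& window, const auto& func) {
        if constexpr(HasHotLoop)
            return vector_load(window, func);
        else
            return scalar_load(window, func);
    };
    return std::make_pair(make_first_load(a_window, a_element_func),
                          make_first_load(b_window, b_element_func));
}
```

Here `first_loads<true>(3, 5)` yields the pair `(4, 10)`, while `first_loads<false>(3, 5)` yields `("4 (scalar)", "10 (scalar)")`. The helper takes the window and element function as parameters, so A and B share one definition of the dispatch.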

AsTensorIsValid = false;
}
- if(kargs.K % vectorSizeA != 0)
+ if(kargs.K % vectorSizeA != 0 && GemmPipeline::GetPipelineName() != "COMPUTE_V7")

Copilot AI Jan 20, 2026


The pipeline name check GemmPipeline::GetPipelineName() != "COMPUTE_V7" is repeated in multiple validation conditions (lines 455, 487, 527, 560-561). Consider extracting this into a boolean constant at the beginning of the validation function to improve maintainability and reduce magic string comparisons.
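A hypothetical sketch of the suggested refactor, assuming a pipeline type that exposes a static `GetPipelineName()`. `PipelineV3`, `PipelineV7` and `k_divisibility_ok` are illustrative stand-ins, and the real kernel's layout-dependent checks are reduced here to K divisibility only. The flag is computed once and reused in every condition.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins for pipeline types exposing GetPipelineName().
struct PipelineV3 { static constexpr const char* GetPipelineName() { return "COMPUTE_V3"; } };
struct PipelineV7 { static constexpr const char* GetPipelineName() { return "COMPUTE_V7"; } };

// Evaluate the name check once, compare by value through std::string_view,
// and reuse the flag in every divisibility condition.
template <typename GemmPipeline>
bool k_divisibility_ok(std::size_t K, std::size_t vector_size_a, std::size_t vector_size_b)
{
    assert(vector_size_a > 0 && vector_size_b > 0);
    constexpr bool supports_irregular_tail =
        std::string_view{GemmPipeline::GetPipelineName()} == "COMPUTE_V7";

    if(K % vector_size_a != 0 && !supports_irregular_tail)
        return false;
    if(K % vector_size_b != 0 && !supports_irregular_tail)
        return false;
    return true;
}
```

If the real `GetPipelineName()` is not `constexpr`, a plain `const bool` works the same way. Comparing through `std::string_view` (or `std::string`) keeps the check a by-value comparison; a `!=` between two raw `const char*` values would compare pointers instead.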
