@tenpercent
Contributor

Summary

  • Add generate_identity_sequences<N>() helper that returns Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>
  • Replace lambdas with named functors in transform_tensor_descriptor
  • Add unpack_and_merge_sequences helper functor
  • Reduces transform_tensor_descriptor instantiations from 388 to 32 (92% reduction)

Motivation

Multiple call sites use the generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{}) pattern. A named helper replaces these per-call-site lambdas and so reduces the number of lambda instantiations.

Additionally, each lambda in transform_tensor_descriptor creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.
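
As an illustration of the closure-type point above, here is a minimal standalone sketch. It does not use CK: apply_to_one, lambda_a, lambda_b, and add_one are hypothetical names, with apply_to_one standing in for a heavily templated function such as transform_tensor_descriptor.

```cpp
#include <type_traits>

// Hypothetical stand-in for a templated function: it is instantiated once
// per distinct callable type F that is passed to it.
template <typename F>
constexpr int apply_to_one(F f)
{
    return f(1);
}

// Two token-for-token identical lambdas still have distinct closure types.
inline constexpr auto lambda_a = [](int x) { return x + 1; };
inline constexpr auto lambda_b = [](int x) { return x + 1; };
static_assert(!std::is_same_v<decltype(lambda_a), decltype(lambda_b)>);

// A named functor is a single type, however many call sites construct it.
struct add_one
{
    constexpr int operator()(int x) const { return x + 1; }
};
static_assert(std::is_same_v<decltype(add_one{}), decltype(add_one{})>);

// Two different closure types: apply_to_one is instantiated twice.
static_assert(apply_to_one(lambda_a) == 2 && apply_to_one(lambda_b) == 2);

// One functor type: both calls reuse the same apply_to_one<add_one>.
static_assert(apply_to_one(add_one{}) == 2);
static_assert(apply_to_one(add_one{}) == 2);
```

The sketch only demonstrates the type identity involved; the instantiation counts reported in this PR come from the author's build analysis, not from this example.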

Changes

Part 1: generate_identity_sequences helper

  • Replaces common lambda pattern for generating identity sequences
  • Each lambda expression creates a unique closure type, causing separate template instantiations at every call site
  • Named helper shares a single type across all uses
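
A rough sketch of how such a helper might be built with a pack expansion follows. It is an assumption-laden illustration, not the code from this PR: Seq, std::tuple, and std::index_sequence stand in for CK's Sequence, Tuple, and its own index machinery, and generate_identity_sequences_sketch is a hypothetical name.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for CK's Sequence<...>.
template <int... Is>
struct Seq
{
};

namespace detail {
// Expand the index pack into one single-element Seq per index.
template <std::size_t... Is>
constexpr auto make_identity_sequences(std::index_sequence<Is...>)
{
    return std::make_tuple(Seq<static_cast<int>(Is)>{}...);
}
} // namespace detail

// Returns std::tuple<Seq<0>, Seq<1>, ..., Seq<N-1>> without any lambda,
// so every call site with the same N names the same function specialization.
template <std::size_t N>
constexpr auto generate_identity_sequences_sketch()
{
    return detail::make_identity_sequences(std::make_index_sequence<N>{});
}

static_assert(std::is_same_v<decltype(generate_identity_sequences_sketch<3>()),
                             std::tuple<Seq<0>, Seq<1>, Seq<2>>>);
```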

Part 2: Named functors in transform_tensor_descriptor

  • Add unpack_and_merge_sequences helper to replace lambda in GetNumOfHiddenDimension
  • Use generate_identity_sequences in matrix_padder.hpp

Test Plan

  • Added 7 unit tests:
    • 4 tests for generate_identity_sequences
    • 3 tests for unpack_and_merge_sequences
  • Waiting for full CI

Related PRs

This PR merges the functionality from:

Part of PR stack for issue #3575 (Reduce CK/CKTile Build Times)

Note: This PR supersedes #3588 and #3589, which can be closed once this is merged.

This adds an optimized helper for the common generate_tuple pattern:
generate_tuple([](auto i) { return Sequence<i.value>{}; }, N)

The new generate_identity_sequences<N>() function creates
Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>> without
requiring lambda instantiation at each call site.

Updated 21 call sites across threadwise_tensor_slice_transfer,
wrapper utilities, and layout files to use the new helper.

Build time improvement: ~1.1% wall-clock (18.3s -> 18.1s)

Lambda expressions in transform_tensor_descriptor created unique template
instantiations for each capture combination. This change replaces lambdas
with named functor structs to reduce instantiation count:

- Add merge_sequences_functor and unpack_and_merge_sequences helper
- Add convert_visible_to_hidden_id and convert_visible_ids_to_hidden_ids
- Add generate_arithmetic_sequence_from_scan

Build analysis shows instantiation count dropped from 388 to 32 (92% reduction).

Detailed comments explain:
- generate_identity_sequences: Replaces 21 lambda-based call sites
- merge_sequences_functor and unpack_and_merge_sequences: Named functors vs lambdas
- convert_visible_to_hidden_id and related functors: Eliminate nested lambda instantiations
- Why named functors significantly reduce template instantiation count
- Which specific lambda patterns each functor replaces

This documentation helps maintainers understand how named functors reduce
build-time overhead in tensor_descriptor operations.
Contributor

Copilot AI left a comment

Pull request overview

This PR aims to reduce C++ template instantiations (and improve build times) by introducing reusable helpers for common sequence/tuple metaprogramming patterns and by replacing per-call-site lambdas with named functors.

Changes:

  • Added generate_identity_sequences<N>() helper to generate Tuple<Sequence<0>, ..., Sequence<N-1>> without lambdas.
  • Added named sequence utilities (merge_sequences_functor, unpack_and_merge_sequences) and replaced lambdas in transform_tensor_descriptor/TensorDescriptor logic.
  • Updated multiple call sites to use the new helper(s) and added unit tests.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| test/util/unit_sequence_helper.cpp | Adds unit tests for generate_identity_sequences and unpack_and_merge_sequences. |
| test/util/CMakeLists.txt | Adds a new gtest executable target for the new unit tests. |
| include/ck/wrapper/utils/tensor_partition.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/utils/layout_utils.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/tensor.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/operations/gemm.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/layout.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/utility/tuple_helper.hpp | Introduces generate_identity_sequences helper implementation. |
| include/ck/utility/sequence_helper.hpp | Introduces named functors and unpack_and_merge_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3_scatter.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r2.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r2.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_gather.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_dequant.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/device/matrix_padder.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_description/tensor_descriptor.hpp | Replaces lambdas with named functors and uses unpack_and_merge_sequences. |

Comment on lines +53 to +55
auto input = make_tuple(Sequence<10, 20, 30>{});
auto result = unpack_and_merge_sequences{}(input);
auto expected = Sequence<10, 20, 30>{};

Copilot AI Jan 22, 2026

unpack_and_merge_sequences is a function template, so unpack_and_merge_sequences{}(input) won’t compile here either. Call unpack_and_merge_sequences(input) instead.

Copilot uses AI. Check for mistakes.
Comment on lines +61 to +63
auto input = make_tuple(Sequence<100>{}, Sequence<200, 300>{});
auto result = unpack_and_merge_sequences{}(input);
auto expected = Sequence<100, 200, 300>{};

Copilot AI Jan 22, 2026

Same issue: unpack_and_merge_sequences is a function template, so unpack_and_merge_sequences{}(input) won’t compile. Call unpack_and_merge_sequences(input) instead.

Comment on lines +70 to +74
template <typename TupleOfSequences>
__host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences)
{
    return unpack(merge_sequences_functor{}, TupleOfSequences{});
}

Copilot AI Jan 22, 2026

unpack_and_merge_sequences calls unpack(...), but this header only includes ck/utility/tuple.hpp and does not include the header that defines unpack (ck/utility/functional4.hpp). This breaks includes that pull in sequence_helper.hpp before functional4.hpp (e.g. include/ck/utility/container_helper.hpp includes sequence_helper.hpp before tuple_helper.hpp). Add the proper include here (or otherwise ensure unpack is declared).

Collaborator

Yes, we need to be careful to explicitly include the templates we use and not rely on transitive inclusion.

Comment on lines +71 to +73
__host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences)
{
    return unpack(merge_sequences_functor{}, TupleOfSequences{});

Copilot AI Jan 22, 2026

unpack_and_merge_sequences ignores its argument and always default-constructs TupleOfSequences{}. That’s surprising and will fail for non-default-constructible tuple-like types. Use the passed tuple value (e.g., forward it into unpack) so the helper behaves like its name suggests.

Suggested change
- __host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences)
- {
-     return unpack(merge_sequences_functor{}, TupleOfSequences{});
+ __host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences tuple_of_sequences)
+ {
+     return unpack(merge_sequences_functor{}, tuple_of_sequences);
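
For readers without the CK headers at hand, the behaviour the suggestion aims for (merge the sequences held by the tuple that was actually passed in) can be sketched roughly as below. This is a standalone approximation under stated assumptions: Seq, std::tuple, and std::apply stand in for CK's Sequence, Tuple, and unpack, and merge_sequences_sketch and unpack_and_merge_sketch are hypothetical names.

```cpp
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for CK's Sequence<...>.
template <int... Is>
struct Seq
{
};

// Concatenate two sequences; used by the fold expression below.
template <int... As, int... Bs>
constexpr Seq<As..., Bs...> operator+(Seq<As...>, Seq<Bs...>)
{
    return {};
}

// Named functor: a single type shared by every call site.
struct merge_sequences_sketch
{
    template <typename... Seqs>
    constexpr auto operator()(Seqs... seqs) const
    {
        return (Seq<>{} + ... + seqs); // left fold, starting from an empty Seq
    }
};

// Uses the tuple value it receives instead of default-constructing a new one.
template <typename... Seqs>
constexpr auto unpack_and_merge_sketch(std::tuple<Seqs...> tuple_of_sequences)
{
    return std::apply(merge_sequences_sketch{}, tuple_of_sequences);
}

static_assert(std::is_same_v<
              decltype(unpack_and_merge_sketch(std::make_tuple(Seq<100>{}, Seq<200, 300>{}))),
              Seq<100, 200, 300>>);
```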

Comment on lines +37 to +39
EXPECT_TRUE((is_same<decltype(result.At(Number<0>{})), const Sequence<0>>::value));
EXPECT_TRUE((is_same<decltype(result.At(Number<1>{})), const Sequence<1>>::value));
EXPECT_TRUE((is_same<decltype(result.At(Number<2>{})), const Sequence<2>>::value));

Copilot AI Jan 22, 2026

Tuple::At(Number<I>) returns a reference; for a constexpr auto result this is typically const Sequence<I>&. These assertions currently compare against const Sequence<I> (non-reference), so they will fail. Update the expected types (or strip references in the comparison).

Suggested change
- EXPECT_TRUE((is_same<decltype(result.At(Number<0>{})), const Sequence<0>>::value));
- EXPECT_TRUE((is_same<decltype(result.At(Number<1>{})), const Sequence<1>>::value));
- EXPECT_TRUE((is_same<decltype(result.At(Number<2>{})), const Sequence<2>>::value));
+ EXPECT_TRUE((is_same<decltype(result.At(Number<0>{})), const Sequence<0>&>::value));
+ EXPECT_TRUE((is_same<decltype(result.At(Number<1>{})), const Sequence<1>&>::value));
+ EXPECT_TRUE((is_same<decltype(result.At(Number<2>{})), const Sequence<2>&>::value));
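
The reference-versus-value mismatch described in this comment can be reproduced with the standard library alone. The sketch below assumes std::tuple and std::get behave analogously to CK's Tuple and At for this purpose, which the source does not confirm; Seq and result are hypothetical stand-ins.

```cpp
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for CK's Sequence<...>.
template <int... Is>
struct Seq
{
};

constexpr auto result = std::make_tuple(Seq<0>{}, Seq<1>{});

// An accessor that returns a reference keeps that reference under decltype:
// std::get on a const tuple lvalue yields const T&.
static_assert(std::is_same_v<decltype(std::get<0>(result)), const Seq<0>&>);
static_assert(!std::is_same_v<decltype(std::get<0>(result)), const Seq<0>>);

// Alternative to changing the expected type: strip the reference (and,
// if desired, the const) before comparing.
static_assert(
    std::is_same_v<std::remove_reference_t<decltype(std::get<0>(result))>, const Seq<0>>);
static_assert(
    std::is_same_v<std::remove_cv_t<std::remove_reference_t<decltype(std::get<1>(result))>>,
                   Seq<1>>);
```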

Comment on lines +45 to +47
auto input = make_tuple(Sequence<1, 2>{}, Sequence<3, 4>{}, Sequence<5, 6>{});
auto result = unpack_and_merge_sequences{}(input);
auto expected = Sequence<1, 2, 3, 4, 5, 6>{};

Copilot AI Jan 22, 2026

unpack_and_merge_sequences is defined as a function template, not a callable object type. unpack_and_merge_sequences{}(input) will not compile; call it as unpack_and_merge_sequences(input) (or change the helper to a functor if that was the intent).
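
The distinction between the two call forms can be sketched with hypothetical names (count_fn and count_functor are illustrations only, and Seq stands in for CK's Sequence):

```cpp
// Hypothetical stand-in for CK's Sequence<...>.
template <int... Is>
struct Seq
{
};

// A function template is called directly: count_fn(seq).
template <int... Is>
constexpr int count_fn(Seq<Is...>)
{
    return static_cast<int>(sizeof...(Is));
}

// A functor is a type: construct an object first, then call it.
struct count_functor
{
    template <int... Is>
    constexpr int operator()(Seq<Is...>) const
    {
        return static_cast<int>(sizeof...(Is));
    }
};

static_assert(count_fn(Seq<1, 2, 3>{}) == 3);        // function-template form
static_assert(count_functor{}(Seq<1, 2, 3>{}) == 3); // functor form

// count_fn{}(Seq<1, 2, 3>{});  // ill-formed: count_fn names a function, not a type
```

Either form works for the helper; the tests and the definition just need to agree on which one it is.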

@cgmillette cgmillette self-assigned this Jan 23, 2026

// convert_visible_ids_to_hidden_ids - maps sequence of visible IDs to hidden IDs
//
// Replaces: [&](auto low_dim_visible_ids) { return transform_sequences(convert_fn, ids); }

Collaborator

This comment about the lambda it replaces is obvious and not helpful going forward (since the lambda isn't around anymore).


// generate_arithmetic_sequence_from_scan - generates arithmetic sequences for upper dimensions
//
// Replaces lambda: [&](auto i) {

Collaborator

Similarly: this "Replaces ..." comment is an artifact of the refactoring and should be removed.

// - Pack expansion: make_tuple(Sequence<Is>{}...) creates all sequences at once
// - No lambda closures or unique types per call site
//
// Impact:

Collaborator

This is an artifact of the refactoring and shouldn't be included as a file comment. (It's very useful for the PR description, but it will get confusing if it's left in the code base.)
