Add generate_identity_sequences helper and replace lambdas with named functors #3628

tenpercent · 2026-01-22T00:17:42Z

Summary

- Add generate_identity_sequences<N>() helper that returns Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>
- Replace lambdas with named functors in transform_tensor_descriptor
- Add unpack_and_merge_sequences helper functor
- Reduces transform_tensor_descriptor instantiations from 388 to 32 (92% reduction)

Motivation

Multiple call sites use the generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{}) pattern. Replacing it with a named helper avoids instantiating a separate lambda at each of those call sites.

Additionally, each lambda in transform_tensor_descriptor creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.
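The closure-type effect described above can be checked in plain C++. The following is a minimal sketch, not code from this PR: `Instantiated`, `instantiate_for`, and `make_index` are hypothetical names standing in for a template parameterized on a callable type. It shows that two textually identical lambdas still produce two distinct instantiations, while a named functor always maps to the same one.

```cpp
#include <type_traits>

// Hypothetical stand-in for any template parameterized on a callable type.
template <typename F>
struct Instantiated
{
};

// Hypothetical function template: one instantiation per distinct F.
template <typename F>
constexpr Instantiated<F> instantiate_for(F)
{
    return {};
}

// A named functor: every use refers to the same type.
struct make_index
{
    template <typename I>
    constexpr auto operator()(I i) const
    {
        return i;
    }
};

// Two textually identical lambdas have two distinct closure types.
inline constexpr auto lambda_a = [](auto i) { return i; };
inline constexpr auto lambda_b = [](auto i) { return i; };

static_assert(!std::is_same_v<decltype(lambda_a), decltype(lambda_b)>,
              "each lambda expression has its own closure type");
static_assert(!std::is_same_v<decltype(instantiate_for(lambda_a)),
                              decltype(instantiate_for(lambda_b))>,
              "so the template is instantiated once per lambda");
static_assert(std::is_same_v<decltype(instantiate_for(make_index{})),
                             decltype(instantiate_for(make_index{}))>,
              "a named functor reuses a single instantiation");
```

The assertions are evaluated at compile time, so the snippet only needs to compile (C++17 or later).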

Changes

Part 1: generate_identity_sequences helper

- Replaces common lambda pattern for generating identity sequences
- Each lambda expression creates a unique closure type, causing separate template instantiations at every call site
- Named helper shares a single type across all uses
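As an illustration of how such a helper can be built with pack expansion instead of a lambda, here is a minimal sketch in standard C++. It is not the CK implementation: `Sequence` is a hypothetical stand-in for CK's type, `std::tuple` stands in for CK's `Tuple`, and the `detail::make_identity_sequences` name is an assumption made for this example.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for CK's Sequence; the real type lives in the CK headers.
template <int... Is>
struct Sequence
{
};

namespace detail {
// Expand the index pack into one single-element Sequence per index.
template <std::size_t... Is>
constexpr auto make_identity_sequences(std::index_sequence<Is...>)
{
    return std::make_tuple(Sequence<static_cast<int>(Is)>{}...);
}
} // namespace detail

// Returns a tuple of Sequence<0>, Sequence<1>, ..., Sequence<N-1>.
// No lambda is involved, so all callers with the same N share one instantiation.
template <std::size_t N>
constexpr auto generate_identity_sequences()
{
    return detail::make_identity_sequences(std::make_index_sequence<N>{});
}

static_assert(std::is_same_v<decltype(generate_identity_sequences<3>()),
                             std::tuple<Sequence<0>, Sequence<1>, Sequence<2>>>,
              "N = 3 yields three single-element sequences");
static_assert(std::is_same_v<decltype(generate_identity_sequences<0>()), std::tuple<>>,
              "N = 0 yields an empty tuple");
```

The real helper would additionally carry CK's `__host__ __device__` qualifiers, which are omitted here so the sketch compiles as plain C++17.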

Part 2: Named functors in transform_tensor_descriptor

- Add unpack_and_merge_sequences helper to replace lambda in GetNumOfHiddenDimension
- Use generate_identity_sequences in matrix_padder.hpp

Test Plan

Added 7 unit tests:
- 4 tests for generate_identity_sequences
- 3 tests for unpack_and_merge_sequences
Waiting for full CI

Related PRs

This PR merges the functionality from:

- Add generate_identity_sequences helper for common pattern #3588 (generate_identity_sequences helper)
- Replace lambdas with named functors in transform_tensor_descriptor #3589 (Named functors in transform_tensor_descriptor)

Part of PR stack for issue #3575 (Reduce CK/CKTile Build Times)

Note: This PR supersedes #3588 and #3589, which can be closed once this is merged.

This adds an optimized helper for the common generate_tuple pattern:

generate_tuple([](auto i) { return Sequence<i.value>{}; }, N)

The new generate_identity_sequences<N>() function creates Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>> without requiring lambda instantiation at each call site.

Updated 21 call sites across threadwise_tensor_slice_transfer, wrapper utilities, and layout files to use the new helper.

Build time improvement: ~1.1% wall-clock (18.3s -> 18.1s)

Lambda expressions in transform_tensor_descriptor created unique template instantiations for each capture combination. This change replaces lambdas with named functor structs to reduce instantiation count:

- Add merge_sequences_functor and unpack_and_merge_sequences helper
- Add convert_visible_to_hidden_id and convert_visible_ids_to_hidden_ids
- Add generate_arithmetic_sequence_from_scan

Build analysis shows instantiation count dropped from 388 to 32 (92% reduction).


Detailed comments explain:

- generate_identity_sequences: Replaces 21 lambda-based call sites
- merge_sequences_functor and unpack_and_merge_sequences: Named functors vs lambdas
- convert_visible_to_hidden_id and related functors: Eliminate nested lambda instantiations
- Why named functors significantly reduce template instantiation count
- Which specific lambda patterns each functor replaces

This documentation helps maintainers understand how named functors reduce build-time overhead in tensor_descriptor operations.

Copilot

Pull request overview

This PR aims to reduce C++ template instantiations (and improve build times) by introducing reusable helpers for common sequence/tuple metaprogramming patterns and by replacing per-call-site lambdas with named functors.

Changes:

- Added generate_identity_sequences<N>() helper to generate Tuple<Sequence<0>, ..., Sequence<N-1>> without lambdas.
- Added named sequence utilities (merge_sequences_functor, unpack_and_merge_sequences) and replaced lambdas in transform_tensor_descriptor/TensorDescriptor logic.
- Updated multiple call sites to use the new helper(s) and added unit tests.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.


| File | Description |
| --- | --- |
| test/util/unit_sequence_helper.cpp | Adds unit tests for `generate_identity_sequences` and `unpack_and_merge_sequences`. |
| test/util/CMakeLists.txt | Adds a new gtest executable target for the new unit tests. |
| include/ck/wrapper/utils/tensor_partition.hpp | Switches identity-dimension tuple generation to `generate_identity_sequences`. |
| include/ck/wrapper/utils/layout_utils.hpp | Switches identity-dimension tuple generation to `generate_identity_sequences`. |
| include/ck/wrapper/tensor.hpp | Switches identity-dimension tuple generation to `generate_identity_sequences`. |
| include/ck/wrapper/operations/gemm.hpp | Switches identity-dimension tuple generation to `generate_identity_sequences`. |
| include/ck/wrapper/layout.hpp | Switches identity-dimension tuple generation to `generate_identity_sequences`. |
| include/ck/utility/tuple_helper.hpp | Introduces `generate_identity_sequences` helper implementation. |
| include/ck/utility/sequence_helper.hpp | Introduces named functors and `unpack_and_merge_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3_scatter.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r2.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r2.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_gather.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_dequant.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/device/matrix_padder.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_description/tensor_descriptor.hpp | Replaces lambdas with named functors and uses `unpack_and_merge_sequences`. |


Copilot · 2026-01-22T22:14:16Z

test/util/unit_sequence_helper.cpp

+    auto input    = make_tuple(Sequence<10, 20, 30>{});
+    auto result   = unpack_and_merge_sequences{}(input);
+    auto expected = Sequence<10, 20, 30>{};


unpack_and_merge_sequences is a function template, so unpack_and_merge_sequences{}(input) won’t compile here either. Call unpack_and_merge_sequences(input) instead.

Copilot · 2026-01-22T22:14:17Z

test/util/unit_sequence_helper.cpp

+    auto input    = make_tuple(Sequence<100>{}, Sequence<200, 300>{});
+    auto result   = unpack_and_merge_sequences{}(input);
+    auto expected = Sequence<100, 200, 300>{};


Same issue: unpack_and_merge_sequences is a function template, so unpack_and_merge_sequences{}(input) won’t compile. Call unpack_and_merge_sequences(input) instead.

Copilot · 2026-01-22T22:14:17Z

include/ck/utility/sequence_helper.hpp

+template <typename TupleOfSequences>
+__host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences)
+{
+    return unpack(merge_sequences_functor{}, TupleOfSequences{});
+}


unpack_and_merge_sequences calls unpack(...), but this header only includes ck/utility/tuple.hpp and does not include the header that defines unpack (ck/utility/functional4.hpp). This breaks includes that pull in sequence_helper.hpp before functional4.hpp (e.g. include/ck/utility/container_helper.hpp includes sequence_helper.hpp before tuple_helper.hpp). Add the proper include here (or otherwise ensure unpack is declared).

Yes, we need to be careful to explicitly include the templates we use and not rely on transitive inclusion.

Copilot · 2026-01-22T22:14:17Z

include/ck/utility/sequence_helper.hpp

+__host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences)
+{
+    return unpack(merge_sequences_functor{}, TupleOfSequences{});


unpack_and_merge_sequences ignores its argument and always default-constructs TupleOfSequences{}. That’s surprising and will fail for non-default-constructible tuple-like types. Use the passed tuple value (e.g., forward it into unpack) so the helper behaves like its name suggests.

Suggested change

-__host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences)
-{
-    return unpack(merge_sequences_functor{}, TupleOfSequences{});
+__host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences tuple_of_sequences)
+{
+    return unpack(merge_sequences_functor{}, tuple_of_sequences);
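For readers without the CK headers at hand, the shape of the suggested fix can be sketched in standard C++. This is an approximation under stated assumptions, not the PR's code: `Sequence` is a hypothetical stand-in, `std::tuple` replaces CK's `Tuple`, `std::apply` replaces CK's `unpack`, `merge_two` is a helper invented for this example, and the `__host__ __device__` qualifiers are dropped. The point is that the helper forwards the tuple it was given rather than default-constructing a new one.

```cpp
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for CK's Sequence.
template <int... Is>
struct Sequence
{
};

// Hypothetical helper: concatenate two sequences.
template <int... Xs, int... Ys>
constexpr auto merge_two(Sequence<Xs...>, Sequence<Ys...>)
{
    return Sequence<Xs..., Ys...>{};
}

// Named functor that merges any number of sequences, left to right.
struct merge_sequences_functor
{
    constexpr auto operator()() const { return Sequence<>{}; }

    template <typename Seq>
    constexpr auto operator()(Seq s) const
    {
        return s;
    }

    template <typename Seq0, typename Seq1, typename... Rest>
    constexpr auto operator()(Seq0 a, Seq1 b, Rest... rest) const
    {
        return (*this)(merge_two(a, b), rest...);
    }
};

// Uses the tuple that was passed in (std::apply stands in for CK's unpack).
template <typename TupleOfSequences>
constexpr auto unpack_and_merge_sequences(TupleOfSequences tuple_of_sequences)
{
    return std::apply(merge_sequences_functor{}, tuple_of_sequences);
}

static_assert(std::is_same_v<decltype(unpack_and_merge_sequences(std::make_tuple(
                                 Sequence<1, 2>{}, Sequence<3, 4>{}, Sequence<5, 6>{}))),
                             Sequence<1, 2, 3, 4, 5, 6>>,
              "sequences are concatenated in tuple order");
```

Because `merge_sequences_functor` is a single named type, every call to `unpack_and_merge_sequences` with the same tuple type reuses the same instantiation.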

Copilot · 2026-01-22T22:14:18Z

test/util/unit_sequence_helper.cpp

+    EXPECT_TRUE((is_same<decltype(result.At(Number<0>{})), const Sequence<0>>::value));
+    EXPECT_TRUE((is_same<decltype(result.At(Number<1>{})), const Sequence<1>>::value));
+    EXPECT_TRUE((is_same<decltype(result.At(Number<2>{})), const Sequence<2>>::value));


Tuple::At(Number<I>) returns a reference; for a constexpr auto result this is typically const Sequence<I>&. These assertions currently compare against const Sequence<I> (non-reference), so they will fail. Update the expected types (or strip references in the comparison).

Suggested change

-    EXPECT_TRUE((is_same<decltype(result.At(Number<0>{})), const Sequence<0>>::value));
-    EXPECT_TRUE((is_same<decltype(result.At(Number<1>{})), const Sequence<1>>::value));
-    EXPECT_TRUE((is_same<decltype(result.At(Number<2>{})), const Sequence<2>>::value));
+    EXPECT_TRUE((is_same<decltype(result.At(Number<0>{})), const Sequence<0>&>::value));
+    EXPECT_TRUE((is_same<decltype(result.At(Number<1>{})), const Sequence<1>&>::value));
+    EXPECT_TRUE((is_same<decltype(result.At(Number<2>{})), const Sequence<2>&>::value));
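The reference issue raised in this comment is a general property of `decltype` applied to a call that returns a reference. A minimal sketch, using a hypothetical `Holder` type rather than CK's `Tuple`, shows both ways to make the comparison hold: match the reference type exactly, or strip the reference before comparing.

```cpp
#include <type_traits>

struct Item
{
};

// Hypothetical container whose const accessor returns by reference.
struct Holder
{
    Item item;
    constexpr const Item& At() const { return item; }
};

constexpr Holder holder{};

// decltype of the call expression keeps the reference.
static_assert(std::is_same_v<decltype(holder.At()), const Item&>,
              "the accessor's declared return type is preserved");
static_assert(!std::is_same_v<decltype(holder.At()), const Item>,
              "comparing against the non-reference type fails");

// Alternative: strip the reference before comparing.
static_assert(std::is_same_v<std::remove_reference_t<decltype(holder.At())>, const Item>,
              "remove_reference_t yields the referred-to type");
```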

Copilot · 2026-01-22T22:14:18Z

test/util/unit_sequence_helper.cpp

+    auto input    = make_tuple(Sequence<1, 2>{}, Sequence<3, 4>{}, Sequence<5, 6>{});
+    auto result   = unpack_and_merge_sequences{}(input);
+    auto expected = Sequence<1, 2, 3, 4, 5, 6>{};


unpack_and_merge_sequences is defined as a function template, not a callable object type. unpack_and_merge_sequences{}(input) will not compile; call it as unpack_and_merge_sequences(input) (or change the helper to a functor if that was the intent).
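The difference between the two call forms can be seen in a small standalone sketch. The names `twice_fn` and `twice_functor` are hypothetical and unrelated to the PR; they only illustrate that `name{}(args)` requires `name` to be a class type with `operator()`, whereas a function template is called directly.

```cpp
// Hypothetical helper written as a function template: called as twice_fn(x).
template <typename T>
constexpr T twice_fn(T x)
{
    return x + x;
}

// The same helper written as a functor: called as twice_functor{}(x).
struct twice_functor
{
    template <typename T>
    constexpr T operator()(T x) const
    {
        return x + x;
    }
};

static_assert(twice_fn(21) == 42, "function template: direct call");
static_assert(twice_functor{}(21) == 42, "functor: construct, then call");

// twice_fn{}(21);  // ill-formed: twice_fn names a function template, not a type
```

A functor has the additional property that it can be passed as an argument without naming template arguments, which is why the choice between the two forms matters for helpers that are handed to other templates.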

shumway · 2026-01-23T23:50:45Z

include/ck/tensor_description/tensor_descriptor.hpp

+
+// convert_visible_ids_to_hidden_ids - maps sequence of visible IDs to hidden IDs
+//
+// Replaces: [&](auto low_dim_visible_ids) { return transform_sequences(convert_fn, ids); }


This comment about the lambda it replaces is obvious and not helpful going forward (since the lambda isn't around anymore).

shumway · 2026-01-23T23:51:46Z

include/ck/tensor_description/tensor_descriptor.hpp

+
+// generate_arithmetic_sequence_from_scan - generates arithmetic sequences for upper dimensions
+//
+// Replaces lambda: [&](auto i) {


Similarly: this "Replaces ..." comment is an artifact of the refactoring and should be removed.

shumway · 2026-01-23T23:53:46Z

include/ck/utility/sequence_helper.hpp

+template <typename TupleOfSequences>
+__host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences)
+{
+    return unpack(merge_sequences_functor{}, TupleOfSequences{});
+}


Yes, we need to be careful to explicitly include the templates we use and not rely on transitive inclusion.

shumway · 2026-01-23T23:55:34Z

include/ck/utility/tuple_helper.hpp

+// - Pack expansion: make_tuple(Sequence<Is>{}...) creates all sequences at once
+// - No lambda closures or unique types per call site
+//
+// Impact:


This is an artifact of the refactoring and shouldn't be included as a file comment. (It's very useful for the PR description, but it will get confusing if it's left in the code base.)

tenpercent added 3 commits January 21, 2026 23:56

Add unit tests for generate_identity_sequences and unpack_and_merge_s…

905ae13

…equences

tenpercent added 2 commits January 22, 2026 00:22

Apply clang-format to unit_sequence_helper.cpp

84a09ef

tenpercent marked this pull request as ready for review January 22, 2026 03:10

tenpercent requested review from Snektron, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, vidyasagar-amd and vpietila-amd as code owners January 22, 2026 03:10

tenpercent marked this pull request as draft January 22, 2026 18:48

vidyasagar-amd requested a review from Copilot January 22, 2026 22:05

Copilot started reviewing on behalf of vidyasagar-amd January 22, 2026 22:06 View session

Copilot AI reviewed Jan 22, 2026


cgmillette self-assigned this Jan 23, 2026

shumway reviewed Jan 23, 2026
