Refactor vector type to reduce build times #3641
base: develop
Pull request overview
This pull request refactors the vector_type implementation to reduce build times by consolidating specialized template implementations into a generalized design. The changes aim to improve frontend parsing times by reducing redundant code and backend codegen times by replacing recursive template instantiations with concrete implementations.
Changes:
- Generalizes vector_type partial specializations into a single class with helper structs (vector_type_storage, non_native_vector_base)
- Replaces recursive StaticallyIndexedArray with concrete StaticallyIndexedArray_v2
- Fixes bool datatype handling with vector_type to avoid data slicing issues
- Adds new math utilities: integer_log2_floor and is_power_of_two_integer
- Updates next_power_of_two to handle edge cases (X <= 1)
- Introduces default scalar_type template specialization with unsigned _BitInt fallback
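The `next_power_of_two` edge-case item can be illustrated with a minimal sketch. The name `next_power_of_two_sketch` is hypothetical, and the value returned for `X <= 1` is an assumption (1 here); the PR summary does not state what the real function returns, so treat this as an illustration rather than the code in `math.hpp`.

```cpp
#include <cstdint>

// Hypothetical sketch, not the PR's implementation.
// Assumption: inputs <= 1 map to 1. Without this guard, x == 1 would call
// __builtin_clz(0), whose result is undefined, and x <= 0 would feed a
// wrapped-around value into the shift.
// Valid for x up to 2^30; larger inputs would overflow the int32_t shift.
constexpr std::int32_t next_power_of_two_sketch(std::int32_t x)
{
    return (x <= 1)
               ? std::int32_t{1}
               : (std::int32_t{1} << (32 - __builtin_clz(static_cast<std::uint32_t>(x - 1))));
}

static_assert(next_power_of_two_sketch(0) == 1, "guarded edge case");
static_assert(next_power_of_two_sketch(1) == 1, "guarded edge case");
static_assert(next_power_of_two_sketch(5) == 8, "rounds up");
static_assert(next_power_of_two_sketch(8) == 8, "powers of two are unchanged");
```

`__builtin_clz` is a GCC/Clang builtin, so the sketch assumes one of those compilers.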
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| include/ck/utility/statically_indexed_array.hpp | Adds blank line (cosmetic change) |
| include/ck/utility/math.hpp | Fixes next_power_of_two for edge cases, adds integer_log2_floor and is_power_of_two_integer functions |
| include/ck/utility/data_type.hpp | Refactors scalar_type with default template and typename keywords, removes redundant next_pow2 |
| include/ck/utility/dtype_vector.hpp | Major refactor: introduces NativeVectorT alias, vector_type_storage helper, generalized vector_type class with is_as_type_cast_valid validation, consolidates non_native_vector_base specializations |

    template <typename T>
    struct scalar_type
    {
        // Basic data type mapping to unsigned _BitInt of appropriate size
        using type = unsigned _BitInt(8 * sizeof(T));
        static constexpr index_t vector_size = 1;
    };

Copilot
AI
Jan 23, 2026
Adding a primary (default) scalar_type template that maps any type to unsigned _BitInt(8 * sizeof(T)) is a significant change. It gives types without an explicit specialization a sensible fallback, but pointer types or complex class types would also receive this _BitInt mapping, which may not be the intended behavior. The previous design likely required explicit specializations on purpose, so that only valid types were used. Consider either documenting this behavior explicitly or adding static assertions to catch problematic types at compile time.
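One way the static assertion suggested above could look is sketched here. This is an assumption-laden illustration, not the PR's code: `uint_of_size` is a hypothetical helper that stands in for `unsigned _BitInt(8 * sizeof(T))` so the sketch compiles without `_BitInt` support in C++, `index_t` is a stand-in for the library's index type, and the choice of which types to reject is only an example.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

using index_t = std::int32_t; // assumption: stand-in for the library's index_t

// Hypothetical helper (not from the PR): unsigned integer of a given byte width.
template <std::size_t Bytes>
struct uint_of_size;
template <> struct uint_of_size<1> { using type = std::uint8_t; };
template <> struct uint_of_size<2> { using type = std::uint16_t; };
template <> struct uint_of_size<4> { using type = std::uint32_t; };
template <> struct uint_of_size<8> { using type = std::uint64_t; };

// Default mapping with a compile-time guard: pointers and types that are not
// trivially copyable are rejected instead of silently getting a bit mapping.
template <typename T>
struct scalar_type
{
    static_assert(std::is_trivially_copyable<T>::value && !std::is_pointer<T>::value,
                  "scalar_type fallback requires a trivially copyable, non-pointer type");
    using type = typename uint_of_size<sizeof(T)>::type;
    static constexpr index_t vector_size = 1;
};

static_assert(std::is_same<scalar_type<std::int16_t>::type, std::uint16_t>::value,
              "16-bit types map to a 16-bit unsigned integer");
```

With this guard, `scalar_type<int*>` fails to compile with the assertion message rather than producing a mapping nobody asked for.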

        // TODO: x need to be 1 ~ 0x7fffffff
        // __builtin_clz will produce unexpected result if x is 0;
        return 31 - __builtin_clz(x);
    }

    __host__ __device__ constexpr bool is_power_of_two_integer(int32_t x)
    {
        // TODO: x need to be 1 ~ 0x7fffffff
        return x == (1 << integer_log2_floor(x));

Copilot
AI
Jan 23, 2026
The x == 1 edge case of is_power_of_two_integer works as intended: integer_log2_floor(1) returns 0 (since 31 - __builtin_clz(1) = 31 - 31 = 0), and (1 << 0) = 1, so the function correctly returns true. However, the function can return incorrect results for x <= 0, as the TODO on line 244 notes, and nothing prevents such inputs. Consider adding an explicit check for x <= 0 inside the function, or a static_assert at call sites that pass compile-time constants.

    -    // TODO: x need to be 1 ~ 0x7fffffff
    -    // __builtin_clz will produce unexpected result if x is 0;
    -    return 31 - __builtin_clz(x);
    -}
    -__host__ __device__ constexpr bool is_power_of_two_integer(int32_t x)
    -{
    -    // TODO: x need to be 1 ~ 0x7fffffff
    -    return x == (1 << integer_log2_floor(x));
    +    // x needs to be in the range [1, 0x7fffffff].
    +    // __builtin_clz is undefined for x <= 0; guard against that here.
    +    return (x > 0) ? (31 - __builtin_clz(static_cast<uint32_t>(x))) : int32_t{-1};
    +}
    +__host__ __device__ constexpr bool is_power_of_two_integer(int32_t x)
    +{
    +    // x needs to be in the range [1, 0x7fffffff]. For x <= 0, this is not a power of two.
    +    return (x > 0) && (x == (int32_t(1) << integer_log2_floor(x)));

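As a quick sanity check of the guarded versions, the host-only sketch below drops the `__host__ __device__` qualifiers and compares the result against the common `x & (x - 1)` bit trick. The name `is_power_of_two_bit_trick` is hypothetical and not part of the PR; it is only there as a cross-check.

```cpp
#include <cstdint>

// Guarded floor(log2(x)); returns -1 for x <= 0 instead of calling
// __builtin_clz on a value for which its result is undefined.
constexpr std::int32_t integer_log2_floor(std::int32_t x)
{
    return (x > 0) ? (31 - __builtin_clz(static_cast<std::uint32_t>(x))) : std::int32_t{-1};
}

constexpr bool is_power_of_two_integer(std::int32_t x)
{
    // The x > 0 test short-circuits, so the shift never sees a negative count.
    return (x > 0) && (x == (std::int32_t{1} << integer_log2_floor(x)));
}

// Hypothetical cross-check: a positive power of two has exactly one bit set,
// so clearing the lowest set bit leaves zero.
constexpr bool is_power_of_two_bit_trick(std::int32_t x)
{
    return (x > 0) && ((x & (x - 1)) == 0);
}

static_assert(integer_log2_floor(1) == 0, "log2(1) is 0");
static_assert(integer_log2_floor(0) == -1, "guarded input");
static_assert(is_power_of_two_integer(1), "1 is 2^0");
static_assert(!is_power_of_two_integer(0), "0 is rejected");
static_assert(!is_power_of_two_integer(-8), "negative values are rejected");
static_assert(is_power_of_two_integer(0x40000000), "largest int32_t power of two");
static_assert(!is_power_of_two_integer(0x7fffffff), "INT32_MAX is not a power of two");
```

Both predicates agree on every input checked here; the bit-trick form avoids the dependency on `__builtin_clz` if that ever matters.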
…ype is visible where needed
Proposed changes
Build times are affected by many factors and depend heavily on how we write and use the code. Two critical areas of the build are frontend parsing and backend codegen and compilation.
Frontend Parsing
The length of the code, the include header tree, and macro expansions all affect frontend parsing time.
This PR seeks to reduce the parsing time of the vector_type class in dtype_vector.hpp by generalizing it to remove redundant code.
Backend Codegen
Template instantiation behavior can also affect build times. Recursive instantiations are much slower than concrete instantiations: the compiler must make multiple passes to expand them, so we need to be careful about how they are used.
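A rough illustration of the difference follows. `RecursiveArray` and `FlatArray` are hypothetical names chosen for this sketch; they are not the layouts of StaticallyIndexedArray or StaticallyIndexedArray_v2, only a minimal picture of a recursive versus a concrete definition.

```cpp
#include <cstddef>

// Recursive form: RecursiveArray<T, N> contains RecursiveArray<T, N - 1>,
// so using size N makes the compiler instantiate N + 1 distinct class templates.
template <typename T, std::size_t N>
struct RecursiveArray
{
    T head{};
    RecursiveArray<T, N - 1> tail{};

    constexpr T get(std::size_t i) const { return i == 0 ? head : tail.get(i - 1); }
};

// Base case that ends the recursion.
template <typename T>
struct RecursiveArray<T, 0>
{
    constexpr T get(std::size_t) const { return T{}; }
};

// Concrete form: one instantiation per (T, N), with plain array storage.
template <typename T, std::size_t N>
struct FlatArray
{
    T data[N]{};

    constexpr T get(std::size_t i) const { return data[i]; }
};
```

For a 64-element array the recursive form drags in 65 instantiations where the flat form needs one, which is the kind of cost the paragraph above describes.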
This union storage has been removed from the vector_type storage class.
Fixes
Additions
Build Time Analysis
Machine: banff-cyxtera-s78-2
Target: gfx942