@aidando73 aidando73 commented Jan 10, 2026

I'm getting this error trying to generate e4m3 fp8 kernels for SM100:

/home/aidan/fireworks/cutlass/include/cute/int_tuple.hpp(408): error: static assertion failed with "Divisibility Condition"
        static_assert(((IntTupleA::value % IntTupleB::value) == 0) or ((IntTupleB::value % IntTupleA::value) == 0), "Divisibility Condition");
        ^
          detected during:
            instantiation of "auto cute::shape_div(const IntTupleA &, const IntTupleB &) [with IntTupleA=cute::_136, IntTupleB=cute::_64]" at line 391
            instantiation of function "lambda [](const auto &, const auto &)->auto [with <auto-1>=cute::_136, <auto-2>=cute::_64]" at line 109 of /home/aidan/fireworks/cutlass/include/cute/algorithm/tuple_algorithms.hpp
            instantiation of "auto cute::detail::tapply(T0 &&, T1 &&, F &&, G &&, cute::seq<I...>) [with T0=const cute::tuple<cute::_64, cute::_136> &, T1=const cute::tuple<cute::C<64>, cute::C<64>> &, F=lambda [](const auto &, const auto &)->auto &, G=lambda [](const auto &...)->auto, I=<0, 1>]" at line 225 of /home/aidan/fireworks/cutlass/include/cute/algorithm/tuple_algorithms.hpp
            instantiation of "auto cute::transform(const T0 &, const T1 &, F &&) [with T0=cute::tuple<cute::_64, cute::_136>, T1=cute::tuple<cute::C<64>, cute::C<64>>, F=lambda [](const auto &, const auto &)->auto]" at line 391
            instantiation of "auto cute::shape_div(const IntTupleA &, const IntTupleB &) [with IntTupleA=cute::tuple<cute::_64, cute::_136>, IntTupleB=cute::tuple<cute::C<64>, cute::C<64>>]" at line 1499 of /home/aidan/fireworks/cutlass/include/cutlass/epilogue/collective/builders/sm100_builder.inl
            instantiation of class "cutlass::epilogue::collective::CollectiveBuilder<cutlass::arch::Sm100, OpClass, MmaTileShape_MNK, ClusterShape_MNK, EpilogueTileType, ElementAccumulator, ElementCompute, ElementC_, GmemLayoutTagC_, AlignmentC, ElementD, GmemLayoutTagD, AlignmentD, EpilogueScheduleType, FusionOpOrCallbacks, std::enable_if_t<<expression>, void>> [with OpClass=cutlass::arch::OpClassTensorOp, MmaTileShape_MNK=cute::tuple<cute::_64, cute::_136, cute::_128>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>, EpilogueTileType=cutlass::epilogue::collective::EpilogueTileAuto, ElementAccumulator=float, ElementCompute=float, ElementC_=void, GmemLayoutTagC_=cutlass::layout::ColumnMajor, AlignmentC=1, ElementD=cutlass::bfloat16_t, GmemLayoutTagD=cutlass::layout::ColumnMajor, AlignmentD=1, EpilogueScheduleType=cutlass::epilogue::NoSmemWarpSpecialized1Sm, FusionOpOrCallbacks=cutlass::epilogue::fusion::LinearCombination<cutlass::bfloat16_t, float, void, float, cutlass::FloatRoundStyle::round_to_nearest>]" at line 1501 of /home/aidan/fireworks/cutlass/include/cutlass/epilogue/collective/builders/sm100_builder.inl
            instantiation of class "cutlass::epilogue::collective::CollectiveBuilder<cutlass::arch::Sm100, OpClass, MmaTileShape_MNK, ClusterShape_MNK, EpilogueTileType, ElementAccumulator, ElementCompute, ElementC_, GmemLayoutTagC_, AlignmentC, ElementD, GmemLayoutTagD, AlignmentD, EpilogueScheduleType, FusionOpOrCallbacks, std::enable_if_t<<expression>, void>> [with OpClass=cutlass::arch::OpClassTensorOp, MmaTileShape_MNK=cute::tuple<cute::_64, cute::_136, cute::_128>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>, EpilogueTileType=cutlass::epilogue::collective::EpilogueTileAuto, ElementAccumulator=float, ElementCompute=float, ElementC_=void, GmemLayoutTagC_=cutlass::layout::ColumnMajor, AlignmentC=1, ElementD=cutlass::bfloat16_t, GmemLayoutTagD=cutlass::layout::ColumnMajor, AlignmentD=1, EpilogueScheduleType=cutlass::epilogue::NoSmemWarpSpecialized1Sm, FusionOpOrCallbacks=cutlass::epilogue::fusion::LinearCombination<cutlass::bfloat16_t, float, void, float, cutlass::FloatRoundStyle::round_to_nearest>]" at line 46 of /home/aidan/fireworks/tools/library/generated/gemm/100/void_gemm_e4m3/cutlass3x_sm100_tensorop_gemm_e4m3_e4m3_f32_void_bf16_64x136x128_1x1x1_0_tnn_align8_cpasync_1sm_epi_nosmem.cu

This PR fixes it - relevant line is here: https://github.com/aidando73/cutlass-1/blob/a1dfe3f4935d80726c95bae8a56c2a2c5280e73d/include/cutlass/epilogue/collective/builders/sm100_builder.inl#L1499

E.g., if CtaTileShape_MNK is (64, 136, _) and EpilogueTile is (64, 64), then this assert fails because neither 136 nor 64 divides the other (136 % 64 == 8 and 64 % 136 == 64):

https://github.com/aidando73/cutlass-1/blob/a1dfe3f4935d80726c95bae8a56c2a2c5280e73d/include/cute/int_tuple.hpp#L408

Since EpilogueTile[1] is min(64, cta_n), there are two cases:

  • If cta_n <= 64, EpilogueTile[1] == CtaTileShape_MNK[1] -> assert passes
  • If cta_n > 64, EpilogueTile[1] == 64, so CtaTileShape_MNK[1] must be divisible by 64 for the assert to pass
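
The two cases can be checked at compile time with a small standalone sketch. It does not include CuTe or CUTLASS; `divisibility_ok`, `epilogue_tile_n`, and `cta_n_supported` are hypothetical helper names used only for illustration, and the `min(64, cta_n)` rule is taken from the description above rather than from the builder source.

```cpp
// Standalone sketch, no CuTe/CUTLASS headers needed. All names are hypothetical.

// Mirrors the condition in the failing static_assert for two static integers:
// one of the two extents must evenly divide the other.
constexpr bool divisibility_ok(int a, int b) {
  return (a % b == 0) || (b % a == 0);
}

// Assumption from the PR description: EpilogueTile[1] = min(64, cta_n).
constexpr int epilogue_tile_n(int cta_n) {
  return cta_n < 64 ? cta_n : 64;
}

// True when dividing CtaTileShape_MNK[1] by EpilogueTile[1] would satisfy
// the "Divisibility Condition".
constexpr bool cta_n_supported(int cta_n) {
  return divisibility_ok(cta_n, epilogue_tile_n(cta_n));
}

// Case 1: cta_n <= 64, so the epilogue tile equals cta_n and the check passes.
static_assert(cta_n_supported(32), "tile N equals cta_n");
static_assert(cta_n_supported(64), "tile N equals cta_n");

// Case 2: cta_n > 64, so the epilogue tile is 64 and cta_n must be a multiple of 64.
static_assert(cta_n_supported(128), "128 % 64 == 0");
static_assert(!cta_n_supported(136), "136 % 64 == 8 and 64 % 136 == 64");
```

Compiling this file is the whole test: if any case were misclassified, the corresponding `static_assert` would fail in the same way the 64x136x128 kernel does.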

@aidando73
Contributor Author

cc @hwu36 @depaulmillz
