[Cutlass profiler] Fix SM100 FP8 nosmem epilogue shape_div “Divisibility Condition” for non‑multiple‑of‑64 N tiles #2946
Hello @aidando73, good job. If you enable all MMA instruction sizes by specifying
cutlass/include/cutlass/epilogue/collective/builders/sm100_builder.inl Lines 1008 to 1038 in 8debf77
So it is not reasonable to ignore cta_n % 64 != 0 directly for other data types. It may cause kernel instantiation to fail for those data types.
A good solution is to add a new
cutlass/python/cutlass_library/generator.py Lines 7603 to 7606 in 8debf77
@CalebDu thanks for the review.
OK, updated. This now applies only to C=void, D=bf16/f16.
Tested with:
I believe we'll need this for C=bf16/f16 as well, but I run into a different error:
So I will keep it scoped to C=void for now and revisit this later.
@Junkai-Wu @hwu36 LGTM.
I'm getting this error trying to generate e4m3 fp8 kernels for SM100:
This PR fixes it - relevant line is here: https://github.com/aidando73/cutlass-1/blob/a1dfe3f4935d80726c95bae8a56c2a2c5280e73d/include/cutlass/epilogue/collective/builders/sm100_builder.inl#L1499
E.g., if CtaTileShape_MNK is (64, 136, _) and EpilogueTile is (64, 64), then this assert fails:
https://github.com/aidando73/cutlass-1/blob/a1dfe3f4935d80726c95bae8a56c2a2c5280e73d/include/cute/int_tuple.hpp#L408
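To make the failing case concrete, here is a rough Python model of the check. It is only a sketch: `epilogue_tile_n` and `divisibility_ok` are hypothetical helper names, not CUTLASS or CuTe APIs, and the condition is assumed to be "one extent evenly divides the other", which is a simplification of what the static assert in `int_tuple.hpp` enforces. The `min(64, cta_n)` rule for the epilogue tile N extent is the one described in this PR.

```python
# Simplified model of the divisibility check (assumption, not the CuTe source).

def epilogue_tile_n(cta_n: int, max_epi_n: int = 64) -> int:
    """Epilogue tile N extent, modeled as min(64, cta_n) as stated in this PR."""
    return min(max_epi_n, cta_n)


def divisibility_ok(a: int, b: int) -> bool:
    """Assumed condition: one extent must evenly divide the other."""
    return a % b == 0 or b % a == 0


def cta_n_is_compatible(cta_n: int) -> bool:
    """True if the CTA N extent passes the modeled check against its epilogue tile."""
    return divisibility_ok(cta_n, epilogue_tile_n(cta_n))


if __name__ == "__main__":
    for cta_n in (32, 64, 128, 136, 192):
        print(cta_n, epilogue_tile_n(cta_n), cta_n_is_compatible(cta_n))
```

Under this model, cta_n = 136 gives an epilogue tile of 64, and neither 136 % 64 nor 64 % 136 is zero, so the check fails. Multiples of 64 and any cta_n at or below 64 pass.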
And since EpilogueTile[1] is min(64, cta_n), there are two cases:
Repro command: