Deprecate float32 bias for Cutlass FP8 rowwise #4274
Summary:
Currently, for each configuration (MNK tile shape and cluster shape) of Cutlass FP8 rowwise, we compile 12 kernels, because we further template on (FAST_ACCUM, USE_BIAS, INPUT_DTYPE, BIAS_DTYPE). We don't seem to have any use cases for float32 BIAS_DTYPE, so it can likely be removed, reducing the count from 12 to 8 kernels per configuration.
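One way the 12 and 8 figures could arise is sketched below. This is only an illustrative count, not the actual dispatch code: it assumes FAST_ACCUM is a boolean, INPUT_DTYPE takes two values, BIAS_DTYPE is either float32 or one other type (labeled `bfloat16` here as a placeholder), and the no-bias case collapses to a single variant regardless of BIAS_DTYPE. The dtype names and helper functions are hypothetical.

```python
from itertools import product

# Assumed template parameter values (not confirmed by the summary).
FAST_ACCUM = (False, True)
INPUT_DTYPES = ("e4m3", "e5m2")  # hypothetical labels for two FP8 input types


def bias_variants(bias_dtypes):
    """Return (USE_BIAS, BIAS_DTYPE) pairs.

    Assumption: with USE_BIAS=False the bias dtype is irrelevant,
    so the no-bias case contributes exactly one variant.
    """
    return [(False, None)] + [(True, dtype) for dtype in bias_dtypes]


def count_kernels(bias_dtypes):
    """Count kernel instantiations for one tile/cluster configuration."""
    combos = product(FAST_ACCUM, INPUT_DTYPES, bias_variants(bias_dtypes))
    return len(list(combos))


before = count_kernels(("float32", "bfloat16"))  # float32 bias still supported
after = count_kernels(("bfloat16",))             # float32 bias removed

print(f"kernels per configuration: {before} -> {after}")
```

Under these assumptions the count is 2 x 2 x 3 = 12 before the change and 2 x 2 x 2 = 8 after it, matching the numbers in the summary.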
INPUT_DTYPE could likely be removed as well, but doing so safely needs more cleanup and verification, since I see some scripts using it, so it's a bit more work.
Differential Revision: D76063973