Fix int_nbit int8 nobag CUDA kernel #4421
Closed
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1491
TLDR;
Fix int8 nobag in the TBE inference CUDA kernel so that its output shape and quantization parameter size match the CPU implementation.

Detail
For nobag int8, the output shape should be {total_L, D + kINT8QparamsBytes}, since the total_L dimension already includes T. The T * factor was unintentionally added in D36018114. kINT8QparamsBytes is 4 on CPU, since a half is used; however, 8 was used in CUDA. This diff removes T * from the output shape and changes kINT8QparamsBytes to 4 in the CUDA kernel implementation to match CPU and production. There has been no issue so far because our int8 nobag CUDA kernels are not currently used in production.
Note that the meta function currently in use is fbgemm_int_nbit_split_embedding_codegen_lookup_function_meta, which has different logic for the int8 and nobag cases.
The discrepancy has not been an issue because:
-> The embedding dimensions are all the same, so average D = max D.
-> This path is not used in production.
This will become a problem if embedding dimensions are mixed, or if int8 pooled output is used.
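To see why equal embedding dimensions mask the discrepancy, a small sketch (illustrative values only, not taken from any real table config):

```python
# When all tables share one dimension, average D equals max D, so
# shape logic based on either quantity happens to agree. With mixed
# dimensions the two diverge and the discrepancy would surface.

def avg_d(dims):
    return sum(dims) // len(dims)

dims_equal = [64, 64, 64]    # average D = max D = 64
dims_mixed = [32, 64, 128]   # average D = 74, max D = 128

assert avg_d(dims_equal) == max(dims_equal)
assert avg_d(dims_mixed) != max(dims_mixed)
```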
Differential Revision: D76488339