Fix int_nbit int8 nobag CUDA kernel #4421

Closed
spcyppt wants to merge 1 commit

Conversation

Contributor

@spcyppt commented on Jul 1, 2025

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1491

**TL;DR**

Fix int8 nobag in the TBE inference CUDA kernel so that

- the output shape is `{total_L, D + kINT8QparamsBytes}`
- `kINT8QparamsBytes = 4`

**Detail**
For nobag int8, the output shape should be `{total_L, D + kINT8QparamsBytes}`, since the `total_L` dimension already includes `T`. The `T *` factor was unintentionally added in D36018114.
`kINT8QparamsBytes` is 4 on CPU, since a half is used; however, the CUDA kernel uses 8.
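
For reference, a minimal sketch of how a single output row could be decoded under this layout. It assumes the 4 qparam bytes hold an fp16 (scale, bias) pair appended after the `D` quantized values; the helper name and layout details are illustrative, not the actual FBGEMM code:

```python
import numpy as np

# Illustrative only: assumes each nobag int8 output row stores D uint8 values
# followed by kINT8QparamsBytes = 4 bytes holding an fp16 (scale, bias) pair.
def dequantize_int8_row(row: np.ndarray, D: int) -> np.ndarray:
    qvalues = row[:D].astype(np.float32)
    scale, bias = row[D:D + 4].copy().view(np.float16).astype(np.float32)
    return qvalues * scale + bias
```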

This diff removes `T *` from the output shape and changes `kINT8QparamsBytes` to 4 in the CUDA kernel implementation to match the CPU path and production.

This has not caused issues because neither of our int8 nobag CUDA kernels is currently used in production.
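
At the shape level, the fix amounts to the following sketch (a hypothetical helper, not the kernel code; only the corrected shape is shown):

```python
kINT8QparamsBytes = 4  # matches the CPU path (half-precision qparams); the CUDA kernel previously used 8

def nobag_int8_output_shape(total_L: int, D: int) -> tuple[int, int]:
    # total_L already sums the indices over all T tables, so no extra T factor
    # belongs in the output shape.
    return (total_L, D + kINT8QparamsBytes)

# e.g. total_L = 1024 indices and D = 128 gives an output of shape (1024, 132)
assert nobag_int8_output_shape(1024, 128) == (1024, 132)
```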


Note that the currently used meta function is [fbgemm_int_nbit_split_embedding_codegen_lookup_function_meta](https://www.internalfb.com/code/fbsource/[d4f61c30f747f0a8c2e6d806904bc8ef3ee5ea42]/fbcode/caffe2/torch/fb/model_transform/splitting/split_dispatcher.py?lines=231%2C423), which has different logic for the int8 and nobag cases.

The discrepancy has not been an issue because:

- Nobag
  - split_dispatcher: D = average D
  - FBGEMM: D = max(max_D of each dtype)

-> The embedding dimensions are the same across tables, so average D = max D.

- Int8 Pooled
  - split_dispatcher: `[B, total_D]`
  - FBGEMM: `[B, total_D + T * 8]`

-> Int8 pooled is not used in prod.

This will become a problem if the embedding dimensions are mixed or if int8 pooled is used (see the sketch below).
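
The sketch below (illustrative Python, not the actual split_dispatcher or FBGEMM code) shows why the shapes happen to agree today and where they would diverge:

```python
# Hypothetical per-table embedding dims; production currently uses uniform dims.
dims = [128, 128, 128, 128]
B, T, total_D = 32, len(dims), sum(dims)

# Nobag: the two conventions coincide only because all dims are equal.
avg_D = sum(dims) // len(dims)   # split_dispatcher: D = average D
max_D = max(dims)                # FBGEMM: D = max(max_D of each dtype)
assert avg_D == max_D            # breaks as soon as dims are mixed

# Int8 pooled: the conventions differ by T * 8 per sample, but this path is
# not used in production today.
split_dispatcher_shape = (B, total_D)       # [B, total_D]
fbgemm_shape = (B, total_D + T * 8)         # [B, total_D + T * 8]
```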

Differential Revision: D76488339


netlify bot commented Jul 1, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | f14041b |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68634e7f6af4e60008228ae0 |
| 😎 Deploy Preview | https://deploy-preview-4421--pytorch-fbgemm-docs.netlify.app |

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D76488339

@facebook-github-bot
Contributor

This pull request has been merged in 8325430.
