Fix int_nbit int8 nobag CUDA kernel #4421

Closed
spcyppt wants to merge 1 commit

Conversation

Contributor

@spcyppt commented on Jul 1, 2025

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1491

**TL;DR**

Fix int8 nobag in the TBE inference CUDA kernel so that

- the output shape is `{total_L, D + kINT8QparamsBytes}`
- `kINT8QparamsBytes = 4`

**Detail**
For nobag int8, the output shape should be `{total_L, D + kINT8QparamsBytes}`, since the `total_L` dimension already includes `T`. The `T *` factor was unintentionally added in D36018114.
`kINT8QparamsBytes` is 4 on CPU, since a half is used; however, the CUDA kernel uses 8.
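
For reference, a minimal sketch of how a single output row could be decoded under this layout. It assumes the 4 qparam bytes hold an fp16 (scale, bias) pair appended after the `D` quantized values; the helper name and layout details are illustrative, not the actual FBGEMM code:

```python
import numpy as np

# Illustrative only: assumes each nobag int8 output row stores D uint8 values
# followed by kINT8QparamsBytes = 4 bytes holding an fp16 (scale, bias) pair.
def dequantize_int8_row(row: np.ndarray, D: int) -> np.ndarray:
    qvalues = row[:D].astype(np.float32)
    scale, bias = row[D:D + 4].copy().view(np.float16).astype(np.float32)
    return qvalues * scale + bias
```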

This diff removes `T *` from the output shape and changes `kINT8QparamsBytes` to 4 in the CUDA kernel implementation to match the CPU path and production.

This has not caused issues because neither of our int8 nobag CUDA kernels is currently used in production.
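
At the shape level, the fix amounts to the following sketch (a hypothetical helper, not the kernel code; only the corrected shape is shown):

```python
kINT8QparamsBytes = 4  # matches the CPU path (half-precision qparams); the CUDA kernel previously used 8

def nobag_int8_output_shape(total_L: int, D: int) -> tuple[int, int]:
    # total_L already sums the indices over all T tables, so no extra T factor
    # belongs in the output shape.
    return (total_L, D + kINT8QparamsBytes)

# e.g. total_L = 1024 indices and D = 128 gives an output of shape (1024, 132)
assert nobag_int8_output_shape(1024, 128) == (1024, 132)
```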


Note that the currently used meta function is [fbgemm_int_nbit_split_embedding_codegen_lookup_function_meta](https://www.internalfb.com/code/fbsource/[d4f61c30f747f0a8c2e6d806904bc8ef3ee5ea42]/fbcode/caffe2/torch/fb/model_transform/splitting/split_dispatcher.py?lines=231%2C423), which has different logic for the int8 and nobag cases.

The discrepancy has not been an issue because:

- Nobag
  - split_dispatcher: D = average D
  - FBGEMM: D = max(max_D of each dtype)

-> The embedding dimensions are the same across tables, so average D = max D.

- Int8 Pooled
  - split_dispatcher: `[B, total_D]`
  - FBGEMM: `[B, total_D + T * 8]`

-> Int8 pooled is not used in prod.

This will become a problem if the embedding dimensions are mixed or if int8 pooled is used (see the sketch below).
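
The sketch below (illustrative Python, not the actual split_dispatcher or FBGEMM code) shows why the shapes happen to agree today and where they would diverge:

```python
# Hypothetical per-table embedding dims; production currently uses uniform dims.
dims = [128, 128, 128, 128]
B, T, total_D = 32, len(dims), sum(dims)

# Nobag: the two conventions coincide only because all dims are equal.
avg_D = sum(dims) // len(dims)   # split_dispatcher: D = average D
max_D = max(dims)                # FBGEMM: D = max(max_D of each dtype)
assert avg_D == max_D            # breaks as soon as dims are mixed

# Int8 pooled: the conventions differ by T * 8 per sample, but this path is
# not used in production today.
split_dispatcher_shape = (B, total_D)       # [B, total_D]
fbgemm_shape = (B, total_D + T * 8)         # [B, total_D + T * 8]
```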

Differential Revision: D76488339


netlify bot commented Jul 1, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | f14041b |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68634e7f6af4e60008228ae0 |
| 😎 Deploy Preview | https://deploy-preview-4421--pytorch-fbgemm-docs.netlify.app |

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D76488339

@facebook-github-bot
Contributor

This pull request has been merged in 8325430.
