Support ssd device propagation in Torch Rec for RecSys Inference #2961

Status: Open, 1 commit to be merged into main


@faran928 (Contributor) commented May 8, 2025

Summary:
For RecSys inference when embedding tables are offloaded to SSD:

  1. Specify the tables to be offloaded to SSD and propagate that choice through TorchRec via FUSED_PARAMS.
  2. Keep torch.device("cpu") as the compute device, but create a new device group for SSD so that SSD tables get their own input / output dist. This is needed because the SSD kernel (EmbeddingDB) differs from the CPU kernel.

device_type_from_sharding_info would be renamed to storage_device_type_from_sharding_info to make its meaning clearer.
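
To illustrate the idea (not the actual TorchRec change), here is a minimal sketch in plain Python. It assumes the SSD tables are listed under a fused-params key; the key name `ssd_tables`, the `TableInfo` class, and both helper functions are hypothetical and do not come from the diff. The point is only that the compute device stays "cpu" while tables are grouped by storage device, so each group can get its own input / output dist.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical fused-params key; the real key name is not shown in this PR.
SSD_TABLES_KEY = "ssd_tables"

# The compute device is unchanged: SSD-backed tables still compute on CPU.
COMPUTE_DEVICE = "cpu"


@dataclass
class TableInfo:
    """Hypothetical stand-in for per-table sharding info."""
    name: str
    fused_params: Dict[str, object] = field(default_factory=dict)


def storage_device_type(table: TableInfo) -> str:
    """Return "ssd" if the table is listed for SSD offload, else "cpu"."""
    offloaded = table.fused_params.get(SSD_TABLES_KEY, [])
    return "ssd" if table.name in offloaded else "cpu"


def group_by_storage_device(tables: List[TableInfo]) -> Dict[str, List[str]]:
    """Bucket table names by storage device, one bucket per device group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for table in tables:
        groups[storage_device_type(table)].append(table.name)
    return dict(groups)


# Example: one table stays in CPU memory, one is offloaded to SSD.
fused_params = {SSD_TABLES_KEY: ["user_history"]}
tables = [
    TableInfo("user_id", fused_params),
    TableInfo("user_history", fused_params),
]
groups = group_by_storage_device(tables)
```

In this sketch each key of `groups` would map to a separate input / output dist, while every group shares `COMPUTE_DEVICE`.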

Differential Revision: D74378974

@facebook-github-bot added the CLA Signed label May 8, 2025
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D74378974

Summary:
For RecSys inference when embedding tables are offloaded to SSD:

1. Specify the tables to be offloaded to SSD and propagate that choice through TorchRec via FUSED_PARAMS, as discussed with TroyGarden.
2. Keep torch.device("cpu") as the compute device, but create a new device group for SSD so that SSD tables get their own input / output dist. This is needed because the in-house SSD TBE kernel, which is based on EmbeddingDB, differs from the CPU TBE kernel.

device_type_from_sharding_info would be renamed to storage_device_type_from_sharding_info to make its meaning clearer.

Reviewed By: jingsh

Differential Revision: D74378974
@faran928 force-pushed the export-D74378974 branch from abbc3e4 to 0559aff on May 9, 2025 17:03
Labels: CLA Signed, fb-exported