
add load checkpoint support for virtual table #4250


Closed · wants to merge 1 commit

Conversation

bobbyliujb

Summary:
After all of the rebasing and landing, trunk is still missing some of the changes needed for checkpoint loading:

* change `create_virtual_table_global_metadata` to respect `local_weight_count` on each rank, or simply use the param size as the number of rows on each rank
* use `register_load_state_dict_post_hook` in `ShardedEmbeddingCollection` to register a hook that skips loading the weight tensor (see the sketches after this list)
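Below are two illustrative sketches of what these changes could look like. They are minimal approximations, not torchrec's actual implementation: `build_virtual_table_shards`, `local_row_counts`, `emb_dim`, and `VirtualTableModule` are hypothetical names invented for the example.

First, a sketch of per-rank shard metadata that respects each rank's local row count instead of assuming an even split:

```python
# Hypothetical sketch: build per-rank shard metadata for a virtual table from
# each rank's local row count. The real logic lives in torchrec's
# create_virtual_table_global_metadata; names here are illustrative.
from torch.distributed._shard.metadata import ShardMetadata


def build_virtual_table_shards(local_row_counts, emb_dim):
    shards = []
    row_offset = 0
    for rank, num_rows in enumerate(local_row_counts):
        shards.append(
            ShardMetadata(
                shard_offsets=[row_offset, 0],    # rows stack across ranks
                shard_sizes=[num_rows, emb_dim],  # param size gives the row count
                placement=f"rank:{rank}/cuda:{rank}",
            )
        )
        row_offset += num_rows
    return shards
```

Second, a sketch of the hook registration. `register_load_state_dict_post_hook` is the standard `nn.Module` API; the hook receives the module and an `incompatible_keys` object whose `missing_keys`/`unexpected_keys` lists can be edited in place, which is how the weight tensor can be excluded from strict-loading checks:

```python
import torch
import torch.nn as nn


class VirtualTableModule(nn.Module):
    """Hypothetical stand-in for ShardedEmbeddingCollection."""

    def __init__(self):
        super().__init__()
        # Virtual-table weight: materialized elsewhere, not read from checkpoints.
        self.weight = nn.Parameter(torch.empty(0))
        self.register_load_state_dict_post_hook(self._ignore_virtual_weight)

    @staticmethod
    def _ignore_virtual_weight(module, incompatible_keys):
        # Drop the virtual weight from missing_keys so load_state_dict(strict=True)
        # succeeds even though the checkpoint does not carry this tensor.
        incompatible_keys.missing_keys[:] = [
            k for k in incompatible_keys.missing_keys if not k.endswith("weight")
        ]


m = VirtualTableModule()
m.load_state_dict({}, strict=True)  # no missing-key error for the virtual weight
```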

Differential Revision: D75843542

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D75843542


netlify bot commented Jun 3, 2025

Deploy Preview for pytorch-fbgemm-docs failed.

🔨 Latest commit: 504dec1
🔍 Latest deploy log: https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/683f6e9333afae000831661e


bobbyliujb pushed a commit to bobbyliujb/torchrec that referenced this pull request Jun 3, 2025
facebook-github-bot pushed a commit to pytorch/torchrec that referenced this pull request Jun 4, 2025
Summary:
Pull Request resolved: #3037

X-link: facebookresearch/FBGEMM#1329

X-link: pytorch/FBGEMM#4250

After all of the rebasing and landing, trunk is still missing some of the changes needed for checkpoint loading:
* change `create_virtual_table_global_metadata` to respect `local_weight_count` on each rank, or simply use the param size as the number of rows on each rank
* use `register_load_state_dict_post_hook` in `ShardedEmbeddingCollection` to register a hook that skips loading the weight tensor

Reviewed By: emlin

Differential Revision: D75843542

Privacy Context Container: L1138451

fbshipit-source-id: 8b3c8d76bb2e7ba2137c8899de2c03d534f1365c
@facebook-github-bot
Contributor

This pull request has been merged in ee0264c.
