Conversation

@vkuzo (Contributor) commented Dec 26, 2025

Summary:

This is not used, removing.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
[ghstack-poisoned]
@vkuzo (Contributor, Author) commented Dec 26, 2025

Stack from ghstack (oldest at bottom):

@pytorch-bot (bot) commented Dec 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3554

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo added a commit that referenced this pull request Dec 26, 2025
Summary:

This is not used, removing.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:
ghstack-source-id: 7bac41c
ghstack-comment-id: 3693220218
Pull-Request: #3554
meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Dec 26, 2025
vkuzo added the topic: deprecation label (use this tag if this PR deprecates a feature) on Dec 26, 2025
[ghstack-poisoned]
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Dec 26, 2025
Summary:

This is not used, removing.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:
ghstack-source-id: ddcc172
ghstack-comment-id: 3693220218
Pull-Request: #3554
@liangan1 (Collaborator) commented:

@vkuzo is there any plan to add the new design for MoE quant?

@liangan1 (Collaborator) commented Jan 5, 2026

cc @xiaowangintel

@vkuzo (Contributor, Author) commented Jan 5, 2026

> @vkuzo is there any plan to add the new design for MoE quant?

Yes:

  1. we don't currently plan to provide any "MoE modules" (hence this PR, which deletes the "MoE modules")
  2. we do plan to improve PyTorch Core MoE support (such as torch._grouped_mm) and support quantization as needed. So, if a user writes their MoE with torch._grouped_mm, torchao plans to support quantizing it to torch._scaled_grouped_mm (see the sketch after this list).
  3. we have tentative plans to improve the integration of torchao checkpoints into vLLM for MoE models, using vLLM's MoE kernels
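
A hedged sketch (not part of this PR) of the path described in item 2: an MoE expert computation written directly against torch._grouped_mm, with a comment showing where a quantized torch._scaled_grouped_mm call would slot in. These are private PyTorch ops, so the argument names, layouts, and hardware requirements below are assumptions that may differ across versions; to_fp8_rowwise is a hypothetical helper, not a real API.

```python
# Hedged sketch only: torch._grouped_mm / torch._scaled_grouped_mm are private
# PyTorch ops; signatures, dtype/layout requirements, and supported hardware
# vary by PyTorch version.
import torch

device = "cuda"  # grouped-GEMM kernels currently target recent NVIDIA GPUs
num_experts, hidden, ffn = 4, 256, 512
tokens_per_expert = torch.tensor([8, 16, 4, 12], device=device)
# Cumulative end offsets marking each expert's contiguous slice of tokens.
offs = torch.cumsum(tokens_per_expert, dim=0).to(torch.int32)

# Tokens are assumed to be pre-permuted so each expert's rows are contiguous.
x = torch.randn(int(tokens_per_expert.sum()), hidden, dtype=torch.bfloat16, device=device)
w = torch.randn(num_experts, ffn, hidden, dtype=torch.bfloat16, device=device)

# One grouped GEMM covers all experts in a single call.
y = torch._grouped_mm(x, w.transpose(-2, -1), offs=offs)

# The quantized path would swap in torch._scaled_grouped_mm, passing fp8 data
# plus scales (illustrative; to_fp8_rowwise is hypothetical):
# x_fp8, x_scale = to_fp8_rowwise(x)
# w_fp8, w_scale = to_fp8_rowwise(w)
# y = torch._scaled_grouped_mm(x_fp8, w_fp8.transpose(-2, -1), x_scale, w_scale,
#                              offs=offs, out_dtype=torch.bfloat16)
```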

vkuzo added 2 commits January 5, 2026 10:24
[ghstack-poisoned]
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Jan 5, 2026
Summary:

This is not used, removing.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:
ghstack-source-id: aec80dc
ghstack-comment-id: 3693220218
Pull-Request: #3554
vkuzo added 2 commits January 5, 2026 13:50
[ghstack-poisoned]
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Jan 5, 2026
Summary:

This is not used, removing.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:
ghstack-source-id: a2e8174
ghstack-comment-id: 3693220218
Pull-Request: #3554
@liangan1 (Collaborator) commented Jan 6, 2026

> @vkuzo is there any plan to add the new design for MoE quant?
>
> Yes:
>
>   1. we don't currently plan to provide any "MoE modules" (hence this PR, which deletes the "MoE modules")
>   2. we do plan to improve PyTorch Core MoE support (such as torch._grouped_mm) and support quantization as needed. So, if a user writes their MoE with torch._grouped_mm, torchao plans to support quantizing it to torch._scaled_grouped_mm.
>   3. we have tentative plans to improve the integration of torchao checkpoints into vLLM for MoE models, using vLLM's MoE kernels

Thanks for the info. But I suppose item 2 only works for fp8/fp4; how about int4 and int8? Besides, most of the MoE model definitions in HF/transformers are not based on grouped_mm; is there any plan to extend torch._grouped_mm adoption in HF/transformers?

[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Jan 6, 2026
Summary:

This is not used, removing.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:
ghstack-source-id: 1f95814
ghstack-comment-id: 3693220218
Pull-Request: #3554
vkuzo changed the base branch from gh/vkuzo/193/head to main on January 6, 2026 at 11:24
vkuzo merged commit dd41e98 into main on Jan 6, 2026
42 of 54 checks passed
@vkuzo (Contributor, Author) commented Jan 6, 2026

> But I suppose item 2 only works for fp8/fp4; how about int4 and int8?

int4 and int8 could be supported as well; whether the kernel lives in Core, torchao, or somewhere else can be decided case by case in the short term.

> most of the MoE model definitions in HF/transformers are not based on grouped_mm

Yes, the MoE authoring story is very fragmented. Long term, we want PyTorch Core to have the right primitives to make MoE authoring easy, and torchao to have a story to easily quantize them. Short term, we may have to have case-by-case workarounds. We do plan to work on adoption of grouped_mm.
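
To make the fragmentation point concrete, the sketch below (illustrative only, not from this PR or from transformers) shows the per-expert Python loop that many HF/transformers MoE blocks use today; it computes the same result as a single grouped GEMM over the same permuted tokens, which is why moving these definitions onto torch._grouped_mm would let quantization hook in at one place.

```python
# Illustrative only: the per-expert loop pattern common in eager MoE blocks,
# mathematically equivalent to one grouped GEMM over the same permuted tokens.
import torch

num_experts, hidden, ffn = 4, 256, 512
tokens_per_expert = torch.tensor([8, 16, 4, 12])
x = torch.randn(int(tokens_per_expert.sum()), hidden)   # tokens, pre-permuted
w = torch.randn(num_experts, hidden, ffn)               # one weight per expert

outs, start = [], 0
for e in range(num_experts):
    end = start + int(tokens_per_expert[e])
    outs.append(x[start:end] @ w[e])  # one small GEMM per expert
    start = end
y = torch.cat(outs)  # same shape and values a single grouped GEMM would produce
```

In this loop form, int8/int4 weight-only quantization could be attached per expert (each w[e] quantized independently), which is one reason short-term support may end up being handled case by case, as noted above.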


Labels

CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
topic: deprecation (use this tag if this PR deprecates a feature)


4 participants