remove torchao/prototype/moe_quant #3554
Conversation
Stack from ghstack (oldest at bottom):
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3554
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@vkuzo is there any plan to add the new design for MoE quant?
Yes
Thanks for the info. But I suppose item 2 only works for fp8/fp4; what about int4 and int8? Besides, most of the MoE model definitions in HF/transformers are not based on grouped_mm. Is there any plan to extend the scope of torch._grouped_mm adoption in HF/transformers?
int4 and int8 could be supported as well; whether the kernel lives in core, torchao, or somewhere else can be decided case by case in the short term.
Yes, the MoE authoring story is very fragmented. Long term, we want PyTorch core to have the right primitives to make MoE authoring easy, and for torchao to have a story for easily quantizing them. Short term, we may have to have case-by-case workarounds. We do plan to work on adoption of grouped_mm.
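
To make the "based on grouped_mm" distinction concrete, here is a minimal sketch, assuming a recent PyTorch build where the private torch._grouped_mm op is available. It is not code from this PR or from torchao: the shapes, the offs layout, and the loop_moe/grouped_moe helper names are illustrative assumptions, and the private op's signature and hardware requirements may differ across builds.

```python
# Illustrative sketch only: contrasts the per-expert-loop style used by most
# HF/transformers MoE blocks with a grouped-matmul formulation.
# torch._grouped_mm is a private PyTorch op; its availability, signature,
# supported dtypes, and hardware requirements vary by build.
import torch

def loop_moe(x: torch.Tensor, w: torch.Tensor, offs: torch.Tensor) -> torch.Tensor:
    # x: (total_tokens, dim), tokens already routed and sorted by expert.
    # w: (num_experts, dim, hidden_dim) stacked expert weights.
    # offs: (num_experts,) cumulative end index of each expert's token slice.
    outs, start = [], 0
    for e in range(w.shape[0]):
        end = int(offs[e])
        outs.append(x[start:end] @ w[e])  # one small matmul per expert
        start = end
    return torch.cat(outs)

def grouped_moe(x: torch.Tensor, w: torch.Tensor, offs: torch.Tensor) -> torch.Tensor:
    # Same math as loop_moe, expressed as a single grouped matmul over all
    # experts. This is the authoring style the reply above refers to.
    return torch._grouped_mm(x, w, offs=offs)

# Guarded demo: the grouped op may also require a recent GPU (e.g. Hopper-class).
if torch.cuda.is_available() and hasattr(torch, "_grouped_mm"):
    E, T, D, H = 4, 32, 64, 128  # experts, routed tokens, model dim, expert dim
    x = torch.randn(T, D, device="cuda", dtype=torch.bfloat16)
    w = torch.randn(E, D, H, device="cuda", dtype=torch.bfloat16)
    offs = torch.tensor([8, 16, 24, 32], device="cuda", dtype=torch.int32)
    torch.testing.assert_close(
        loop_moe(x, w, offs), grouped_moe(x, w, offs), rtol=1e-2, atol=1e-2
    )
```

Once most MoE definitions route through a single grouped op like this, a quantization API only needs to intercept that one op rather than patch each model's hand-written expert loop, which is the adoption work mentioned above.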
Summary:
This is not used; removing it.
Test Plan: CI
Reviewers:
Subscribers:
Tasks:
Tags: