
For FP8 Fused MoE layers, only per-tensor scales for weights and activations are supported #1393


Open
shuxiaobo opened this issue Apr 28, 2025 · 1 comment

shuxiaobo commented Apr 28, 2025

Describe the bug
I quantized an FP8-dynamic MoE model following https://docs.vllm.ai/en/latest/features/quantization/fp8.html, but vLLM 0.7.3 cannot load it.

ValueError: For FP8 Fused MoE layers, only per-tensor scalesfor weights and activations are supported. Found num_bits=8 type='float' symmetric=True group_size=None strategy='channel' block_structure=None dynamic=False actorder=None observer='minmax' observer_kwargs={}, num_bits=8 type='float' symmetric=True group_size=None strategy='token' block_structure=None dynamic=True actorder=None observer=None observer_kwargs={}
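For context, the FP8-dynamic recipe in the linked docs produces exactly the per-channel weight scales and dynamic per-token activation scales reported in this error. A minimal sketch of that flow is below; the model path and output directory are placeholders, not values taken from this issue.

```python
# Sketch of the FP8-dynamic flow from the linked vLLM docs (llm-compressor).
# MODEL_ID and the save directory are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "path/to/moe-model"  # placeholder
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_DYNAMIC = per-channel FP8 weight scales + dynamic per-token activation
# scales (strategy='channel' / strategy='token'), matching the error above.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

oneshot(model=model, recipe=recipe)
model.save_pretrained("model-FP8-Dynamic")
tokenizer.save_pretrained("model-FP8-Dynamic")
```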

@shuxiaobo shuxiaobo added the bug Something isn't working label Apr 28, 2025
@brian-dellabetta (Collaborator)

Hi @shuxiaobo, vLLM only supports a subset of the possible compression configurations, particularly for MoE layers. The latest version (0.8.4) should have better support, but I'm not sure whether it covers this particular case. If not, you can switch to strategy="tensor" instead of "channel" / "token", or open a feature request at https://github.com/vllm-project/vllm
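For anyone hitting the same error, here is a hedged sketch of the per-tensor alternative mentioned above: llm-compressor / compressed-tensors ship an "FP8" preset scheme that uses per-tensor weight and activation scales. The calibration dataset and sample count below are illustrative placeholders, not recommendations from this thread.

```python
# Hedged sketch: re-quantize with per-tensor FP8 scales so the fused-MoE
# loader accepts the checkpoint. scheme="FP8" uses static per-tensor
# activation scales, so a small calibration set is needed; the dataset name
# and sample count are placeholders.
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8",        # per-tensor weights + per-tensor static activations
    ignore=["lm_head"],  # MoE router/gate modules may also need ignoring
)

oneshot(
    model=model,                   # the MoE model loaded as in the snippet above
    recipe=recipe,
    dataset="ultrachat_200k",      # placeholder calibration dataset
    num_calibration_samples=512,   # placeholder
    max_seq_length=2048,
)
model.save_pretrained("model-FP8")
```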

@brian-dellabetta brian-dellabetta self-assigned this Apr 29, 2025