For multimodal models, such as Qwen2.5-VL, is SmoothQuantModifier necessary when performing W8A8 quantization? #1394

Open

weirdo2310 opened this issue Apr 28, 2025 · 2 comments

@weirdo2310 commented Apr 28, 2025

I noticed that the examples provide W4A16 quantization specifically for multimodal models, while INT8 W8A8 quantization examples are only available for LLMs. Those examples use SmoothQuantModifier and GPTQModifier during quantization. So I would like to know: for a multimodal model such as Qwen2.5-VL, is SmoothQuantModifier necessary when performing W8A8 quantization?

@kylesayrs (Collaborator) commented Apr 28, 2025

Hi @weirdo2310! SmoothQuantModifier implements the SmoothQuant algorithm, which has been shown to improve accuracy recovery for W8A8 schemes regardless of model architecture (multimodal or not). We therefore recommend using SmoothQuantModifier when quantizing to W8A8, but it is not strictly required.
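
For reference, here is a rough sketch of how that recipe could be applied to a multimodal model. It is adapted from the INT8 W8A8 LLM quickstart rather than taken from an official multimodal example, so the Qwen2.5-VL model ID, the text-only calibration dataset, and the `re:visual.*` ignore pattern are assumptions on my part:

```python
from llmcompressor import oneshot  # `llmcompressor.transformers.oneshot` in older releases
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# SmoothQuant first migrates activation outliers into the weights, then GPTQ
# quantizes weights and activations to INT8 (the W8A8 scheme).
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(
        targets="Linear",
        scheme="W8A8",
        # Keep the LM head in full precision; "re:visual.*" is an assumed
        # pattern for skipping the vision tower, mirroring the ignore lists
        # in the W4A16 multimodal examples.
        ignore=["lm_head", "re:visual.*"],
    ),
]

oneshot(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # assumed model ID
    dataset="open_platypus",  # text-only calibration set from the LLM quickstart
    recipe=recipe,
    output_dir="Qwen2.5-VL-7B-Instruct-W8A8",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```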

kylesayrs self-assigned this Apr 28, 2025
@bash99 commented May 4, 2025

> Hi @weirdo2310! SmoothQuantModifier implements the SmoothQuant algorithm, which has been shown to improve accuracy recovery for W8A8 schemes regardless of model architecture (multimodal or not). We therefore recommend using SmoothQuantModifier when quantizing to W8A8, but it is not strictly required.

Does SmoothQuantModifier also improve accuracy recovery for FP8 W8A8 (the FP8_DYNAMIC scheme)? I can't find any usage of SmoothQuantModifier in the FP8 recipes.
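
For context, the FP8 recipes I'm referring to look roughly like the following single-pass sketch (the model ID and output path here are placeholders of mine):

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# FP8_DYNAMIC uses FP8 weights with dynamic per-token FP8 activation scales
# computed at runtime, so this recipe has no SmoothQuant pass and needs no
# calibration dataset.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

oneshot(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # placeholder model ID
    recipe=recipe,
    output_dir="Qwen2.5-VL-7B-Instruct-FP8-Dynamic",  # placeholder path
)
```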
