I noticed that in the examples, W4A16 quantization is provided specifically for multimodal models, while INT8 W8A8 quantization examples are only available for LLMs. These examples use SmoothQuantModifier and GPTQModifier during quantization. Therefore, I would like to know: for multimodal models such as Qwen2.5-VL, is SmoothQuantModifier necessary when performing W8A8 quantization?
Hi @weirdo2310! SmoothQuantModifier implements the SmoothQuant algorithm. This algorithm has been shown to improve accuracy recovery for W8A8 schemes, regardless of model architecture (multimodal or not). Therefore, we recommend using SmoothQuantModifier when quantizing to W8A8, but it is not required.
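For reference, a W8A8 recipe typically combines the two modifiers like this. This is only a minimal sketch: the model ID, calibration dataset, and hyperparameters below are placeholders, and on older llm-compressor versions `oneshot` is imported from `llmcompressor.transformers` instead.

```python
# Minimal sketch of an INT8 W8A8 one-shot recipe with llm-compressor.
# Model ID, dataset, and calibration sizes are placeholders -- adapt them
# to your model and data.
from llmcompressor import oneshot  # older versions: from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = [
    # Recommended but optional: smooth activation outliers before quantization.
    SmoothQuantModifier(smoothing_strength=0.8),
    # Quantize weights and activations to INT8, skipping the output head.
    # For vision-language models you would typically also ignore the vision tower modules.
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # placeholder model ID
    dataset="open_platypus",              # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```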
Does SmoothQuantModifier improve accuracy recovery for W8A8 FP8-Dynamic?
I don't see any usage of SmoothQuantModifier in the FP8 recipes; they seem to use only a QuantizationModifier, as sketched below.
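For comparison, this is roughly what the FP8-Dynamic recipes look like: a single QuantizationModifier and no SmoothQuantModifier or calibration data. Again a minimal sketch; the model ID is a placeholder.

```python
# Minimal sketch of a data-free FP8-Dynamic recipe with llm-compressor.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",  # FP8 weights, dynamic per-token FP8 activations
    ignore=["lm_head"],
)

# Dynamic activation quantization needs no calibration dataset.
oneshot(model="Qwen/Qwen2.5-VL-7B-Instruct", recipe=recipe)  # placeholder model ID
```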