Precision Issues with GPTQ-Quantized Qwen2.5-VL Model #1629

After quantizing the Qwen2.5-VL model with GPTQ and evaluating it on my test set, I observed a drop in accuracy. Are there any methods to identify where this accuracy degradation comes from?

In this situation, is it worth trying to **locate exactly where the accuracy loss is introduced**, or is this kind of degradation **expected** after quantization?
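One way to narrow this down is to compare per-layer outputs of the original and quantized models on the same inputs and rank the layers by relative error. The following is only a minimal sketch of that comparison, not a tool from LLM Compressor: `relative_error` and `rank_layers_by_error` are hypothetical helper names, and the per-layer outputs are assumed to have been captured already (for example with forward hooks) and flattened into plain lists of floats.

```python
import math


def relative_error(reference, candidate):
    """L2 relative error between two equally sized flat lists of floats."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    diff = math.sqrt(sum((r - c) ** 2 for r, c in zip(reference, candidate)))
    norm = math.sqrt(sum(r * r for r in reference))
    if norm > 0:
        return diff / norm
    # Reference is all zeros: any difference counts as infinite relative error.
    return float("inf") if diff > 0 else 0.0


def rank_layers_by_error(baseline_outputs, quantized_outputs):
    """Return (layer_name, error) pairs for shared layers, largest error first."""
    errors = {
        name: relative_error(baseline_outputs[name], quantized_outputs[name])
        for name in baseline_outputs
        if name in quantized_outputs
    }
    return sorted(errors.items(), key=lambda item: item[1], reverse=True)


# Toy data standing in for captured layer outputs (layer names are made up).
baseline = {"layers.0": [1.0, 2.0, 2.0], "layers.1": [3.0, 4.0]}
quantized = {"layers.0": [1.0, 2.0, 2.0], "layers.1": [3.0, 3.0]}

ranking = rank_layers_by_error(baseline, quantized)
for name, err in ranking:
    print(f"{name}: {err:.3f}")
```

Layers at the top of the ranking are the first candidates to keep unquantized or to inspect more closely.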

**Environment**
1. OS: Ubuntu 22.04
2. Python version: 3.10.16
3. LLM Compressor version: v0.6.0
4. torch version: 2.6.0+cu124
5. vLLM version: 0.8.5
6. CUDA version: 12.8
7. transformers version: 4.52.4
8. GPU: NVIDIA H20

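```python
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = [
    GPTQModifier(
        targets="Linear",  # quantize all Linear layers...
        scheme="W4A16",  # ...to 4-bit weights with 16-bit activations
        sequential_targets=["Qwen2_5_VLDecoderLayer"],
        # Leave the output head and the vision modules unquantized.
        ignore=["lm_head", "re:visual.*"],
    ),
]
```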
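A possible way to bisect the problem with this recipe is to leave a range of decoder layers unquantized and check whether accuracy recovers. The sketch below only builds the extended `ignore` list; `ignore_for_layer_range` is a hypothetical helper, and it assumes that decoder layer module names contain `.layers.<index>.` and that `re:` patterns are matched from the start of the module name. Both assumptions should be checked against `model.named_modules()` before relying on it.

```python
import re


def ignore_for_layer_range(start, stop, base_ignore=("lm_head", "re:visual.*")):
    """Build an `ignore` list that also skips decoder layers start..stop-1.

    Assumes decoder layers are named like `<prefix>.layers.<index>.<submodule>`.
    """
    indices = "|".join(str(i) for i in range(start, stop))
    if not indices:
        # Empty range: nothing extra to ignore.
        return list(base_ignore)
    # The trailing `\.` keeps index 1 from also matching layers 10, 11, ...
    pattern = rf"re:.*\.layers\.({indices})\..*"
    return list(base_ignore) + [pattern]


# Example: keep the first two decoder layers in their original precision.
ignore = ignore_for_layer_range(0, 2)
print(ignore)

# Local sanity check of the regex against made-up module names.
layer_pattern = ignore[-1][len("re:"):]
print(bool(re.match(layer_pattern, "model.layers.1.mlp.down_proj")))
print(bool(re.match(layer_pattern, "model.layers.10.mlp.down_proj")))
```

The resulting list could be passed as `ignore=` in the recipe above. Halving the ignored range on each run narrows down which layers contribute most to the accuracy drop.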
