
How to run AWQ-W4Afp8 quantization? #1368


Open
wanzhenchn opened this issue Apr 22, 2025 · 2 comments
wanzhenchn commented Apr 22, 2025

How to run AWQ-W4Afp8 quantization on MoE models?

I ran AWQ-W4AFP8 quantization on Qwen1.5-MoE-A2.7B, but it failed with the ValueError shown below:

[screenshot: ValueError traceback]

# llmcompressor and compressed-tensors are both installed from source (main branch).

from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationScheme,
    QuantizationStrategy,
    QuantizationType,
)
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier
from llmcompressor.modifiers.quantization import QuantizationModifier

model_path = "Qwen/Qwen1.5-MoE-A2.7B"
save_path = "Qwen1.5-MoE-A2.7B-W4AFP8"  # output directory for the quantized model

tokenizer = AutoTokenizer.from_pretrained(
    model_path, trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", torch_dtype="auto",
    trust_remote_code=True
)

ignore_layers = ["lm_head", "re:.*mlp.gate", "re:.*mlp.shared_expert_gate"]

recipe = [
    AWQModifier(bits=4, symmetric=False),
    QuantizationModifier(
        ignore=ignore_layers,
        config_groups={
            "group_0": QuantizationScheme(
                targets=["Linear"],
                weights=QuantizationArgs(
                    num_bits=4,
                    type=QuantizationType.INT,
                    dynamic=False,
                    symmetric=False,
                    strategy=QuantizationStrategy.GROUP,
                    group_size=128,
                ),
                input_activations=QuantizationArgs(
                    num_bits=8,
                    type=QuantizationType.FLOAT,
                    strategy=QuantizationStrategy.TENSOR,
                    dynamic=False,
                    symmetric=True,
                ),
            ),
        },
    ),
]


oneshot(
    model=model,
    tokenizer=tokenizer,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2948,
    num_calibration_samples=2,
    save_compressed=True,
    trust_remote_code_model=True,
    output_dir=save_path,
)

tokenizer.save_pretrained(save_path)
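
For reference, once the recipe runs to completion, the serialized scheme can be double-checked from the saved checkpoint. A minimal sketch, assuming compressed-tensors writes a quantization_config block into config.json when save_compressed=True (save_path is the output directory from the script above):

import json
import os

# Inspect the quantization settings serialized into the saved config.json
with open(os.path.join(save_path, "config.json")) as f:
    config = json.load(f)
print(json.dumps(config.get("quantization_config", {}), indent=2))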
brian-dellabetta (Collaborator) commented

Hi @wanzhenchn , thanks for taking an interest in our AWQ feature! We have merged most of the AWQ logic but we have a few TODOs related to the issues you are hitting (here and here). We wanted to add these in a separate PR so that the initial PR is largely a port of the code in AutoAWQ, and so we have an example for how additional mappings can be added for other architectures.

We will wrap this up in the next couple of weeks, then make a release and a more public announcement that AWQ is ready for consumption.
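
In the meantime, a rough sketch of what architecture-specific mappings could look like for a MoE model is below. It assumes AWQModifier accepts a mappings argument as in the current main branch; the AWQMapping import path, its field names, and the regex targets for the Qwen1.5-MoE expert projections are assumptions, not a tested recipe:

from llmcompressor.modifiers.awq import AWQModifier
from llmcompressor.modifiers.awq.mappings import AWQMapping  # assumed module path

# Hypothetical mappings pairing each "smooth" layer with the linear layers it balances;
# the expert regexes are guesses at the Qwen1.5-MoE module names.
qwen_moe_mappings = [
    AWQMapping(
        smooth_layer="re:.*input_layernorm",
        balance_layers=["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"],
    ),
    AWQMapping(smooth_layer="re:.*v_proj", balance_layers=["re:.*o_proj"]),
    AWQMapping(
        smooth_layer="re:.*post_attention_layernorm",
        balance_layers=["re:.*experts.*gate_proj", "re:.*experts.*up_proj"],
    ),
    AWQMapping(
        smooth_layer="re:.*experts.*up_proj",
        balance_layers=["re:.*experts.*down_proj"],
    ),
]

awq = AWQModifier(bits=4, symmetric=False, mappings=qwen_moe_mappings)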

brian-dellabetta self-assigned this Apr 22, 2025
wanzhenchn (Author) commented


Thanks for the feedback; looking forward to AWQ supporting more models.
