[AWQ] Insane memory requirement: over 900GB for 32B model #1409

mratsim · 2025-05-04T18:50:34Z

I tried to quantize GLM-4-32B-0414: https://huggingface.co/THUDM/GLM-4-32B-0414

Recipe:

recipe = [
    AWQModifier(
        bits=4,
        symmetric=False,
        # Input->output mappings read from https://github.com/huggingface/transformers/blob/v4.51.3/src/transformers/models/glm4/modeling_glm4.py
        # which is somewhat easier to follow than the vLLM implementation, as it's all in a single file
        mappings=[
            AWQMapping("re:.*input_layernorm", ["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"]),
            AWQMapping("re:.*v_proj", ["re:.*o_proj"]),
            AWQMapping("re:.*post_attention_layernorm", ["re:.*gate_up_proj"]),
            AWQMapping("re:.*gate_up_proj", ["re:.*down_proj"]),
        ]
    ),
    QuantizationModifier(
        ignore=ignore_layers,
        config_groups={
            "group_0": QuantizationScheme(
                targets=["Linear"],
                weights=QuantizationArgs(
                    num_bits=4,
                    type=QuantizationType.INT,
                    dynamic=False,
                    symmetric=False,
                    strategy=QuantizationStrategy.GROUP,
                    group_size=128,
                ),
            ),
        },
    )
]
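To sanity-check which modules each mapping in the recipe would pick up, here is a minimal sketch in plain Python. It assumes the `re:` prefix marks a regular expression matched from the start of the module name, and that anything without the prefix must match exactly. The helper names (`name_matches`, `resolve`) and the module names are hypothetical illustrations, not taken from llm-compressor or dumped from the model.

```python
import re


def name_matches(pattern: str, module_name: str) -> bool:
    """Assumed convention: 're:' marks a regex, anything else is an exact name."""
    if pattern.startswith("re:"):
        return re.match(pattern[len("re:"):], module_name) is not None
    return pattern == module_name


# Illustrative module names for one decoder layer (assumed, not read from the model).
MODULE_NAMES = [
    "model.layers.0.input_layernorm",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.post_attention_layernorm",
    "model.layers.0.mlp.gate_up_proj",
    "model.layers.0.mlp.down_proj",
]

# (source pattern, target patterns), copied from the recipe's AWQMapping entries.
MAPPINGS = [
    ("re:.*input_layernorm", ["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"]),
    ("re:.*v_proj", ["re:.*o_proj"]),
    ("re:.*post_attention_layernorm", ["re:.*gate_up_proj"]),
    ("re:.*gate_up_proj", ["re:.*down_proj"]),
]


def resolve(mappings, names):
    """Return, for each mapping, the names hit by the source and target patterns."""
    resolved = []
    for source, targets in mappings:
        source_hits = [n for n in names if name_matches(source, n)]
        target_hits = [n for n in names if any(name_matches(t, n) for t in targets)]
        resolved.append((source_hits, target_hits))
    return resolved


if __name__ == "__main__":
    for source_hits, target_hits in resolve(MAPPINGS, MODULE_NAMES):
        print(source_hits, "->", target_hits)
```

Running the same check against the names from `model.named_modules()` would show whether any pattern matches more or fewer modules than intended.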

I used 128 calibration samples, as suggested in these slides (see "Calibration set"): https://minjiazhang.github.io/courses/fall24-resource/slides/awq.pdf

However, memory usage grew by 1~5 GB with every sample, eventually exceeding 900GB before I gave up on AWQ. Even with a swapfile, most of the time went to kernel swap-in/swap-out and IO, so compute was slow, and once the CPU memory problem was worked around the run frustratingly crashed with a CUDA OOM.
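For a sense of scale, here is a rough back-of-envelope sketch of how much memory one cached activation tensor per calibration sample would occupy. The sequence length, hidden size, and 2-byte element size are assumed values chosen for illustration, not measurements from this run, and the function name is hypothetical. It does not claim to identify where the memory actually went.

```python
GIB = 1024 ** 3


def cached_activation_bytes(
    num_samples: int,
    seq_len: int,
    hidden_size: int,
    bytes_per_element: int = 2,   # assumed fp16/bf16 storage
    tensors_per_sample: int = 1,  # how many such tensors are kept per sample
) -> int:
    """Bytes needed to keep [seq_len, hidden_size] tensors for every sample."""
    return num_samples * seq_len * hidden_size * bytes_per_element * tensors_per_sample


if __name__ == "__main__":
    # Assumed: 128 samples, 2048 tokens each, hidden size 6144.
    total = cached_activation_bytes(num_samples=128, seq_len=2048, hidden_size=6144)
    print(f"{total / GIB:.1f} GiB for one cached tensor per sample")
```

Under these assumptions a single cached tensor per sample adds up to 3 GiB across the whole calibration set, so growth of 1~5 GB per sample would imply far more than one such tensor being retained each time.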


Side-note: couldn't the calibration be made multi-threaded?

mratsim added the bug (Something isn't working) label May 4, 2025

mratsim mentioned this issue May 4, 2025

[Feature] Log/info/Save/Restore quantization steps #1410

Open
