AWQ Modifier Support #1177

brian-dellabetta · 2025-02-19T18:04:57Z

SUMMARY:
Addition of AWQModifier, based on AutoAWQ implementation.

Should be reviewed/merged in conjunction with neuralmagic/compressed-tensors#269

Replaces #181 and #824 (hence v3)

TEST PLAN:
Some unit tests included, but as this was mostly a port from AutoAWQ, we validated the code by ensuring we could reproduce the evaluation metrics in Table 4 of the paper. We achieve the following wikitext PPL scores:

Llama-2 7B Group 128:

Paper: 5.60
AutoAWQ: 5.615
This implementation: 5.612
we match what the paper reports for just RTN -- 5.73
We get reasonable results for channel-wise -- 6.788. AutoAWQ errors out for this (setting "q_group_size": -1 in the quant_config), and results not reported in paper.

Llama-2 13B Group 128:

We match the results of AutoAWQ and the results shown in the paper: 4.97
We match what the paper reports for just RTN -- 4.984

NOTE: We are excluding the clipping logic in this implementation, if we want to add it we should add it as another modifier, they are mutually exclusive and the data model for AWQ doesn't align well with clipping. That might be the reason for the slight deviation of results reported in the paper and in our implementation

github-actions · 2025-02-19T18:05:11Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

src/llmcompressor/modifiers/awq/base.py

brian-dellabetta · 2025-03-10T22:19:40Z

src/llmcompressor/modifiers/awq/base.py

+    # TODO this should only be added if v_proj/o_proj shapes match up
+    #  should we check during validation and skip if this is not the case?
+    AWQMapping("re:.*v_proj", ["re:.*o_proj"]),


This is the one TODO. The logic for this in AutoAWQ is to only add this mapping if the shapes line up correctly (logic here). This is the case for the llama 2 models i've been testing on, but not all of the tiny llama models. Any suggestion on how best to handle for both cases?

PPL is 5.607 for llama 2-7B with this included, 5.614 when it isn't.

dsikka

Should we add evals comparing to GPTQ?

brian-dellabetta · 2025-03-21T20:37:06Z

Using the latest commit at this time, I am getting the following results via lm-eval.

deepseek-ai/DeepSeek-R1-Distill-Llama-8B:
 dense:
   #gsm flexible-extract, strict-match
   gsm8k: .6619, .6490
   wititext ppl: 15.4498
 awq+quant sym:
   gsm8k: .6376, .6217
   wititext ppl: 18.8623
 quant sym:
   gsm8k: .6732, .6543
   wititext ppl: 16.7398
meta-llama/Llama-2-7b-hf:
 dense:
   gsm8k: .1342, .1342
   wititext ppl: 8.7587
 awq+quant sym:
   gsm8k: .1024, .1001
   wititext ppl: 9.194
 quant sym:
   gsm8k: .1183, .1152
   wititext ppl: 9.311

Signed-off-by: Brian Dellabetta <[email protected]>

…LENGTH are very low Signed-off-by: Brian Dellabetta <[email protected]>

Signed-off-by: Brian Dellabetta <[email protected]>

kylesayrs reviewed Feb 19, 2025

View reviewed changes

src/llmcompressor/modifiers/awq/base.py Outdated Show resolved Hide resolved

kylesayrs reviewed Feb 20, 2025

View reviewed changes

src/llmcompressor/modifiers/awq/base.py Show resolved Hide resolved

brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch 2 times, most recently from 9273ef3 to 28f8bca Compare February 20, 2025 17:27

kylesayrs reviewed Feb 20, 2025

View reviewed changes

src/llmcompressor/modifiers/awq/base.py Show resolved Hide resolved

kylesayrs reviewed Feb 25, 2025

View reviewed changes

src/llmcompressor/modifiers/awq/base.py Outdated Show resolved Hide resolved

src/llmcompressor/modifiers/awq/base.py Outdated Show resolved Hide resolved

src/llmcompressor/modifiers/awq/base.py Outdated Show resolved Hide resolved

brian-dellabetta changed the title ~~Bdellabe/awq modifier v3~~ Bdellabe/Rtuli awq modifier v3 Mar 10, 2025

brian-dellabetta marked this pull request as ready for review March 10, 2025 21:45

brian-dellabetta requested review from markurtz, kylesayrs, dsikka, rahul-tuli and horheynm March 10, 2025 21:45

brian-dellabetta mentioned this pull request Mar 10, 2025

Some fixes for AWQ neuralmagic/compressed-tensors#269

Open

brian-dellabetta commented Mar 10, 2025

View reviewed changes

dsikka requested changes Mar 11, 2025

View reviewed changes

brian-dellabetta mentioned this pull request Mar 17, 2025

Does it support importing the AWQ model and then exporting it in the compressed-tensor format? neuralmagic/compressed-tensors#153

Open

dsikka mentioned this pull request Mar 19, 2025

Does it support AWQ in the release or main branch? #1268

Closed

brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch 2 times, most recently from 8120fe5 to d76ba6d Compare March 19, 2025 14:59

dsikka mentioned this pull request Mar 24, 2025

Any plan to support AWQ ? #1277

Closed

dsikka changed the title ~~Bdellabe/Rtuli awq modifier v3~~ AWQ Modifier Support Mar 25, 2025

dsikka mentioned this pull request Mar 25, 2025

Can i use qqq in llm-comprefessor? #1281

Closed

brian-dellabetta added 5 commits March 26, 2025 21:42

cherry picked files from stale PR #181 branch awq-feature-branch

4c82f15

Signed-off-by: Brian Dellabetta <[email protected]>

updated to be compatible with latest, unit tests passing

abc919d

Signed-off-by: Brian Dellabetta <[email protected]>

switch to using HooksMixin api

6ccc9bf

Signed-off-by: Brian Dellabetta <[email protected]>

pydantic serialization issue fix

c604246

Signed-off-by: Brian Dellabetta <[email protected]>

switch to accelerate with align_module_device

6e1b718

Signed-off-by: Brian Dellabetta <[email protected]>

brian-dellabetta added 19 commits March 26, 2025 21:42

AWQ running but OOMs unless NUM_CALIBRATION_SAMPLES and MAX_SEQUENCE_…

4867808

…LENGTH are very low Signed-off-by: Brian Dellabetta <[email protected]>

working with larger num_calibration_samples

7588348

Signed-off-by: Brian Dellabetta <[email protected]>

fix pile dataset issue

6457c19

Signed-off-by: Brian Dellabetta <[email protected]>

updated config dataclasses

a73f55f

Signed-off-by: Brian Dellabetta <[email protected]>

OOM error resolved

a17d590

Signed-off-by: Brian Dellabetta <[email protected]>

codereview updates

7e8f22d

Signed-off-by: Brian Dellabetta <[email protected]>

minor touchups

95e698b

Signed-off-by: Brian Dellabetta <[email protected]>

updates from debugging

8289ed9

Signed-off-by: Brian Dellabetta <[email protected]>

styling

0cae174

Signed-off-by: Brian Dellabetta <[email protected]>

slightly improved rtn calculate_qparams logic

f60f830

Signed-off-by: Brian Dellabetta <[email protected]>

code cleanup

fc8e7e0

Signed-off-by: Brian Dellabetta <[email protected]>

rename smoothquant private vars

8e085a4

Signed-off-by: Brian Dellabetta <[email protected]>

squashed codereview updates for rebase

51c2351

Signed-off-by: Brian Dellabetta <[email protected]>

cleanup fixes from rebase

bd89cf2

Signed-off-by: Brian Dellabetta <[email protected]>

awq mappings registry

c81a1d0

Signed-off-by: Brian Dellabetta <[email protected]>

remove empty_cache calls

dfaa06a

Signed-off-by: Brian Dellabetta <[email protected]>

resolve attention module forward missing attention_mask input

e553a04

Signed-off-by: Brian Dellabetta <[email protected]>

improve order of check for optional kwargs setting to None

19a0987

Signed-off-by: Brian Dellabetta <[email protected]>

run awq one shot example

6ee0010

Signed-off-by: Brian Dellabetta <[email protected]>

brian-dellabetta force-pushed the bdellabe/awq-modifier-v3 branch from 0df0b38 to 6ee0010 Compare March 26, 2025 21:42

clean up awq_one_shot example

712cc03

Signed-off-by: Brian Dellabetta <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AWQ Modifier Support #1177

AWQ Modifier Support #1177

brian-dellabetta commented Feb 19, 2025 •

edited

Loading

github-actions bot commented Feb 19, 2025

brian-dellabetta Mar 10, 2025

brian-dellabetta Mar 11, 2025 •

edited

Loading

dsikka left a comment

brian-dellabetta commented Mar 21, 2025

AWQ Modifier Support #1177

Are you sure you want to change the base?

AWQ Modifier Support #1177

Conversation

brian-dellabetta commented Feb 19, 2025 • edited Loading

github-actions bot commented Feb 19, 2025

brian-dellabetta Mar 10, 2025

Choose a reason for hiding this comment

brian-dellabetta Mar 11, 2025 • edited Loading

Choose a reason for hiding this comment

dsikka left a comment

Choose a reason for hiding this comment

brian-dellabetta commented Mar 21, 2025

brian-dellabetta commented Feb 19, 2025 •

edited

Loading

brian-dellabetta Mar 11, 2025 •

edited

Loading