Bugfix: AWQ with Llama models and Python 3.9 (#1384)
SUMMARY:
`LlamaAttention.forward` has an `attention_mask` parameter that is typed as `Optional` but has no default value (see
[here](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L246)).
`attention_mask=None` must therefore be passed in explicitly, otherwise AWQ errors out.
The previous check for this case only worked on Python 3.10 and 3.11. This change replaces it with a more general check that also works on Python 3.9:
```python
import inspect

from transformers.models.llama.modeling_llama import LlamaAttention

params = inspect.signature(LlamaAttention.forward).parameters

# old check: inspects the typing annotation, which differs across Python versions
old_check = params["attention_mask"].annotation._name == "Optional"

# new check: the parameter has no default, so it must be passed explicitly
new_check = params["attention_mask"].default is inspect.Parameter.empty

print(f"OLD {old_check}, NEW {new_check}")
# Python 3.9: OLD False, NEW True
# Python 3.11: OLD True, NEW True
```
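For context, a minimal sketch of how the new check might be used to decide whether to pass `attention_mask=None` when calling a module's forward. The `_build_forward_kwargs` helper is hypothetical, not the actual llm-compressor code:
```python
import inspect

from transformers.models.llama.modeling_llama import LlamaAttention


def _build_forward_kwargs(module_cls) -> dict:
    """Hypothetical helper: return extra kwargs for module_cls.forward,
    adding attention_mask=None when the parameter has no default."""
    params = inspect.signature(module_cls.forward).parameters
    kwargs = {}
    mask_param = params.get("attention_mask")
    if mask_param is not None and mask_param.default is inspect.Parameter.empty:
        # No default value -> the caller must pass attention_mask explicitly
        kwargs["attention_mask"] = None
    return kwargs


print(_build_forward_kwargs(LlamaAttention))  # {'attention_mask': None}
```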
TEST PLAN:
This will resolve the failing e2e test at
https://github.com/neuralmagic/llm-compressor-testing/actions/runs/14654995202/job/41128588916#step:15:33208
---------
Signed-off-by: Brian Dellabetta <[email protected]>