Bugfix: AWQ with Llama models and Python 3.9 (#1384)
SUMMARY:
`LlamaAttention.forward` has an `attention_mask` parameter that is typed as `Optional` but has no default value (see
[here](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L246)).
`attention_mask=None` must therefore be passed in explicitly, otherwise AWQ errors out.
The previous check for this case only worked on Python 3.10 and 3.11. This change replaces it with a more general check that also works on Python 3.9:
```python
import inspect

from transformers.models.llama.modeling_llama import LlamaAttention

params = inspect.signature(LlamaAttention.forward).parameters

# old check: inspects the typing annotation, which differs across Python versions
old_check = params["attention_mask"].annotation._name == "Optional"

# new check: the parameter has no default, so it must be passed explicitly
new_check = params["attention_mask"].default is inspect.Parameter.empty

print(f"OLD {old_check}, NEW {new_check}")
# Python 3.9: OLD False, NEW True
# Python 3.11: OLD True, NEW True
```
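For context, a minimal sketch of how the new check might be used to decide whether to pass `attention_mask=None` when calling a module's forward. The `_build_forward_kwargs` helper is hypothetical, not the actual llm-compressor code:
```python
import inspect

from transformers.models.llama.modeling_llama import LlamaAttention


def _build_forward_kwargs(module_cls) -> dict:
    """Hypothetical helper: return extra kwargs for module_cls.forward,
    adding attention_mask=None when the parameter has no default."""
    params = inspect.signature(module_cls.forward).parameters
    kwargs = {}
    mask_param = params.get("attention_mask")
    if mask_param is not None and mask_param.default is inspect.Parameter.empty:
        # No default value -> the caller must pass attention_mask explicitly
        kwargs["attention_mask"] = None
    return kwargs


print(_build_forward_kwargs(LlamaAttention))  # {'attention_mask': None}
```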
TEST PLAN:
This will resolve the failing e2e test at
https://github.com/neuralmagic/llm-compressor-testing/actions/runs/14654995202/job/41128588916#step:15:33208
---------
Signed-off-by: Brian Dellabetta <[email protected]>