
Commit d15c0d0

Authored Mar 24, 2025
Bump torchao pin to enable dynamic shapes in lowbit (#9555)
Bumps torchao pin to enable dynamic shapes in lowbit.
1 parent: d3863a8 · commit: d15c0d0

File tree: 4 files changed, +2 −17 lines


.ci/scripts/test_llama_torchao_lowbit.sh (−1 line)

```diff
@@ -78,7 +78,6 @@ ${PYTHON_EXECUTABLE} -m examples.models.llama.export_llama \
   -qmode "torchao:8da${QLINEAR_BITWIDTH}w" \
   --group_size ${QLINEAR_GROUP_SIZE} \
   -E "torchao:${QEMBEDDING_BITWIDTH},${QEMBEDDING_GROUP_SIZE}" \
-  --disable_dynamic_shape \
   -d fp32

 # Test run
```

examples/models/llama/README.md (+1 −2 lines)

````diff
@@ -382,7 +382,7 @@ Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-de

 ## Running with low-bit kernels

-We now give instructions for quantizating and running your model with low-bit kernels. These are still experimental, and require you do development on an Arm-based Mac. Also note that low-bit quantization often requires QAT (quantization-aware training) to give good quality results. Currently dynamic shapes must be disabled when exporting a model with these kernels.
+We now give instructions for quantizating and running your model with low-bit kernels. These are still experimental, and require you do development on an Arm-based Mac. Also note that low-bit quantization often requires QAT (quantization-aware training) to give good quality results.


 First export your model for lowbit quantization (step 2 above):
@@ -408,7 +408,6 @@ python -m examples.models.llama.export_llama \
   -qmode "torchao:8da${QLINEAR_BITWIDTH}w" \
   --group_size ${QLINEAR_GROUP_SIZE} \
   -E "torchao:${QEMBEDDING_BITWIDTH},${QEMBEDDING_GROUP_SIZE}" \
-  --disable_dynamic_shape \
   -d fp32
 ```
````

examples/models/llama/export_llama_lib.py (−13 lines)

```diff
@@ -699,19 +699,6 @@ def _validate_args(args):
             "Shared embedding is only supported with torchao quantization."
         )

-    if (
-        args.quantization_mode is not None
-        and args.quantization_mode.startswith("torchao:")
-    ) or (
-        args.embedding_quantize is not None
-        and args.embedding_quantize.startswith("torchao:")
-    ):
-        if args.enable_dynamic_shape:
-            raise ValueError(
-                "Dynamic shape is not currently supported with torchao ops. Please use --disable_dynamic_shape."
-                "If you need this feature, please file an issue."
-            )
-

 def _to_edge_and_lower_llama_xnnpack(
     builder_exported,
```
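For context, the guard deleted above can be paraphrased as a standalone sketch. This is illustrative only: the `SimpleNamespace` stand-in for the parsed CLI arguments and the `check_dynamic_shape` helper name are assumptions, not the project's API; the real check lived inside `_validate_args`.

```python
from types import SimpleNamespace


def check_dynamic_shape(args):
    """Paraphrase of the guard this commit deletes: before the torchao
    pin bump, exports using torchao quantization (for linear or
    embedding layers) rejected dynamic shapes outright."""
    uses_torchao = any(
        v is not None and v.startswith("torchao:")
        for v in (args.quantization_mode, args.embedding_quantize)
    )
    if uses_torchao and args.enable_dynamic_shape:
        raise ValueError(
            "Dynamic shape is not currently supported with torchao ops."
        )


# This combination used to be rejected; after this commit it is allowed.
args = SimpleNamespace(
    quantization_mode="torchao:8da4w",  # hypothetical torchao qmode string
    embedding_quantize=None,
    enable_dynamic_shape=True,
)
```

Removing the guard (together with dropping `--disable_dynamic_shape` from the script and README) is what actually enables dynamic shapes for the lowbit path.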
