
Qwen2.5-0.5b SpinQuant export .pte fail #9353

Open
Francis235 opened this issue Mar 18, 2025 · 4 comments
Labels
need-user-input The issue needs more information from the reporter before moving forward

Comments

@Francis235
🚀 The feature, motivation and pitch

Hi, I use the following bash script to export a SpinQuant Qwen2.5-0.5B model, but I run into an error. I would like to know if anyone has hit the same problem or can help me solve it.

export QWEN_QUANTIZED_CHECKPOINT=${WORKSPACE}/Qwen2.5-0.5B-Instruct-SpinQuant/consolidated.00.pth
export QWEN_PARAMS=${WORKSPACE}/Qwen2.5-0.5B-Instruct-SpinQuant/params.json

python -m examples.models.llama.export_llama \
   --model "qwen2_5" \
   --checkpoint "${QWEN_QUANTIZED_CHECKPOINT:?}" \
   --params "${QWEN_PARAMS:?}" \
   --use_sdpa_with_kv_cache \
   -X \
   --xnnpack-extended-ops \
   --preq_mode 8da4w_output_8da8w \
   --preq_group_size 64 \
   --max_seq_length 2048 \
   --max_context_length 2048 \
   --output_name "qwen2_5_0_5b_quant_from_source.pte" \
   -kv \
   -d fp32 \
   --preq_embedding_quantize 8,0 \
   --use_spin_quant native \
   --metadata '{"get_bos_id":151643, "get_eos_ids":[151643]}'
[INFO 2025-03-18 11:44:05,137 builder.py:211] Exporting with:
[INFO 2025-03-18 11:44:05,138 builder.py:212] inputs: (tensor([[2, 3, 4]]), {'input_pos': tensor([0])})
[INFO 2025-03-18 11:44:05,138 builder.py:213] kwargs: None
[INFO 2025-03-18 11:44:05,138 builder.py:214] dynamic shapes: ({1: <class 'executorch.extension.llm.export.builder.token_dim'>}, {'input_pos': {0: 1}})
[INFO 2025-03-18 11:44:29,659 builder.py:235] Running canonical pass: RemoveRedundantTransposes
[INFO 2025-03-18 11:44:29,853 export_llama_lib.py:701] Lowering model using following partitioner(s): 
[INFO 2025-03-18 11:44:29,853 export_llama_lib.py:703] --> XnnpackDynamicallyQuantizedPartitioner
[INFO 2025-03-18 11:44:29,853 export_llama_lib.py:703] --> XnnpackPartitioner
[INFO 2025-03-18 11:44:29,853 builder.py:321] Using pt2e [] to quantizing the model...
[INFO 2025-03-18 11:44:29,853 builder.py:372] No quantizer provided, passing...
Traceback (most recent call last):
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama.py", line 34, in <module>
    main()  # pragma: no cover
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama.py", line 30, in main
    export_llama(args)
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama_lib.py", line 543, in export_llama
    builder = _export_llama(args)
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama_lib.py", line 879, in _export_llama
    builder = _to_edge_and_lower_llama_xnnpack(
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama_lib.py", line 711, in _to_edge_and_lower_llama_xnnpack
    builder = builder_exported.pt2e_quantize(quantizers).to_edge_transform_and_lower(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/extension/llm/export/builder.py", line 445, in to_edge_transform_and_lower
    self.edge_manager = to_edge_transform_and_lower(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 106, in wrapper
    return func(self, *args, **kwargs)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1171, in to_edge_transform_and_lower
    edge_manager = edge_manager.to_backend({name: curr_partitioner})
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 106, in wrapper
    return func(self, *args, **kwargs)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1429, in to_backend
    new_edge_programs[name] = to_backend(program, partitioner[name])
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 397, in _
    tagged_graph_module = _partition_and_lower(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 320, in _partition_and_lower
    partitioned_module = _partition_and_lower_one_graph_module(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 250, in _partition_and_lower_one_graph_module
    lowered_submodule = to_backend(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 114, in _
    preprocess_result: PreprocessResult = cls.preprocess(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/xnnpack_preprocess.py", line 171, in preprocess
    node_visitors[node.target.__name__].define_node(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/op_linear.py", line 57, in define_node
    self.define_tensor(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/node_visitor.py", line 398, in define_tensor
    buffer_idx = self.get_serialized_buffer_index(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/node_visitor.py", line 574, in get_serialized_buffer_index
    const_val = self.convert_to_qc4w(const_val)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/node_visitor.py", line 470, in convert_to_qc4w
    assert (
AssertionError: convert_to_qc4w: [min,max] out of [-8, 7] range, got [-128, 127]

By the way, I referred to QwenSpinQuant to obtain the Qwen2.5-0.5B SpinQuant model.
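The assertion message itself points at the likely mismatch: convert_to_qc4w packs signed 4-bit weights, so every quantized value must lie in [-8, 7], while this checkpoint's weights span [-128, 127], i.e. 8-bit. A minimal sketch of that range check (hypothetical helper for diagnosing a checkpoint, not ExecuTorch code):

```python
# Hypothetical sanity check, not from the issue: verify that pre-quantized
# weight values fit the signed 4-bit range [-8, 7] that XNNPACK's
# convert_to_qc4w expects. Values spanning [-128, 127] indicate the
# checkpoint holds 8-bit weights, which an 8da4w preq mode cannot repack.

def fits_qc4w(values):
    """Return True if all quantized values fit the signed 4-bit range [-8, 7]."""
    lo, hi = min(values), max(values)
    return -8 <= lo and hi <= 7

# A 4-bit-quantized tensor passes; an 8-bit-quantized one fails,
# matching the [-128, 127] range reported in the assertion above.
print(fits_qc4w([-8, 0, 7]))      # True
print(fits_qc4w([-128, 5, 127]))  # False
```

If a check like this fails on the checkpoint's weight tensors, the model was quantized to 8 bits at the source, and the fix would be regenerating the SpinQuant checkpoint with 4-bit weight quantization rather than changing export flags.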

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

@kimishpatel kimishpatel added the need-user-input The issue needs more information from the reporter before moving forward label Mar 18, 2025
@kimishpatel
Contributor

Have you tried the same command with llama3.2 1b/3b? And that works?

Can you also add --verbose to the export command and paste the output, via pastebin or a GitHub gist?

@tombang

tombang commented Mar 19, 2025

Have you tried the same command with llama3.2 1b/3b? And that works?

Can you also add --verbose to the export command and paste the output, via pastebin or a GitHub gist?

Hi, can the Llama 3.2 model run on the DSP/HTP using QNN?

@kimishpatel
Contributor

Have you tried the same command with llama3.2 1b/3b? And that works?
Can you also add --verbose to the export command and paste the output, via pastebin or a GitHub gist?

Hi, can the Llama 3.2 model run on the DSP/HTP using QNN?

Yes, it can. cc @cccclai

@cccclai
Contributor

cccclai commented Mar 24, 2025

It can, but the current QNN flow in open source doesn't produce accurate output. Can we have QAT weights released?
