
Qwen2.5-0.5b SpinQuant export .pte fail #9353

Open
Francis235 opened this issue Mar 18, 2025 · 4 comments
Labels
need-user-input The issue needs more information from the reporter before moving forward

Comments

@Francis235
🚀 The feature, motivation and pitch

Hi, I use the following bash script to export a SpinQuant Qwen2.5-0.5B model, but I run into an error. I would like to know if anyone has hit the same problem or can help me solve it.

export QWEN_QUANTIZED_CHECKPOINT=${WORKSPACE}/Qwen2.5-0.5B-Instruct-SpinQuant/consolidated.00.pth
export QWEN_PARAMS=${WORKSPACE}/Qwen2.5-0.5B-Instruct-SpinQuant/params.json

python -m examples.models.llama.export_llama \
   --model "qwen2_5" \
   --checkpoint "${QWEN_QUANTIZED_CHECKPOINT:?}" \
   --params "${QWEN_PARAMS:?}" \
   --use_sdpa_with_kv_cache \
   -X \
   --xnnpack-extended-ops \
   --preq_mode 8da4w_output_8da8w \
   --preq_group_size 64 \
   --max_seq_length 2048 \
   --max_context_length 2048 \
   --output_name "qwen2_5_0_5b_quant_from_source.pte" \
   -kv \
   -d fp32 \
   --preq_embedding_quantize 8,0 \
   --use_spin_quant native \
   --metadata '{"get_bos_id":151643, "get_eos_ids":[151643]}'
[INFO 2025-03-18 11:44:05,137 builder.py:211] Exporting with:
[INFO 2025-03-18 11:44:05,138 builder.py:212] inputs: (tensor([[2, 3, 4]]), {'input_pos': tensor([0])})
[INFO 2025-03-18 11:44:05,138 builder.py:213] kwargs: None
[INFO 2025-03-18 11:44:05,138 builder.py:214] dynamic shapes: ({1: <class 'executorch.extension.llm.export.builder.token_dim'>}, {'input_pos': {0: 1}})
[INFO 2025-03-18 11:44:29,659 builder.py:235] Running canonical pass: RemoveRedundantTransposes
[INFO 2025-03-18 11:44:29,853 export_llama_lib.py:701] Lowering model using following partitioner(s): 
[INFO 2025-03-18 11:44:29,853 export_llama_lib.py:703] --> XnnpackDynamicallyQuantizedPartitioner
[INFO 2025-03-18 11:44:29,853 export_llama_lib.py:703] --> XnnpackPartitioner
[INFO 2025-03-18 11:44:29,853 builder.py:321] Using pt2e [] to quantizing the model...
[INFO 2025-03-18 11:44:29,853 builder.py:372] No quantizer provided, passing...
Traceback (most recent call last):
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama.py", line 34, in <module>
    main()  # pragma: no cover
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama.py", line 30, in main
    export_llama(args)
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama_lib.py", line 543, in export_llama
    builder = _export_llama(args)
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama_lib.py", line 879, in _export_llama
    builder = _to_edge_and_lower_llama_xnnpack(
  File "/home/bobo.zhou/workspace/executorch/examples/models/llama/export_llama_lib.py", line 711, in _to_edge_and_lower_llama_xnnpack
    builder = builder_exported.pt2e_quantize(quantizers).to_edge_transform_and_lower(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/extension/llm/export/builder.py", line 445, in to_edge_transform_and_lower
    self.edge_manager = to_edge_transform_and_lower(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 106, in wrapper
    return func(self, *args, **kwargs)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1171, in to_edge_transform_and_lower
    edge_manager = edge_manager.to_backend({name: curr_partitioner})
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 106, in wrapper
    return func(self, *args, **kwargs)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1429, in to_backend
    new_edge_programs[name] = to_backend(program, partitioner[name])
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 397, in _
    tagged_graph_module = _partition_and_lower(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 320, in _partition_and_lower
    partitioned_module = _partition_and_lower_one_graph_module(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 250, in _partition_and_lower_one_graph_module
    lowered_submodule = to_backend(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 114, in _
    preprocess_result: PreprocessResult = cls.preprocess(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/xnnpack_preprocess.py", line 171, in preprocess
    node_visitors[node.target.__name__].define_node(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/op_linear.py", line 57, in define_node
    self.define_tensor(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/node_visitor.py", line 398, in define_tensor
    buffer_idx = self.get_serialized_buffer_index(
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/node_visitor.py", line 574, in get_serialized_buffer_index
    const_val = self.convert_to_qc4w(const_val)
  File "/home/bobo.zhou/.conda/envs/executorch/lib/python3.10/site-packages/executorch/backends/xnnpack/operators/node_visitor.py", line 470, in convert_to_qc4w
    assert (
AssertionError: convert_to_qc4w: [min,max] out of [-8, 7] range, got [-128, 127]

By the way, I referred to QwenSpinQuant to obtain the Qwen2.5-0.5B SpinQuant model.
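The assertion message itself points at the likely mismatch: convert_to_qc4w packs signed 4-bit weights, so every quantized value must lie in [-8, 7], while this checkpoint's weights span [-128, 127], i.e. 8-bit. A minimal sketch of that range check (hypothetical helper for diagnosing a checkpoint, not ExecuTorch code):

```python
# Hypothetical sanity check, not from the issue: verify that pre-quantized
# weight values fit the signed 4-bit range [-8, 7] that XNNPACK's
# convert_to_qc4w expects. Values spanning [-128, 127] indicate the
# checkpoint holds 8-bit weights, which an 8da4w preq mode cannot repack.

def fits_qc4w(values):
    """Return True if all quantized values fit the signed 4-bit range [-8, 7]."""
    lo, hi = min(values), max(values)
    return -8 <= lo and hi <= 7

# A 4-bit-quantized tensor passes; an 8-bit-quantized one fails,
# matching the [-128, 127] range reported in the assertion above.
print(fits_qc4w([-8, 0, 7]))      # True
print(fits_qc4w([-128, 5, 127]))  # False
```

If a check like this fails on the checkpoint's weight tensors, the model was quantized to 8 bits at the source, and the fix would be regenerating the SpinQuant checkpoint with 4-bit weight quantization rather than changing export flags.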

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

@kimishpatel kimishpatel added the need-user-input The issue needs more information from the reporter before moving forward label Mar 18, 2025
@kimishpatel
Contributor

Have you tried the same command with llama3.2 1b/3b? And that works?

Can you also add --verbose to the export command and paste the output, via pastebin or a GitHub gist?

@tombang

tombang commented Mar 19, 2025

Have you tried the same command with llama3.2 1b/3b? And that works?

Can you also add --verbose to the export command and paste the output, via pastebin or a GitHub gist?

Hi, can the Llama 3.2 model run on the DSP/HTP using QNN?

@kimishpatel
Contributor

Have you tried the same command with llama3.2 1b/3b? And that works?
Can you also add --verbose to the export command and paste the output, via pastebin or a GitHub gist?

Hi, can the Llama 3.2 model run on the DSP/HTP using QNN?

Yes, it can. cc @cccclai

@cccclai
Contributor

cccclai commented Mar 24, 2025

It can, but the current QNN flow in open source doesn't produce accurate output. Can we have QAT weights released?
