
Does the QNN backend support the Llama 3.2 3B model (instead of XNNPACK)? #9311

Open
tombang opened this issue Mar 17, 2025 · 9 comments
Labels
module: qnn (Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/)
need-user-input (The issue needs more information from the reporter before moving forward)
partner: qualcomm (For backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm)

Comments

tombang commented Mar 17, 2025

🐛 Describe the bug

I have run the XNNPACK tutorial and the resulting .pte file runs normally. But when I follow the Llama 3 8B tutorial and change the model to Llama 3.2 3B, the model fails to load on an Android device with a Qualcomm 8 Gen 2 SoC; the failure code is 1.

Versions

QNN:2.26
SDK:r27b

cc @cccclai @winskuo-quic @shewu-quic @cbilgin

@kimishpatel added the need-user-input label Mar 17, 2025
kimishpatel (Contributor) commented:

A few follow-ups:

  1. The issue summary does not say whether you are running the QNN workflow. Can you update it with details?
  2. Please add your repro steps.

tombang (Author) commented Mar 18, 2025

I followed this tutorial (https://github.com/pytorch/executorch/blob/release/0.5/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md) exactly, except that I replaced the Llama 3 8B model with Llama 3.2 3B during execution. The export command I used is as follows:

python -m examples.models.llama.export_llama --checkpoint "${MODEL_DIR}/consolidated.00.pth" -p "${MODEL_DIR}/params.json" -kv --disable_dynamic_shape --qnn --pt2e_quantize qnn_8a8w -d fp32 --num_sharding 4 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --output_name="test.pte" --soc_model SM8550

This command successfully exports the model, but when running the APK on Android, a model loading error occurs with error code -1.
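
To help isolate whether the failure is in the demo APK or in the exported program itself, one option is to run the .pte directly with the command-line Llama runner over adb. A minimal sketch is below, assuming an Android build of the llama_main runner from the ExecuTorch Llama example; the build output path, device directory, and runner flags are taken from that example rather than from this thread, so adjust them to your setup.

# Sketch only: paths and the llama_main binary name are assumptions from the ExecuTorch Llama example.
# The QNN runtime libraries (libQnnHtp*.so, libQnnSystem.so) also need to be on the device and visible
# via LD_LIBRARY_PATH / ADSP_LIBRARY_PATH for the QNN delegate to load.
adb shell mkdir -p /data/local/tmp/llama
adb push test.pte /data/local/tmp/llama/
adb push "${MODEL_DIR}/tokenizer.model" /data/local/tmp/llama/
adb push cmake-android-out/examples/models/llama/llama_main /data/local/tmp/llama/
adb shell "cd /data/local/tmp/llama && ./llama_main --model_path=test.pte --tokenizer_path=tokenizer.model --prompt='Once upon a time' --seq_len=128"

If the runner fails with the same load error, the problem is in the exported context binary or the delegate libraries rather than in the app.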

@kimishpatel added the module: qnn label Mar 19, 2025
kimishpatel (Contributor) commented:

OK, let's follow up on this.

@cccclai added the partner: qualcomm label Mar 19, 2025
cccclai (Contributor) commented Mar 19, 2025

Can you try QNN version 2.28?
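
In case it helps while switching versions, a minimal sketch of pointing the build at a different QNN SDK is below. The install path is a placeholder, and the important part is that the host-side (ahead-of-time) QNN libraries used during export and the libraries packaged for the device come from the same SDK version, otherwise the generated context binary and its loader can disagree.

# Placeholder path: point QNN_SDK_ROOT at the unpacked QNN / Qualcomm AI Engine Direct 2.28 SDK.
export QNN_SDK_ROOT=/path/to/qairt/2.28
export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/x86_64-linux-clang:$LD_LIBRARY_PATH
# Rebuild the Qualcomm backend so export_llama picks up the new SDK
# (assuming the backends/qualcomm/scripts/build.sh helper is still the way to do this).
./backends/qualcomm/scripts/build.sh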

tombang (Author) commented Mar 21, 2025

Hi,

I changed QNN to 2.28, and an error occurred:

[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 2
[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in RESTORE MODE.
[WARNING] [Qnn ExecuTorch]: Unknown QNN BinaryInfo version 3.
[ERROR] [Qnn ExecuTorch]: Failed to retrieve backend binary info from QNN context binary.
[ERROR] [Qnn ExecuTorch]: Failed to parse QNN Graph Info. The cache might be broken. Please consider to re-generate the cache.
[WARNING] [Qnn ExecuTorch]: QnnDsp Performance Estimates unsupported

[WARNING] [Qnn ExecuTorch]: QnnDsp Arch 68 set by custom config is different from arch associated with SoC 43, will overwrite it to 73

[ERROR] [Qnn ExecuTorch]: QNN context cache is invalid.
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/mnt/data/tmp/executorch/examples/models/llama/export_llama.py", line 32, in
main() # pragma: no cover
File "/mnt/data/tmp/executorch/examples/models/llama/export_llama.py", line 28, in main
export_llama(args)
File "/mnt/data/tmp/executorch/examples/models/llama/export_llama_lib.py", line 522, in export_llama
builder = _export_llama(args)
File "/mnt/data/tmp/executorch/examples/models/llama/export_llama_lib.py", line 801, in _export_llama
canonicalize_program(builder.edge_manager.exported_program())
File "/mnt/data/tmp/executorch/backends/qualcomm/utils/utils.py", line 281, in canonicalize_program
update_spill_fill_size(obj)
File "/mnt/data/tmp/executorch/backends/qualcomm/utils/utils.py", line 278, in update_spill_fill_size
update_program(*get_program_info(exported_program))
File "/mnt/data/tmp/executorch/backends/qualcomm/utils/utils.py", line 256, in get_program_info
return dispatch[type(program)](program)
File "/mnt/data/tmp/executorch/backends/qualcomm/utils/utils.py", line 233, in process_exported_program
assert qnn_mgr.Init().value == 0, "failed to load context binary"
AssertionError: failed to load context binary
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend

The export command is below:

python -m examples.models.llama.export_llama \
  -c "/home/ai/tb/LLM-Research/Llama-3___2-3B/original/consolidated.00.pth" \
  -p "/home/ai/tb/LLM-Research/Llama-3___2-3B/original/params.json" \
  -t /home/ai/tb/LLM-Research/Llama-3___2-3B/original/tokenizer.model \
  -kv --disable_dynamic_shape --qnn \
  --num_sharding 8 \
  --pt2e_quantize qnn_16a4w \
  -d fp32 \
  --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
  --output_name="test.pte" \
  --soc_model SM8550 \
  --calibration_tasks wikitext \
  --calibration_limit 1 \
  --calibration_seq_length 128 \
  --calibration_data "<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an helpful assistant. <|eot_id|><|start_header_id|>user<|end_header_id|> Name all the prime ministers of India. <|eot_id|><|start_header_id|>assistant<|end_header_id|>"

james-p-xu commented:

I am hitting the same issue as @tombang, with QNN version 2.29 and on the release/0.5 branch.

Unknown QNN BinaryInfo version 3.
Failed to retrieve backend binary info from QNN context binary.
Failed to parse QNN Graph Info. The cache might be broken. Please consider to re-generate the cache.
QNN context cache is invalid.
Fail to configure Qnn context
Fail to initialize Qnn Manager
Init failed for backend QnnBackend: 0x1

java.lang.Exception: Execution of method forward failed with status 0x1

I saw someone mention on a different thread that they got the Llama example working with 2.31, so I'm unsure what is happening. Would appreciate any pointers here.

kimishpatel (Contributor) commented:

@cccclai can you tag someone from Qualcomm?

cccclai (Contributor) commented Mar 26, 2025

> I am hitting the same issue as @tombang, with QNN version 2.29 and on the release/0.5 branch.
>
> Unknown QNN BinaryInfo version 3.
> Failed to retrieve backend binary info from QNN context binary.
> Failed to parse QNN Graph Info. The cache might be broken. Please consider to re-generate the cache.
> QNN context cache is invalid.
> Fail to configure Qnn context
> Fail to initialize Qnn Manager
> Init failed for backend QnnBackend: 0x1
>
> java.lang.Exception: Execution of method forward failed with status 0x1
>
> I saw someone mention on a different thread that they got the Llama example working with 2.31, so I'm unsure what is happening. Would appreciate any pointers here.

That seems like a different issue: the reported issue happens during export (ahead of time), while the issue on your side is a runtime error (I assume you have already generated the .pte file?).

cccclai (Contributor) commented Mar 26, 2025

@tombang Thank you for trying out the QNN + ExecuTorch solution. I've never tried the 3B model with ExecuTorch + QNN, but we have the stories model set up in CI and it has been running fine:

PYTHON_EXECUTABLE=python bash .ci/scripts/test_qnn_static_llama.sh

The CI uses QNN v2.28, and the NDK is likely r27b based on the docker setup for the CI:

ANDROID_NDK_VERSION=r27b

While we're trying to improve the out-of-the-box experience, do you mind trying out a simpler model to make sure the setup is correct?
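
For reference, a sanity check with a smaller model could look like the sketch below; the stories110M checkpoint and params paths are placeholders, and the flags mirror the QNN flags already used earlier in this thread rather than the exact configuration the CI script runs.

# Placeholder paths; flags mirror the ones used earlier in this thread, not the CI script's exact settings.
python -m examples.models.llama.export_llama \
  -c /path/to/stories110M.pt \
  -p /path/to/params.json \
  -kv --disable_dynamic_shape --qnn --pt2e_quantize qnn_8a8w -d fp32 \
  --metadata '{"get_bos_id":1, "get_eos_ids":[2]}' \
  --output_name="stories110m_qnn.pte" --soc_model SM8550
# The bos/eos ids above are the standard Llama 2 tokenizer values (assumed; check the params for your checkpoint).

If that .pte loads and runs on the device, the toolchain setup is likely fine and the problem is specific to the larger Llama 3.2 export.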
