Does the QNN backend support the Llama 3.2 3B model instead of XNNPACK? #9311
A few follow-ups.
I followed this tutorial (https://github.com/pytorch/executorch/blob/release/0.5/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md) exactly, except that I swapped the model from Llama 3 8B to Llama 3.2 3B during export. The export command I used is as follows:

```bash
python -m examples.models.llama.export_llama \
  --checkpoint "${MODEL_DIR}/consolidated.00.pth" \
  -p "${MODEL_DIR}/params.json" \
  -kv --disable_dynamic_shape \
  --qnn --pt2e_quantize qnn_8a8w \
  -d fp32 \
  --num_sharding 4 \
  --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
  --output_name="test.pte" \
  --soc_model SM8550
```

This command exports the model successfully, but when running the APK on Android, model loading fails with error code -1.
OK, let's follow up on this.
Can you try QNN version 2.28?
Hi, I changed the QNN version to 2.28, and an error occurred:

```
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[WARNING] [Qnn ExecuTorch]: QnnDsp Arch 68 set by custom config is different from arch associated with SoC 43, will overwrite it to 73
[ERROR] [Qnn ExecuTorch]: QNN context cache is invalid.
```

The export command is below: python -m examples.models.llama.export_llama
I am hitting the same issue as @tombang, with QNN version 2.29 and on the release/0.5 branch:

```
Unknown QNN BinaryInfo version 3.
Failed to retrieve backend binary info from QNN context binary.
Failed to parse QNN Graph Info. The cache might be broken. Please consider to re-generate the cache.
QNN context cache is invalid.
Fail to configure Qnn context
Fail to initialize Qnn Manager
Init failed for backend QnnBackend: 0x1
java.lang.Exception: Execution of method forward failed with status 0x1
```

I saw someone mention on a different thread that they got the Llama example working with 2.31, so I'm unsure what is happening. Would appreciate any pointers here.
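As a first check (a minimal sketch; the device path is an assumption based on the LlamaDemo tutorial, not confirmed in this thread): errors like "Unknown QNN BinaryInfo version" and "QNN context cache is invalid" are consistent with a mismatch between the QNN SDK used to export the .pte and the QNN libraries on the device, so it may be worth confirming both sides use the same SDK version.

```bash
# Hypothetical sanity check; adjust paths to your setup.
# 1) The SDK that produced the .pte (set when running export_llama):
echo "$QNN_SDK_ROOT"
# 2) The QNN libraries the Android runner actually loads
#    (assumed push location from the LlamaDemo tutorial):
adb shell ls -l /data/local/tmp/llama
```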
@cccclai can you tag someone from Qualcomm?
This seems like a different issue: the originally reported failure happens during export (ahead of time), while the issue on your side is a runtime error (I assume you've already generated the .pte file?).
@tombang Thank you for trying out the QNN + ExecuTorch solution. I've never tried a 3B model with ExecuTorch + QNN, but we have the stories model set up in CI and it has been running fine (see executorch/.github/workflows/pull.yml, line 587, and executorch/.ci/docker/build.sh, line 51, at commit 0342bab).
While we're trying to improve the out-of-the-box experience, do you mind trying out a simpler model to make sure the setup is correct?
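For reference, here is a sketch of what that smaller-model smoke test could look like, reusing the QNN flags from the 3B command above (the stories checkpoint and params filenames are assumptions, not taken from this thread):

```bash
# Hypothetical smoke test: export the small stories model with the same
# QNN flags used for Llama 3.2 3B. Checkpoint/params filenames are assumed.
python -m examples.models.llama.export_llama \
  --checkpoint stories110M.pt \
  -p params.json \
  -kv --disable_dynamic_shape \
  --qnn --pt2e_quantize qnn_8a8w \
  -d fp32 \
  --soc_model SM8550 \
  --output_name "stories_qnn.pte"
```

If this small model exports, loads, and runs on the device, the toolchain and device setup are likely fine, and the problem is specific to the 3B export.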
🐛 Describe the bug
I have run the XNNPACK tutorial and the resulting .pte file runs normally. However, when I follow the Llama 3 8B tutorial and change the model to Llama 3.2 3B, the model fails to load on an Android device with a Qualcomm 8 Gen 2 SoC; the failure code is 1.
Versions
QNN: 2.26
SDK: r27b
cc @cccclai @winskuo-quic @shewu-quic @cbilgin