
[BUG]: "QNN Engine is offline." when using a Snapdragon X Plus laptop #2962

Open
barealek opened this issue Jan 10, 2025 · 16 comments
Assignees: timothycarambat
Labels: core-team-only, Desktop, investigating (Core team or maintainer will or is currently looking into this issue), possible bug (Bug was reported but is not confirmed or is unable to be replicated)

Comments

@barealek

barealek commented Jan 10, 2025

How are you running AnythingLLM?

AnythingLLM desktop app

What happened?

When trying to run inference with any QNN model on a Snapdragon X Plus laptop, the error below occurs.
[screenshot]

The logs specify that the required CPU/NPU is not found:

{
  "level": "info",
  "message": "\u001b[36m[QnnNativeEmbedder]\u001b[0m QNN API server is not supported on this platform - no valid CPU/NPU found. {\"validCores\":[\"Snapdragon(R) X Elite\"],\"cores\":[\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\"]}",
  "service": "backend"
}
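The check implied by this log can be sketched as follows; this is a hypothetical reconstruction for illustration, not AnythingLLM's actual code. The engine appears to compare each reported core model against a hard-coded allow-list, so the Plus's "X1P64100" cores never match the "Snapdragon(R) X Elite" entry:

```javascript
// Hypothetical sketch of the platform check suggested by the log above.
// The core strings and allow-list are taken from the logged JSON; the
// function name and shape are assumptions.
function hasValidCore(cores, validCores) {
  // The platform passes only if some core model contains an allow-listed name.
  return cores.some((model) => validCores.some((valid) => model.includes(valid)));
}

const validCores = ["Snapdragon(R) X Elite"];
const plusCores = Array(10).fill("Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz");
console.log(hasValidCore(plusCores, validCores)); // false
```

A substring match like this would explain why Plus chipsets were rejected until Plus core names (or Elite-compiled model support) were added to the list.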

[screenshot]

After starting AnythingLLM and reproducing the error, the full log looks like this:

{"level":"info","message":"\u001b[36m[EncryptionManager]\u001b[0m Loaded existing key & salt for encrypting arbitrary data.","service":"backend"}
{"level":"info","message":"\u001b[32m[TELEMETRY ENABLED]\u001b[0m Anonymous Telemetry enabled. Telemetry helps Mintplex Labs Inc improve AnythingLLM.","service":"backend"}
{"level":"info","message":"prisma:info Starting a sqlite pool with 21 connections.","service":"backend"}
{"level":"info","message":"prisma:info Started query engine http server on http://127.0.0.1:51049","service":"backend"}
{"level":"info","message":"\u001b[32m[TELEMETRY SENT]\u001b[0m {\"event\":\"server_boot\",\"distinctId\":\"ea8cb903-cdc7-4dbc-898a-f0c70402eefb\",\"properties\":{\"runtime\":\"desktop\"}}","service":"backend"}
{"level":"info","message":"Skipping preloading of AnythingLLMOllama - LLM_PROVIDER is qnnengine.","service":"backend"}
{"level":"info","message":"Hot loading of QnnEngine - LLM_PROVIDER is qnnengine with model llama_v3_2_3b_chat_8k.","service":"backend"}
{"level":"info","message":"\u001b[36m[NativeEmbedder]\u001b[0m Initialized","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Initialized with model: llama_v3_2_3b_chat_8k. Context window: 4096","service":"backend"}
{"level":"info","message":"\u001b[36m[CommunicationKey]\u001b[0m RSA key pair generated for signed payloads within AnythingLLM services.","service":"backend"}
{"level":"info","message":"\u001b[36m[EncryptionManager]\u001b[0m Loaded existing key & salt for encrypting arbitrary data.","service":"backend"}
{"level":"info","message":"[production] AnythingLLM Standalone Backend listening on port 3001. Network discovery is disabled. NPU Detected: false","service":"backend"}
{"level":"info","message":"\u001b[36m[BackgroundWorkerService]\u001b[0m Feature is not enabled and will not be started.","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Boot failure for port 8080","service":"backend"}
{"level":"info","message":"\u001b[36m[NativeEmbedder]\u001b[0m Initialized","service":"backend"}
{"level":"info","message":"\u001b[36m[NativeEmbedder]\u001b[0m Initialized","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Initialized with model: llama_v3_2_3b_chat_8k. Context window: 4096","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Boot failure for port 8080","service":"backend"}
{"level":"error","message":"Error: QNN Engine is offline. Please reboot QNN Engine or AnythingLLM app.\n    at s.checkReady (C:\\Users\\aleks\\AppData\\Local\\Programs\\AnythingLLM\\resources\\backend\\server.js:31:1604)\n    at async s.streamGetChatCompletion (C:\\Users\\aleks\\AppData\\Local\\Programs\\AnythingLLM\\resources\\backend\\server.js:31:2741)\n    at async SL (C:\\Users\\aleks\\AppData\\Local\\Programs\\AnythingLLM\\resources\\backend\\server.js:236:2892)\n    at async C:\\Users\\aleks\\AppData\\Local\\Programs\\AnythingLLM\\resources\\backend\\server.js:236:4507","service":"backend"}
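The `checkReady` frame in the stack trace suggests the backend probes the engine's local HTTP port before each chat. A minimal sketch of such a probe (the `/health` path and exact behavior are assumptions, not AnythingLLM's real implementation) might look like:

```javascript
// Minimal sketch of a readiness probe like the checkReady in the trace above.
// The endpoint path is an assumption for illustration. Requires Node 18+ for
// the global fetch.
async function checkReady(port = 8080) {
  try {
    const res = await fetch(`http://127.0.0.1:${port}/health`);
    return res.ok;
  } catch {
    // Nothing listening on the port -> surface the error seen in the log.
    throw new Error("QNN Engine is offline. Please reboot QNN Engine or AnythingLLM app.");
  }
}
```

Under this reading, the "Boot failure for port 8080" lines mean the engine process never bound the port, so every probe falls into the catch branch.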

Are there known steps to reproduce?

No response

barealek added the possible bug label Jan 10, 2025
@lachlanharrisdev

lachlanharrisdev commented Jan 11, 2025

Same issue on Snapdragon X Elite for me

@timothycarambat
Member

> Same issue on snapdragon x elite for me

Is this after downloading a model? Also have you tried a reboot post-download of the model?

Second, @barealek - I just got confirmation that we can run Elite compiled models on Plus chipsets, so we will patch that and re-release 1.7.2

@lachlanharrisdev

> Is this after downloading a model? Also have you tried a reboot post-download of the model?

I downloaded and tried to run a model, choosing the Qualcomm LLM provider and NPU embedder, but it came up with the error. It still failed after fully rebooting the app and restarting my computer; I then tried all of the same steps after uninstalling and reinstalling the app, which still didn't work.

I haven't done any additional setup of the NPU or anything outside of AnythingLLM, so I'm wondering if there is some driver(s) I'm missing? I'll let the experts figure it out.

@timothycarambat
Member

timothycarambat commented Jan 11, 2025

@lachlanharrisdev - we just pushed a new build for arm64 1.7.2-r2-arm64 (version is located in top right of app window). If you don't have that version installed, download the new build and you should be okay now.

Also, what device + chipset are you on? Plus, Elite, etc.?

timothycarambat self-assigned this Jan 11, 2025
timothycarambat added the core-team-only, Desktop, and investigating labels Jan 11, 2025
@lachlanharrisdev

@timothycarambat I've just installed the new build and it's still failing, but it's behaving differently. After I upgraded to the new version and sent a chat, it came up with the error QNN Engine is booting. Please wait for it to finish and try again. I gave it a couple of seconds and sent a chat again, and after ~14 seconds of loading it came up with the error from before, QNN Engine is offline. Please reboot QNN Engine or AnythingLLM app.

What I noticed is that when booting up AnythingLLM, right before the loading screen switches to the home UI, a task called "AnythingLLMQnnEngine" pops up for a split second in Task Manager, but it seems to end itself very quickly. The same task also pops up after I send a chat, once the QNN Engine "boots", but again it quickly closes itself.

I'm currently on a Surface Laptop 7 15", running the X Elite X1E-80-100.

@timothycarambat
Member

timothycarambat commented Jan 11, 2025

@lachlanharrisdev I wrote this up to debug the engine directly (app should be closed)
https://docs.google.com/document/d/1Uk9WKCXz0a6tuKeWbaoSD1gDUGglBVycNgJBsDZJB2k/edit?usp=sharing

I have the same chipset on a Dell Latitude.

@lachlanharrisdev

@timothycarambat yep, that found the issue

[WARN]  "Unable to initialize logging in the backend."
[ERROR] "Could not initialize backend due to error = 4000"
[ERROR] "Qnn initializeBackend FAILED!"
Failure to initialize model
Error: Failed to create the Genie Dialog.

If it's relevant, this was using llama 3.1 8b, not 3.2 3b.

@timothycarambat
Member

@lachlanharrisdev Now this is a very different issue from the others, then. If you run the command as administrator, does it still fail to initialize? I am wondering how/why you would require admin to execute the LLM engine; someone else had success with that, and I have to determine why that would ever be the case for anyone, since admin rights should not be required to start the QNN LLM API.

@timothycarambat
Member

The recent patch seemed to solve most issues people had (most Plus support was not enabled), but this is certainly something different.

@lachlanharrisdev

lachlanharrisdev commented Jan 11, 2025

> If you run the command as administrator does it still fail to initialize?

@timothycarambat nope, running it as admin now works and I do see QNN running on localhost.

[INFO]  "Using create From Binary"
[INFO]  "Allocated total size = 609845760 across 10 buffers"
AnythingLLMQnnEngine API Server: Starting chat API on host 127.0.0.1:8000
Build: 1.0.1 80b0117 Fri Dec 27 13:07:34 2024

I tried running AnythingLLM as administrator and, after the QNN engine boots, I can successfully chat. This works for me, but I'm more than happy to keep testing things out for you; I'd love to contribute in any way I can. Should we create a new issue and continue there?

@AlphaEcho11

> @lachlanharrisdev - we just pushed a new build for arm64 1.7.2-r2-arm64 (version is located in top right of app window). If you don't have that version installed, download the new build and you should be okay now.

Build 1.7.2-r2-arm64 seems to be working well while running in Administrator mode.
The QNN engine still appears to fail, especially in build 1.7.2-arm64, with an additional boot failure on port 8080 (on some devices; cannot confirm all SoCs in play).

Happy hunting, everyone!

@lachlanharrisdev

lachlanharrisdev commented Jan 11, 2025

@timothycarambat I've just restarted my PC and now it seems to no longer work, even with administrator mode... I'm guessing the same QNN Engine instance stayed online from the instructions in the Google doc, and AnythingLLM used that instance instead of booting another one (if that's even possible, I know barely anything about AI and Qualcomm). Hopefully that clears up any confusion.

@AlphaEcho11 interesting, what device are you using? Just wondering if this is only a surface laptop thing

@AlphaEcho11

@lachlanharrisdev - Surface Pro 11 here, on the X Elite. After several device reboots and AnythingLLM refreshes, it's been working without issue.
What's the output of the backend logs when you have the device rebooted and attempting to get QNN engine up? Curious if it's failing or something further. Thanks in advance!

@barealek
Author

> From the recent patch that seemed to solve most issues people had (most Plus support was not enabled) but this is certainly something different

I am still having issues, even when launching as an administrative account. It seems like it's starting up now, and I get a message that roughly says "QNN is still booting, please wait", but then it just crashes and the QNN engine goes offline. Here are my logs:
backend-2025-01-11.log

@AlphaEcho11

> From the recent patch that seemed to solve most issues people had (most Plus support was not enabled) but this is certainly something different
>
> I am still having issues, even when launching as an administrative account. It seems like it's starting up now, I get a message that roughly says "QNN is still booting, please wait", but then it just crashes and the QNN engine goes offline. Here's my logs:
> backend-2025-01-11.log

Thank you for the logs! Yes, seeing the QNN engine fail to get online here; going to check one more area and see if another variable is at play.

@AlphaEcho11

> From the recent patch that seemed to solve most issues people had (most Plus support was not enabled) but this is certainly something different
>
> I am still having issues, even when launching as an administrative account. It seems like it's starting up now, I get a message that roughly says "QNN is still booting, please wait", but then it just crashes and the QNN engine goes offline. Here's my logs: backend-2025-01-11.log

Can you reattempt this with the 8B model as well? Following @timothycarambat's previous recommendations and tweaking:

  1. Download & unpack the model
  2. Restart AnythingLLM (in administrator mode)
  3. Load up workspace running the Qualcomm QNN engine with the model requested
  4. Watch for NPU performance, check backend log for QNN engine data

Let us know the results!
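For step 4, the QNN-related entries can be pulled out of the backend log with a small filter. A sketch (the helper name is mine; the log format matches the JSON-per-line entries shown earlier in this thread, and the filename is the one attached above):

```javascript
// Small helper for step 4: filter a backend log down to QNN Engine lines.
// Works on the JSON-per-line log format shown earlier in this thread.
function qnnLines(logText) {
  return logText.split("\n").filter((line) => line.includes("[QNN Engine]"));
}

// Usage, pointing at your own backend log file:
// const fs = require("fs");
// console.log(qnnLines(fs.readFileSync("backend-2025-01-11.log", "utf8")).join("\n"));
```

This surfaces lines like "Boot failure for port 8080" without wading through the embedder and telemetry entries.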
