Llama.cpp a lot slower when using cmake compared to using w64devkit #594
Unanswered
v4lentin1879 asked this question in Q&A
Did you make sure you are building in Release mode? (CMAKE_BUILD_TYPE=Release)
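For reference, a Release build with CMake would look roughly like the following. This is a sketch, assuming it is run from the llama.cpp source root; the build directory name and the grep check are illustrative:

```shell
# Configure in Release mode. Without CMAKE_BUILD_TYPE, single-config
# generators may produce an unoptimized build, which can cause
# order-of-magnitude slowdowns in inference.
cmake -B build -DCMAKE_BUILD_TYPE=Release

# Build; --config Release covers multi-config generators (e.g. Visual Studio),
# -j builds in parallel.
cmake --build build --config Release -j

# Sanity-check the configured build type in the CMake cache.
grep CMAKE_BUILD_TYPE build/CMakeCache.txt
```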
I'm running mistral-orca with llama.cpp on my Mac. The application I'm using it for runs on macOS and Windows; it's a TypeScript application using the node-llama-cpp library, which requires me to build the llama.cpp binaries with CMake.
My issue is that the llama.cpp binaries are a lot slower when built with CMake than when built with w64devkit on Windows. I'm not even at the step where I wrap the node-llama-cpp library around them yet.
Does anyone know why the CMake build is this slow while the w64devkit one isn't? Is there a flag or similar that could fix this? I'm running on a 2019 MacBook Pro with a 6-core i7, using CPU inference only.
w64devkit:
llama_print_timings: load time = 2789.31 ms
llama_print_timings: sample time = 7.55 ms / 18 runs ( 0.42 ms per token, 2383.16 tokens per second)
llama_print_timings: prompt eval time = 1925.06 ms / 20 tokens ( 96.25 ms per token, 10.39 tokens per second)
llama_print_timings: eval time = 8256.93 ms / 18 runs ( 458.72 ms per token, 2.18 tokens per second)
llama_print_timings: total time = 23842.73 ms
cmake:
llama_print_timings: load time = 4133.27 ms
llama_print_timings: sample time = 5.71 ms / 18 runs ( 0.32 ms per token, 3153.47 tokens per second)
llama_print_timings: prompt eval time = 25917.85 ms / 19 tokens ( 1364.10 ms per token, 0.73 tokens per second)
llama_print_timings: eval time = 43493.77 ms / 18 runs ( 2416.32 ms per token, 0.41 tokens per second)
llama_print_timings: total time = 74989.23 ms
Thanks a lot for your help!