I'm using GGML in a batch LLM inference use case. I'd love to know if there is any way to use a KV cache to avoid redoing some of the computation; I couldn't find any mention of one in the code.
Even better, I'd like to reuse the same KV cache across prompts, since a lot of the use cases share the same prompt pattern.
Thanks!
Ben

Replies: 1 comment

Look at llama.cpp: https://github.com/ggerganov/llama.cpp/blob/65c64dc36f9bca5b3f100614cdd02bf12d6b3e49/llama.h#L510
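
A rough sketch of how that API could be used for this: evaluate the shared prompt once, snapshot the context state, then restore the snapshot before each per-request suffix. The llama.h calls below (llama_init_from_file, llama_tokenize, llama_eval, llama_get_state_size, llama_copy_state_data, llama_set_state_data) reflect the pre-GGUF C API around that commit, and the model path and prompt strings are placeholders; check the linked header for the exact signatures.

```cpp
// Sketch: reuse the KV cache for a shared prompt prefix across requests.
// Error checking is mostly omitted for brevity.
#include "llama.h"

#include <cstdint>
#include <string>
#include <vector>

int main() {
    llama_context_params params = llama_context_default_params();
    llama_context * ctx = llama_init_from_file("model.ggml.bin", params); // placeholder path
    if (!ctx) return 1;

    const int n_threads = 4;

    // 1. Tokenize and evaluate the shared prefix once.
    const std::string prefix = "You are a helpful assistant.\n"; // placeholder shared prompt
    std::vector<llama_token> prefix_tokens(prefix.size() + 8);
    const int n_prefix = llama_tokenize(ctx, prefix.c_str(), prefix_tokens.data(),
                                        (int) prefix_tokens.size(), /*add_bos=*/true);
    if (n_prefix < 0) return 1;
    prefix_tokens.resize(n_prefix);
    llama_eval(ctx, prefix_tokens.data(), n_prefix, /*n_past=*/0, n_threads);

    // 2. Snapshot the context state (this includes the KV cache for the prefix).
    std::vector<uint8_t> state(llama_get_state_size(ctx));
    llama_copy_state_data(ctx, state.data());

    // 3. For each request, restore the snapshot and evaluate only the suffix.
    const std::vector<std::string> suffixes = {"Question A\n", "Question B\n"}; // placeholders
    for (const std::string & suffix : suffixes) {
        llama_set_state_data(ctx, state.data());

        std::vector<llama_token> suffix_tokens(suffix.size() + 8);
        const int n_suffix = llama_tokenize(ctx, suffix.c_str(), suffix_tokens.data(),
                                            (int) suffix_tokens.size(), /*add_bos=*/false);
        if (n_suffix < 0) continue;
        suffix_tokens.resize(n_suffix);

        // n_past = n_prefix tells the model the prefix is already in the KV cache.
        llama_eval(ctx, suffix_tokens.data(), n_suffix, /*n_past=*/n_prefix, n_threads);

        // ... read logits / sample for this request here ...
    }

    llama_free(ctx);
    return 0;
}
```

Since the saved state should contain the prefix's KV cache, restoring it and passing n_past equal to the prefix length means only the suffix tokens get recomputed for each request.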