I'm using GGML in a batch LLM inference use case. I'd love to know if there is any way to use a KV cache to avoid redoing some of the computation; I couldn't find any mention of one in the code.
Even better, I'd like to reuse the same KV cache across prompts, since a lot of the use cases share the same prompt pattern.
Thanks!
Ben

Replies: 1 comment

Look at llama.cpp: https://github.com/ggerganov/llama.cpp/blob/65c64dc36f9bca5b3f100614cdd02bf12d6b3e49/llama.h#L510
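
A rough sketch of how that API could be used for this: evaluate the shared prompt once, snapshot the context state, then restore the snapshot before each per-request suffix. The llama.h calls below (llama_init_from_file, llama_tokenize, llama_eval, llama_get_state_size, llama_copy_state_data, llama_set_state_data) reflect the pre-GGUF C API around that commit, and the model path and prompt strings are placeholders; check the linked header for the exact signatures.

```cpp
// Sketch: reuse the KV cache for a shared prompt prefix across requests.
// Error checking is mostly omitted for brevity.
#include "llama.h"

#include <cstdint>
#include <string>
#include <vector>

int main() {
    llama_context_params params = llama_context_default_params();
    llama_context * ctx = llama_init_from_file("model.ggml.bin", params); // placeholder path
    if (!ctx) return 1;

    const int n_threads = 4;

    // 1. Tokenize and evaluate the shared prefix once.
    const std::string prefix = "You are a helpful assistant.\n"; // placeholder shared prompt
    std::vector<llama_token> prefix_tokens(prefix.size() + 8);
    const int n_prefix = llama_tokenize(ctx, prefix.c_str(), prefix_tokens.data(),
                                        (int) prefix_tokens.size(), /*add_bos=*/true);
    if (n_prefix < 0) return 1;
    prefix_tokens.resize(n_prefix);
    llama_eval(ctx, prefix_tokens.data(), n_prefix, /*n_past=*/0, n_threads);

    // 2. Snapshot the context state (this includes the KV cache for the prefix).
    std::vector<uint8_t> state(llama_get_state_size(ctx));
    llama_copy_state_data(ctx, state.data());

    // 3. For each request, restore the snapshot and evaluate only the suffix.
    const std::vector<std::string> suffixes = {"Question A\n", "Question B\n"}; // placeholders
    for (const std::string & suffix : suffixes) {
        llama_set_state_data(ctx, state.data());

        std::vector<llama_token> suffix_tokens(suffix.size() + 8);
        const int n_suffix = llama_tokenize(ctx, suffix.c_str(), suffix_tokens.data(),
                                            (int) suffix_tokens.size(), /*add_bos=*/false);
        if (n_suffix < 0) continue;
        suffix_tokens.resize(n_suffix);

        // n_past = n_prefix tells the model the prefix is already in the KV cache.
        llama_eval(ctx, suffix_tokens.data(), n_suffix, /*n_past=*/n_prefix, n_threads);

        // ... read logits / sample for this request here ...
    }

    llama_free(ctx);
    return 0;
}
```

Since the saved state should contain the prefix's KV cache, restoring it and passing n_past equal to the prefix length means only the suffix tokens get recomputed for each request.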