Batched Decoding #230
martindevans
started this conversation in General
Replies: 1 comment
-
If I understand this feature correctly, it's also possible to provide only one sequence per executor. It would then be possible to create a batch executor as a composition of existing executors, which would form batches from all of their sequence IDs instead.
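The composition idea could be sketched roughly like this (plain Python with illustrative names — `SequenceExecutor`, `BatchExecutor`, and `build_batch` are hypothetical, not part of the LLamaSharp API):

```python
# Hypothetical sketch of composing several single-sequence executors into
# one batch executor. Each child executor owns exactly one sequence ID;
# the batch executor interleaves one token per child into a single batch
# each step, mirroring the (token, position, sequence ID) triples that a
# llama.cpp-style batch carries.
from dataclasses import dataclass, field


@dataclass
class SequenceExecutor:
    """Owns exactly one sequence ID and its pending tokens."""
    seq_id: int
    pending: list = field(default_factory=list)

    def next_token(self):
        return self.pending.pop(0) if self.pending else None


class BatchExecutor:
    """Forms one decode batch per step from all child executors."""
    def __init__(self, executors):
        self.executors = executors

    def build_batch(self, pos):
        # Each entry: (token, position, seq_id).
        batch = []
        for ex in self.executors:
            tok = ex.next_token()
            if tok is not None:
                batch.append((tok, pos, ex.seq_id))
        return batch


a = SequenceExecutor(seq_id=0, pending=[11, 12])
b = SequenceExecutor(seq_id=1, pending=[21])
batched = BatchExecutor([a, b])
print(batched.build_batch(pos=0))  # [(11, 0, 0), (21, 0, 1)]
print(batched.build_batch(pos=1))  # [(12, 1, 0)]
```

The point of the composition is that the underlying model sees one batch per step regardless of how many logical sequences are in flight, while each child executor keeps its own single-sequence view.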
-
llama.cpp recently added an entirely new way to manage the KV cache. LLamaSharp has some bindings to this API (#185), but they're barely used - all of the executors are still using the old `llama_eval` method, which is now obsolete.

In #223 I added a new example of basic batched decoding, which is a direct port of one of the llama.cpp examples. It uses the low-level APIs directly; I'll be working to provide safe wrappers around everything it does.
In the future this may become the basis of an entirely new executor in LLamaSharp. For example, the batched decoding example could become a new type of executor that provides multiple output streams.
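The "multiple output streams" idea is essentially demultiplexing: one decode call processes a batch tagged with several sequence IDs, and each result is routed back to the stream for its sequence. A minimal plain-Python illustration of that routing (with `fake_decode` standing in for a real decode call — none of these names are LLamaSharp or llama.cpp API):

```python
# Illustrative sketch: one batch carries (token, position, seq_id) entries
# for several sequences; after decoding, results are routed back to a
# per-sequence output stream.
from collections import defaultdict


def fake_decode(batch):
    # Stand-in for a real decode call: pretend the "sampled token" for each
    # entry is simply token + 1, tagged with its sequence ID.
    return [(seq_id, token + 1) for (token, pos, seq_id) in batch]


def demux(results):
    # Group decoded tokens by sequence ID, preserving order within each.
    streams = defaultdict(list)
    for seq_id, token in results:
        streams[seq_id].append(token)
    return dict(streams)


# Two entries for sequence 0 (positions 0 and 1) and one for sequence 1.
batch = [(100, 0, 0), (200, 0, 1), (101, 1, 0)]
print(demux(fake_decode(batch)))  # {0: [101, 102], 1: [201]}
```

An executor built this way could expose one stream per sequence ID to callers while internally sharing a single context and a single decode loop.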