Having tools defined in your kanis usually means slower response times compared with the same kani without tools. This is partly caused by the larger request payload, but mostly because the reply cannot be streamed, since the answer may need to be formatted (e.g. as JSON tool calls). When a reply from the LLM server cannot be streamed, the perceived lag grows with the length of the response.
There are a few ways around this issue, depending on your needs. However, there is an improvement kani could make internally to avoid non-streaming responses.
A full round involving tools makes two passes against the LLM:
1) The user query is sent to the LLM. This request includes the defined tools, so the response may be formatted (e.g. as JSON tool calls) and streaming is not possible.
2) The LLM responds with the list of tools that need to be called in order to answer the query.
3) Kani calls those functions and appends the results of the tool calls to the chat history.
4) Kani then calls the LLM's get-completion again with the updated chat history to receive the LLM's processing of the tool results. At this point the first pass is done and the LLM has already identified the needed tools, so they are not needed for this completion run. Removing the tools allows this final response to be streamed.
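The two-pass idea could be sketched roughly like this. Everything here (`llm_complete`, the tool registry, the stub weather tool) is a hypothetical stand-in, not kani's actual internal API; the point is only that the second pass omits the tool definitions so its reply can stream:

```python
# Sketch of a two-pass tool round. `llm_complete` and the tool registry
# are hypothetical stand-ins, not kani's real internals.

def get_weather(city: str) -> str:
    # Stub tool for illustration only.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}
TOOL_SCHEMAS = [{"name": "get_weather", "parameters": {"city": "string"}}]

def llm_complete(history, tools=None):
    """Stand-in for an LLM call. With tools attached it returns a
    formatted tool-call request; without tools it 'streams' plain text."""
    if tools:
        # Pass 1: structured (JSON-like) output -> cannot be streamed.
        return {"tool_calls": [{"name": "get_weather",
                                "args": {"city": "Lisbon"}}]}
    # Pass 2: no tools attached, so the reply can stream token by token.
    return iter("It is sunny in Lisbon today.".split())

def full_round(user_query):
    history = [{"role": "user", "content": user_query}]
    # Pass 1: tools included, response is a formatted tool-call list.
    reply = llm_complete(history, tools=TOOL_SCHEMAS)
    for call in reply["tool_calls"]:
        result = TOOLS[call["name"]](**call["args"])
        history.append({"role": "function", "name": call["name"],
                        "content": result})
    # Pass 2: tools removed, so this completion can be streamed.
    return list(llm_complete(history, tools=None))

print(" ".join(full_round("What's the weather in Lisbon?")))
```

The only behavioral change needed for the improvement is the `tools=None` on the second call: the tool definitions are already consumed by pass 1, so dropping them costs nothing and unblocks streaming.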
Another approach, which suited me better, is to split tool runs across kanis. I have one kani that has all the defined tools; this kani, "the router", is called before all other kanis in order to detect which kani should handle the current user query. Aside from routing, it also detects tool calls. When tools are detected, it gathers the information and inserts the data into the kani that will generate the response, just like RAG. This way the responses to the user are always streamed.
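The router pattern could look something like the sketch below. All class names, the `lookup_order` tool, and the routing logic are illustrative assumptions (a real router would ask the LLM to pick the target kani); the point is that only the router holds tools, so the responder kanis can always stream:

```python
# Sketch of the router pattern: a tool-equipped "router" kani runs first,
# pre-executes any tool calls it detects, and injects the results
# RAG-style into a tool-free responder kani, whose reply can always stream.
# All names here are illustrative, not kani's API.

def lookup_order(order_id: str) -> str:
    # Stub tool for illustration only.
    return f"Order {order_id}: shipped"

class RouterKani:
    """Holds all the tools; picks the handler kani and gathers tool data."""
    tools = {"lookup_order": lookup_order}

    def route(self, query: str):
        # A real router would ask the LLM; here we key off the query text.
        if "order" in query:
            context = self.tools["lookup_order"]("1234")  # hypothetical id
            return "support", context
        return "chitchat", None

class ResponderKani:
    """Tool-free kani: its completions can always be streamed."""
    def __init__(self, name):
        self.name = name

    def stream_answer(self, query, context=None):
        # Stand-in for a streamed completion; the injected context plays
        # the role of retrieved documents in RAG.
        prefix = f"[{self.name}]"
        if context:
            yield from f"{prefix} Based on: {context}".split()
        else:
            yield from f"{prefix} Hello!".split()

def handle(query, router, responders):
    target, context = router.route(query)
    return " ".join(responders[target].stream_answer(query, context))

router = RouterKani()
responders = {"support": ResponderKani("support"),
              "chitchat": ResponderKani("chitchat")}
print(handle("Where is my order?", router, responders))
```

The trade-off is an extra LLM pass through the router on every turn, but since the responder never carries tool definitions, the user-facing reply streams from the first token.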