Description
Hi! 👋
Firstly, thanks for your work on this project! 🙂
Today I used patch-package to patch @livekit/[email protected]
for the project I'm working on.
I followed the instructions on the LiveKit website for setting up a Node.js voice pipeline, then added a function context to that setup, as I have done in the past with the multimodal agent. My expectation was that asking the Agent a question tied to a function would fire the function exactly as it does in the multimodal agent.

However, in the voice pipeline the first ask gets stuck in the playSpeech workflow. When a tool call is made, the LLM generates an empty text reply, TTS gets stuck on it, and the speech handle essentially never resolves. I have to ask the question a second time, which triggers the interrupt workflow and finally gets the function to fire.

I added the patch below to simulate that same interrupt action. It's not ideal, but it's a temporary workaround while you look into this further. My setup is nearly identical to the website examples, and I am using the OpenAI LLM with model 4.1-mini.
Here is the diff that solved my problem:
diff --git a/node_modules/@livekit/agents/dist/pipeline/agent_output.js b/node_modules/@livekit/agents/dist/pipeline/agent_output.js
index 2de3437..7957011 100644
--- a/node_modules/@livekit/agents/dist/pipeline/agent_output.js
+++ b/node_modules/@livekit/agents/dist/pipeline/agent_output.js
@@ -159,6 +159,10 @@ const streamSynthesisTask = (stream, handle) => {
   if (!fullText || fullText.trim().length === 0) {
     cancelled = true;
     handle.queue.put(SynthesisHandle.FLUSH_SENTINEL);
+
+    setTimeout(() => {
+      handle?.playHandle?.interrupt();
+    }, 500);
   }
   ttsStream.flush();
   ttsStream.endInput();