_audio_in_queue Getting Stuck and not getting the InputAudioRawFrame coming from DailyInput #721
Comments
Usually when this happens it is because … If you are running a custom Pipecat (i.e. from the repo), would it be possible for you to re-raise the exception in the line mentioned in that bug report?
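For reference, a minimal sketch of what re-raising the exception could look like, assuming a generic async function-call handler; the wrapper name and the logger usage are illustrative placeholders, not the actual Pipecat code path referenced above.

```python
import asyncio
from loguru import logger  # Pipecat-style logging; the stdlib `logging` works too

async def run_function_call(handler, *args):
    """Hypothetical wrapper around a function-call handler."""
    try:
        return await handler(*args)
    except asyncio.CancelledError:
        # Let task cancellation propagate instead of being swallowed.
        raise
    except Exception as e:
        logger.exception(f"Exception while running function call: {e}")
        # Re-raise so the original traceback surfaces for debugging,
        # instead of the pipeline silently continuing in a broken state.
        raise
```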
Does this happen with any specific function call or any of them?
@aconchillo Usually what I have noticed is that it happens when it uses …
@aconchillo We noticed it sometimes also occurs at the very start of the call. There is no specific pattern!
@aconchillo No, we are NOT using a forked Pipecat.
@aconchillo Hard to reproduce. Any suggestions on how to fix it?
@aconchillo Also, as you pointed out, there is no exception being caught there either, so we never get to know!
@aconchillo Finally, good logs. Flow:
So I'm also experiencing this - spent some time debugging and basically found two issues. First, in … Second (and I think this is actually the cause of the issue), in … I checked, and some of the processors definitely catch the CancelledError - one example is the …
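To make that second point concrete, here is a self-contained asyncio sketch (generic code, not Pipecat's) of why a processor that catches CancelledError without re-raising it can leave a queue-draining task stuck: the task is asked to stop, swallows the cancellation, and whatever awaits that cancellation never sees it complete.

```python
import asyncio

async def consumer(queue: asyncio.Queue):
    while True:
        try:
            item = await queue.get()
            if item is None:  # sentinel used only so this demo can exit cleanly
                return
            print(f"got {item}")
        except asyncio.CancelledError:
            # BUG: swallowing the cancellation keeps the loop alive and hides
            # the fact that the task was asked to stop. The fix is to `raise`.
            print("cancelled, but ignoring it")

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(consumer(queue))
    await queue.put("frame-1")
    await asyncio.sleep(0.1)

    task.cancel()
    done, _pending = await asyncio.wait({task}, timeout=1.0)
    print(f"task finished after cancel: {task in done}")  # False: it is still running

    await queue.put(None)  # escape hatch for the demo; real code has none
    await task

asyncio.run(main())
```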
I'm deeply investigating this. For demos WITHOUT LLM function calls I can't recreate this. My working theory is that some error related to function calling in openai.py isn't handled as expected when there are a lot of interruptions, causing things to get into a weird state.
I have reproduced this as well! Using WebSockets though, NOT the Daily transport layer. I believe it has to do with the base input transport layer and the audio_task_handler being blocked, as we see no logs from it when the issue is occurring; I assume the task is canceled or is just hanging. No function calls are being used in my repro.
@ajobi-uhc Could you share your repro code or logs?
Can't share repro code very easily yet, but these are the logs while the bug is active (i.e. the bot is not answering but the pipeline is running). See how we are adding frames via the push_audio_frame task without clearing the queue - I have logging in the audio_task_handler, but nothing was being logged, as the task (I believe) is somehow blocked.
Here are the relevant functions where I have the logging taking place, and where we push the input raw audio frame into the queue. I have logging throughout the audio task handler, even before it tries to get from the audio_in_queue, and it never triggers once the bug is active (https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/transports/base_input.py#L169). Replicating the bug is very hard - you have to interrupt the bot just as it's in the middle of completing a speech generation, which is in line with what @GavinVS mentioned as a possible cause of the error.
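A hedged debugging aid along those lines (the names QueueWatchdog, audio_in_queue, and last_get_ts are my own for illustration, not Pipecat's): a watchdog task that reports when frames keep piling up while the consumer has not dequeued anything for a while, which is exactly the symptom described above.

```python
import asyncio
import time

class QueueWatchdog:
    """Logs when a queue holds items but its consumer has stopped draining it."""

    def __init__(self, queue: asyncio.Queue, name: str = "audio_in_queue"):
        self._queue = queue
        self._name = name
        self.last_get_ts = time.monotonic()

    def mark_progress(self):
        # Call this from the consumer right after a successful queue.get().
        self.last_get_ts = time.monotonic()

    async def run(self, interval: float = 1.0, stall_after: float = 5.0):
        while True:
            await asyncio.sleep(interval)
            stalled_for = time.monotonic() - self.last_get_ts
            if self._queue.qsize() > 0 and stalled_for > stall_after:
                print(
                    f"[watchdog] {self._name} holds {self._queue.qsize()} items "
                    f"but no get() for {stalled_for:.1f}s - consumer looks blocked"
                )
```

Running something like this as a separate task next to the audio task handler would at least make the "blocked consumer" state show up in the logs instead of the pipeline silently going quiet.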
@ajobi-uhc @GavinVS I've tried randomly killing Cartesia when it's in the middle of processing: https://github.com/pipecat-ai/pipecat/pull/750/files#diff-3d731f119bd0e348b74be27b419995a1a34ad733cb8dd12ab75f3a0fc52592c5R227-R229. Passing nothing vs. pushing an ErrorFrame here doesn't make a difference. I can also still see the LLM generating text, so I don't think killing Cartesia simulates the issue like I thought it would.

I've also made a bot with no input transport that just continuously interrupts itself, and I'm not running into the issue: https://github.com/pipecat-ai/pipecat/pull/750/files#diff-89c69c8c75fd520b80f636375afbc8fda2f6ecffcc3872087447213e1d7f49cc

The main thing I'm trying to do now is consistently reproduce the interruption issue that causes the queue to keep filling up. It's not necessarily wrong for the queue to keep filling up if the bot's in an interrupt state, but it is wrong that it never gets out of that state. Looking at logs where the bot gets "stuck", there's always an equal number of StartInterruption and StopInterruption frames in the right order, so that doesn't seem to be an issue either. I'm still investigating this every day 👀
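For anyone who wants to run the same sanity check on their own logs, here is a small sketch of it (the frame names follow the comment above; the exact strings in your log output may differ):

```python
import sys

def check_interruption_balance(log_path: str) -> None:
    """Report Start/StopInterruption frames that are out of order or unmatched."""
    depth = 0
    with open(log_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if "StartInterruptionFrame" in line:
                depth += 1
            elif "StopInterruptionFrame" in line:
                depth -= 1
                if depth < 0:
                    print(f"line {lineno}: StopInterruptionFrame before a matching Start")
                    depth = 0
    print(f"unmatched StartInterruptionFrame count at end of log: {depth}")

if __name__ == "__main__":
    check_interruption_balance(sys.argv[1])
```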
How are you replicating the issue on your end? @jamsea do you have some protocol to get it to trigger? I've literally just been saying "hello" at the right times, when I expect it to be in the middle of a TTS generation, and it triggers the bug (usually) - hoping there's an easier way.
@ajobi-uhc I can only replicate this when using an LLM that's calling a function. Specifically, LLM function calls that make HTTP requests (hardcoded function call responses are all fine). Even then, it only happens less than 10% of the time for me. When I'm just talking to the WebSocket transport bot I can never get it stuck. I agree, though; the LLM function call route is probably just causing the error for me because it's adding a bit of indeterminism to the timing when trying to interrupt the bot. The main time sink has been just getting a consistent repro. Getting the bug to reproduce consistently will speed everything else up.
@jamsea Is there any update on this? It's getting quite urgent for us, as users routinely seem to hit the bug in prod every 10 minutes. Any new findings that could help me debug on our side?
I will start looking at this later today. I'd really like to know what's going on and have a fix for the next Pipecat release. Bear with me. 🙏
Still getting this issue, but while debugging another issue I believe I found a possible cause. To recap: if you interrupt at the right time, just as the AI is generating something, the whole pipeline gets "blocked" and the audio_task_handler that processes the audio_in_queue is blocked. Another issue is that if the user has a weird cadence in their speech and stops speaking for a second and then resumes, there's a race condition where the received audio from the TTS generation is played even though the user is currently speaking. So debugging this surfaced two issues.

In my fix I've also added a check for StartInterruptionFrames in resume_processing_pipeline, and a check for self._started in the receive task handler, so that frames are only pushed if we can speak.
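A rough sketch of the kind of guard being described (class, method, and attribute names here are assumptions for illustration, not the actual Pipecat API): only push received TTS audio downstream when the service has started and we are not inside an interruption.

```python
class GuardedTTSReceiver:
    """Illustrative receive-side guard: drop TTS audio that arrives while interrupted."""

    def __init__(self):
        self._started = False      # set once the pipeline is allowed to speak
        self._interrupted = False  # tracks Start/StopInterruption state

    def on_control_frame(self, frame_name: str):
        if frame_name == "TTSStartedFrame":
            self._started = True
        elif frame_name == "StartInterruptionFrame":
            self._interrupted = True
        elif frame_name == "StopInterruptionFrame":
            self._interrupted = False

    async def on_audio_received(self, audio: bytes, push_frame):
        # Only forward audio when we are started and the user is not interrupting;
        # otherwise stale TTS audio would play over the user's speech.
        if not self._started or self._interrupted:
            return
        await push_frame(audio)
```

In the real codebase this gate would live where the receive task handler pushes audio frames; the point is just that it is a simple state check on started/interrupted.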
I just had a call crash. The issue seems to remain.
Description
It is a bug. At some point during a call the following happens: an InputAudioRawFrame is pushed from DailyInputTransport to _audio_in_queue, but the _audio_task_handler of BaseInputTransport is never able to get the InputAudioRawFrame, and the bot becomes deaf, as nothing is pushed into the VAD and no other frames move through the pipeline.
Environment
Issue description
The issue is that, at some point, the _audio_in_queue stops receiving InputAudioRawFrame objects, even though they are still being sent from Daily. We are not entirely sure why this happens. However, when it does, the InputAudioRawFrame objects continue arriving from Daily but are not pushed into the asyncio queue. This essentially crashes the entire pipeline.

You can observe in the logs that when this occurs, the User-Started-Speaking log statement is also not printed. Additionally, when a participant leaves, Pipecat cancels all tasks. Strangely, after this, we start seeing the User-Started-Speaking and User-Stopped-Speaking statements, which should have been logged earlier. Also, the main task for the Pipecat bot keeps running and hangs.

My suspicion is that a task is not executing properly, which is causing a blockage in the queue.
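As a self-contained illustration of that suspicion (generic asyncio, not Pipecat code): if the task that is supposed to drain the queue dies or stops being scheduled, producers keep putting frames, nothing downstream ever sees them, and the process itself keeps running - which matches the "deaf bot" symptom.

```python
import asyncio

async def consumer(queue: asyncio.Queue):
    while True:
        item = await queue.get()
        if item == "poison":
            # The consumer dies here; asyncio usually only reports this as
            # "Task exception was never retrieved" much later, long after the
            # pipeline has gone quiet.
            raise RuntimeError("consumer died silently")
        print(f"processed {item}")

async def producer(queue: asyncio.Queue):
    for i in range(10):
        await queue.put("poison" if i == 3 else f"frame-{i}")
        await asyncio.sleep(0.05)
    print(f"queue size at end: {queue.qsize()} - frames keep piling up, nothing consumes them")

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    consumer_task = asyncio.create_task(consumer(queue))  # exception never checked
    await producer(queue)
    print(f"consumer task already dead: {consumer_task.done()}")  # True, but nobody noticed

asyncio.run(main())
```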
Repro steps
The bug is difficult to reproduce as it occurs randomly. However, it is most often observed later in the call when the user asks a question, and a tool is invoked. Additionally, we noticed that when this bug happens, the TTS response is abruptly cut off in the middle.
Expected behavior
Actual behavior
Logs
2024-11-13T20:21:00.339
As you can see, the TTS service is in the middle of speaking and then we don't receive anything else.

We also added some logging statements in the Pipecat source, which led us to the conclusion that no InputAudioRawFrames are placed in _audio_in_queue; the following are the logs for that. As you can see, the InputAudioRawFrame objects are coming from Daily but are not pushed into the pipeline: