Order between `UserStoppedSpeakingFrame` from VAD and `TranscriptionFrame` from STT #859

zizhong · 2024-12-14T07:49:41Z

Description

Our logic previously assumed that UserStoppedSpeakingFrame arrives after TranscriptionFrame: once the user stopped speaking, we collected the text from all TranscriptionFrames received so far and treated UserStoppedSpeakingFrame as the end-of-turn indicator.

However, I have found that this no longer holds: UserStoppedSpeakingFrame can arrive before TranscriptionFrame.
Is this a bug or expected behavior? If it is expected, what is the recommended turn indicator?
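
To make the ordering problem concrete, here is a minimal sketch of the aggregation logic described above. The frame classes are stand-ins defined only for this illustration, not the real pipecat classes, and `TurnAggregator` and `run` are hypothetical names rather than our actual code.

```python
from dataclasses import dataclass


# Stand-in frame types for illustration only; these are NOT the pipecat classes.
@dataclass
class TranscriptionFrame:
    text: str


@dataclass
class UserStoppedSpeakingFrame:
    pass


class TurnAggregator:
    """Collects transcription text and ends the turn on UserStoppedSpeakingFrame."""

    def __init__(self):
        self._parts = []  # text received since the last end of turn
        self.turns = []   # one entry per completed turn

    def process_frame(self, frame):
        if isinstance(frame, TranscriptionFrame):
            self._parts.append(frame.text)
        elif isinstance(frame, UserStoppedSpeakingFrame):
            # Assumes every TranscriptionFrame for this turn has already arrived.
            self.turns.append(" ".join(self._parts))
            self._parts = []


def run(frames):
    """Feed frames through a fresh aggregator and return the completed turns."""
    aggregator = TurnAggregator()
    for frame in frames:
        aggregator.process_frame(frame)
    return aggregator.turns


# Order we used to rely on: transcription first, then the stop frame.
expected_order = run([TranscriptionFrame("hello there"), UserStoppedSpeakingFrame()])

# Order we see now: the stop frame arrives first.
observed_order = run([UserStoppedSpeakingFrame(), TranscriptionFrame("hello there")])

print(expected_order)  # ['hello there']
print(observed_order)  # ['']
```

With the second ordering the turn is closed with empty text, and the transcription is left in the buffer until the next UserStoppedSpeakingFrame.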

If reporting a bug, please fill out the following:

Environment

pipecat-ai version: 0.0.50
Python version: 3.10
OS: Ubuntu

Issue description

UserStoppedSpeakingFrame can come before TranscriptionFrame.

Repro steps

Pipeline with

FastAPIWebsocketTransport + SileroVADAnalyzer
STT service

Expected behavior

UserStoppedSpeakingFrame arrives after the TranscriptionFrames for the turn, so it can be used as a turn indicator.

Actual behavior

UserStoppedSpeakingFrame can arrive before TranscriptionFrame, so it cannot be used as a turn indicator.

Logs
