Description
When the response modality is set to `AUDIO` and `output_audio_transcription` is enabled, the transcription streams noticeably faster than the audio plays back. However, if the model is interrupted, only the spoken part is added to context; the rest of the output audio transcription is discarded. This makes sense for an audio-only use case: if the user hasn't heard something, it shouldn't be added to context.
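
For reference, here is a minimal sketch of the setup described above, assuming the google-genai Python SDK; the model name is a placeholder, and exact field names may differ in other SDKs:

```python
import asyncio
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment
MODEL = "gemini-2.0-flash-live-001"  # placeholder model name

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    # Enable transcription of the model's audio output; the transcript
    # typically streams in well ahead of the audio actually playing back.
    output_audio_transcription=types.AudioTranscriptionConfig(),
)

async def main():
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        await session.send_client_content(
            turns=types.Content(role="user", parts=[types.Part(text="Hello")])
        )
        async for message in session.receive():
            sc = message.server_content
            if sc and sc.output_transcription:
                # The transcript text arrives faster than the audio chunks.
                print(sc.output_transcription.text)

asyncio.run(main())
```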
However, with `output_audio_transcription` enabled, the user also becomes aware of the response faster than they can listen to it. If the user has already read the complete text transcription and then interrupts the audio output, the model is not aware of the unspoken parts and will repeat information the user has already seen.
This does not occur when the modality is set to `TEXT`.
It would be nice to add a feature that lets us opt into adding to context based on the transcription output rather than on the audio actually played back.
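
Something along these lines, where `context_from_transcription` is a hypothetical option invented here purely to illustrate the request:

```python
# Hypothetical sketch of the requested feature; the flag below does not exist
# in the SDK and is named here only for illustration.
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    output_audio_transcription=types.AudioTranscriptionConfig(),
    # hypothetical: on interruption, keep the full output transcript in
    # context instead of only the portion of audio that was played
    context_from_transcription=True,
)
```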