
Live api: add to context when outputaudiotranscription is enabled #825

@kaiseenn

Description


When the response modality is set to AUDIO and outputAudioTranscription is enabled, the transcription is delivered faster than the audio plays back. However, if the model is interrupted, only the portion that was actually spoken is added to context; the rest of the output audio transcription is not. This makes sense for an audio-only use case: if the user hasn't heard something, it shouldn't be added to context.

However, with outputAudioTranscription enabled, users also become aware of the response faster than they can listen to it. If a user interrupts the audio output after having already seen the complete text transcription, the model is not aware of the unspoken parts and will repeat information to the user.

This does not occur when modality is set to TEXT.

It would be nice to add a feature that allows adding to context based on the transcription output rather than only the spoken audio.
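
For reference, here is a minimal sketch of the configuration under which this occurs. This assumes the Python google-genai SDK; the model name, prompt, and field names (`output_audio_transcription`, `output_transcription`, `interrupted`) are written from memory of that SDK and may need adjusting for other client libraries.

```python
# Minimal sketch (assumes the Python google-genai SDK); model name and prompt
# are placeholders, not a definitive reproduction.
import asyncio
from google import genai
from google.genai import types

client = genai.Client()

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    # Enable output transcription: text arrives ahead of the audio playback.
    output_audio_transcription=types.AudioTranscriptionConfig(),
)

async def main():
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001", config=config
    ) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user", parts=[types.Part(text="Explain this topic in detail.")]
            )
        )
        async for message in session.receive():
            content = message.server_content
            if content is None:
                continue
            if content.output_transcription:
                # The full transcription can be read before the audio finishes playing.
                print(content.output_transcription.text, end="", flush=True)
            if content.interrupted:
                # On interruption, only the audio actually played back is kept in
                # context, even if the user has already read the full transcription.
                break

asyncio.run(main())
```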

Labels

priority: p3 (Desirable enhancement or fix. May not be included in next release.)
status: awaiting user response (Issues requiring a response from the user.)
type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)
