Approaches to let us break a current speech segment before speech end happens

Some models may have an audio duration above which the RTF (real-time factor) degrades badly, e.g. anything longer than the model's context window. If our current speech segment starts to approach that length, we may want to break off a chunk of audio from the start of the segment and remove it from the buffer (a forced non-interim result).

In this case we want all the VAD parameters to stay the same, plus the ability to identify a break point inside a silence that is too short to count as a speech end, so we can slightly shorten the segment without setting our VAD to cut more eagerly.

I'm still puzzling out how to do this, but maybe we want to track the midpoint of the longest sequence of silent frames in the audio stream, and maybe how long that silence is as well? It's a bit of added state. Along with that we'd want a way to signal that we're evacuating some of the prior buffer, without triggering any speech-end handling or resetting the VAD state in a weird way.
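
To make the idea concrete, here's a rough Python sketch of that extra state. Everything in it is hypothetical and not taken from the codebase: `SilenceTracker`, `force_split`, `max_frames` and `min_silence_frames` are names made up for illustration, and it assumes the VAD already gives us one speech/silence decision per frame. It only remembers a single longest silence, so a fuller version would probably keep a list of candidate silences.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SilenceTracker:
    """Tracks the longest run of silent frames in the current segment buffer."""

    frame_index: int = 0              # frames pushed since the buffer start
    run_start: Optional[int] = None   # start of the silent run in progress
    best_start: int = 0               # start of the longest silent run so far
    best_len: int = 0                 # length (in frames) of that run

    def push(self, is_speech: bool) -> None:
        """Record the VAD decision for the next frame."""
        if is_speech:
            self.run_start = None
        else:
            if self.run_start is None:
                self.run_start = self.frame_index
            run_len = self.frame_index - self.run_start + 1
            if run_len > self.best_len:
                self.best_start, self.best_len = self.run_start, run_len
        self.frame_index += 1

    def break_point(self, min_silence_frames: int) -> Optional[int]:
        """Midpoint of the longest silence, or None if it is too short."""
        if self.best_len < min_silence_frames:
            return None
        return self.best_start + self.best_len // 2

    def evacuate(self, cut: int) -> None:
        """Re-base indices after `cut` frames are dropped from the front.

        Only the tracker's own bookkeeping changes; the VAD state is untouched.
        """
        self.frame_index -= cut
        if self.run_start is not None:
            self.run_start = max(self.run_start - cut, 0)
        # The remembered silence was consumed by the cut, so forget it.
        self.best_start, self.best_len = 0, 0


def force_split(
    buffer: List,
    tracker: SilenceTracker,
    max_frames: int,
    min_silence_frames: int,
) -> Optional[List]:
    """Cut the front of `buffer` at the best silence once it gets too long.

    Returns the evacuated frames (the forced non-interim chunk) or None.
    """
    if len(buffer) < max_frames:
        return None
    cut = tracker.break_point(min_silence_frames)
    if cut is None:
        return None
    head = buffer[:cut]
    del buffer[:cut]
    tracker.evacuate(cut)
    return head
```

The caller would `push` every VAD decision as frames are appended to the buffer and call `force_split` whenever the buffer grows. Since the split never goes through the speech-end path, the VAD parameters and state stay exactly as they were.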

If anyone has any other thoughts I'm all ears :ear: :eye: :eye: :ear: 

Approaches to let us break a current speech segment before speech end happens #31
