Approaches to let us break a current speech segment before speech end happens #31

@xd009642

Some models have an audio duration above which the real-time factor (RTF) starts to really degrade, e.g. anything longer than the context window. If our current speech segment starts to approach that length, we may want to break off a chunk of audio from the start of the segment and remove it from the buffer (a forced non-interim result).

In this case we want all the VAD parameters to stay the same, but with the ability to identify a less-permissive break point within a silence, so we can slightly shorten the segment without setting our VAD to cut more eagerly.

I'm still puzzling out how to do this, but maybe we want to track the midpoint of the longest run of silent frames in the audio stream, and maybe how long that silence is as well. It's a bit of added state. Along with that, we'd want a way to specify that we're evacuating part of the prior buffer, without triggering any speech-end handling or resetting the VAD state in a weird way.
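To make the idea concrete, here's a rough sketch of what that could look like. It's in Python purely for illustration, and everything in it is hypothetical: `SegmentBuffer`, `longest_silence`, and `max_frames` are made-up names, not anything in this project, and it assumes the VAD already gives us a per-frame speech/non-speech flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def longest_silence(flags: List[bool]) -> Tuple[int, int]:
    """Return (start, length) of the longest run of non-speech frames.

    `flags[i]` is True when frame i was classified as speech.
    Returns (0, 0) when there is no silent frame at all.
    """
    best_start, best_len, run_start, run_len = 0, 0, 0, 0
    for i, is_speech in enumerate(flags):
        if is_speech:
            run_len = 0
            continue
        if run_len == 0:
            run_start = i
        run_len += 1
        if run_len > best_len:
            best_start, best_len = run_start, run_len
    return best_start, best_len


@dataclass
class SegmentBuffer:
    """Hypothetical buffer for the frames of the current speech segment."""

    max_frames: int  # assumed limit, e.g. derived from the model's context window
    frames: List[bytes] = field(default_factory=list)
    flags: List[bool] = field(default_factory=list)

    def push(self, frame: bytes, is_speech: bool) -> Optional[List[bytes]]:
        """Add a frame; return the evacuated head if a forced split happened."""
        self.frames.append(frame)
        self.flags.append(is_speech)
        if len(self.frames) < self.max_frames:
            return None
        start, length = longest_silence(self.flags)
        cut = start + length // 2  # midpoint of the longest silence
        if length == 0 or cut == 0:
            return None  # no usable break point yet, keep buffering
        head = self.frames[:cut]
        # Only the buffer is drained here. The VAD's own state is deliberately
        # left untouched, so no speech-end event fires and nothing is reset.
        del self.frames[:cut]
        del self.flags[:cut]
        return head
```

The caller would send the returned `head` off as the forced non-interim result and carry on feeding frames as normal. This version rescans the flags whenever the buffer is over the limit, which keeps the sketch short; tracking the `(start, length)` pair incrementally as frames arrive would be the "bit of added state" version, at the cost of having to rebuild it after each split.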

If anyone has any other thoughts I'm all ears 👂 👁️ 👁️ 👂
