Whisper.cpp without VAD gets stuck in a loop when translating. Using VAD loses accuracy and dialogue #3278

I'm trying to translate some content from Japanese and German using large-v3. If I don't use VAD, the output often gets caught in a loop while translating and repeats the same line for every segment. It looks something like this.
Command without VAD:
`./build/bin/whisper-cli -m models/ggml-large-v3.bin -tr -osrt -of samples/p-test input.wav`
```
[00:00:00.000 --> 00:00:29.980]   Thank you.
[00:00:30.000 --> 00:00:59.980]   Thank you.
[00:01:00.000 --> 00:01:29.980]   Thank you.
[00:01:30.000 --> 00:01:59.980]   Thank you.
[00:02:00.000 --> 00:02:29.980]   Thank you.
[00:02:30.000 --> 00:02:59.980]   Thank you.
[00:03:00.000 --> 00:03:29.980]   Thank you.
[00:03:30.000 --> 00:03:59.980]   Thank you.
[00:04:00.000 --> 00:04:29.980]   Thank you.
[00:04:30.000 --> 00:04:59.980]   Thank you.
[00:05:00.000 --> 00:05:29.980]   Thank you.
[00:05:30.000 --> 00:05:59.980]   Thank you.
[00:06:00.000 --> 00:06:26.000]   Thank you.
[00:06:26.000 --> 00:06:29.980]   Thank you.
[00:06:30.000 --> 00:06:59.980]   Thank you.
```
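To make the loop easier to spot in longer outputs, a rough check like the one below could flag runs of identical consecutive segments. This is only a sketch: it assumes the console output format shown above (`[start --> end]   text`), and the function name `find_repeated_runs` and the `min_run` cutoff are made up for illustration, not part of whisper.cpp.

```python
import re

# Matches lines such as "[00:00:00.000 --> 00:00:29.980]   Thank you."
SEGMENT_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)$"
)


def find_repeated_runs(lines, min_run=3):
    """Return (text, first_start, last_end, count) for every run of at least
    min_run consecutive segments that carry exactly the same text."""
    runs = []
    current = None  # [text, first_start, last_end, count]
    for line in lines:
        match = SEGMENT_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a segment line
        start, end, text = match.group(1), match.group(2), match.group(3).strip()
        if current is not None and current[0] == text:
            # Same text as the previous segment: extend the current run.
            current[2] = end
            current[3] += 1
        else:
            # Text changed: record the finished run if it was long enough.
            if current is not None and current[3] >= min_run:
                runs.append(tuple(current))
            current = [text, start, end, 1]
    if current is not None and current[3] >= min_run:
        runs.append(tuple(current))
    return runs
```

Feeding it the captured console output (for example `find_repeated_runs(open("log.txt"))`, where `log.txt` is a hypothetical file holding that output) returns one entry per suspicious run, which makes it easier to compare how much of a file is affected with and without VAD.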
Command with VAD:
`./build/bin/whisper-cli -m models/ggml-large-v3.bin --vad --vad-model models/silero-v5.1.2-ggml.bin -tr -osrt -of samples/p-test input.wav`

Without VAD, the dialogue tends to be translated a bit more accurately when it is not looping (at least I assume so). It also tends to be more complete. Changing the VAD threshold, padding and silence duration does not seem to make a difference, and if I set the threshold too high I get the translation loop again.
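For anyone who wants to reproduce the parameter changes, a sweep could be generated along these lines. This is a sketch under assumptions: I believe the relevant `whisper-cli` options are `--vad-threshold`, `--vad-speech-pad-ms` and `--vad-min-silence-duration-ms`, but please check them against `whisper-cli --help` for your build. The helper name `vad_sweep_commands` and the output naming scheme are hypothetical; the paths are the ones from my commands above.

```python
from itertools import product


def vad_sweep_commands(thresholds, pads_ms, silences_ms,
                       model="models/ggml-large-v3.bin",
                       vad_model="models/silero-v5.1.2-ggml.bin",
                       wav="input.wav"):
    """Build one whisper-cli argument list per combination of VAD settings.

    The VAD flag names are assumed; verify them with `whisper-cli --help`.
    """
    commands = []
    for threshold, pad, silence in product(thresholds, pads_ms, silences_ms):
        # Separate output prefix per run so the SRT files can be compared.
        out_prefix = f"samples/p-test_t{threshold}_p{pad}_s{silence}"
        commands.append([
            "./build/bin/whisper-cli",
            "-m", model,
            "--vad", "--vad-model", vad_model,
            "--vad-threshold", str(threshold),
            "--vad-speech-pad-ms", str(pad),
            "--vad-min-silence-duration-ms", str(silence),
            "-tr", "-osrt",
            "-of", out_prefix,
            wav,
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. The example values someone picks for the sweep are up to them; I am not claiming any particular combination fixes the loop.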

I'm not sure what needs to be addressed here, or whether this is just a current limitation of translation. whisperX seems to have a larger VAD API, but I can't find it at the moment for comparison.
