fix: strip leading/trailing silence from tts output #420

ErikBjare · 2025-01-24T13:20:17Z

Kokoro generates ~300ms of silence at the start and end of every generation.

This strips that silence so that there aren't such long breaks between sentences.

Important

Adds strip_silence() to remove silence from TTS output in scripts/tts_server.py.

Behavior:
- Adds strip_silence() function in scripts/tts_server.py to remove leading/trailing silence from audio data.
- Integrates strip_silence() in text_to_speech() to process generated audio.
Functionality:
- strip_silence() uses amplitude threshold and minimum silence duration to identify and remove silence.
- Handles cases where no non-silent audio is found by returning original audio data.

^{This description was created by}^{for d79606a. It will automatically update as commits are pushed.}

ellipsis-dev

👍 Looks good to me! Reviewed everything up to d79606a in 1 minute and 15 seconds

More details

Looked at 61 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. scripts/tts_server.py:145

Draft comment:
The min_silence_duration parameter is incorrectly used. It should not be subtracted from the start or added to the end index, as this could trim non-silent audio. Consider revising the logic to ensure only leading and trailing silence is removed.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment suggests this is incorrect, but I disagree. The function's purpose appears to be to trim silence while preserving some minimal amount of silence/context around the actual speech. Subtracting/adding min_silence_duration achieves this goal by keeping a buffer zone. The max(0, ...) and min(len(audio_data), ...) ensure we don't go out of bounds.
I could be wrong about the intended behavior - maybe it should only trim exact silence. The parameter name "min_silence_duration" is a bit confusing if it's meant to be a buffer size.
While the parameter name could be clearer, the implementation is sound and achieves a reasonable goal of preserving some context around the speech. The bounds checking prevents any issues with array access.
The comment should be deleted as it incorrectly identifies an issue where there isn't one. The code is working as intended to preserve some silence around the speech content.

Workflow ID: wflow_O1ApIeEWPOLtre9u

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

codecov-commenter · 2025-01-24T13:22:05Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.62%. Comparing base (6ff8b94) to head (d79606a).

✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #420      +/-   ##
==========================================
- Coverage   70.74%   70.62%   -0.12%     
==========================================
  Files          70       70              
  Lines        5862     5862              
==========================================
- Hits         4147     4140       -7     
- Misses       1715     1722       +7

Flag	Coverage Δ
anthropic/claude-3-haiku-20240307	`68.83% <ø> (+0.01%)`	⬆️
deepseek/deepseek-chat	`64.41% <ø> (+0.10%)`	⬆️
openai/gpt-4o-mini	`67.80% <ø> (-0.23%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

fix: added stripping of leading/trailing silence from tts output

d79606a

ErikBjare changed the title ~~fix: added stripping of leading/trailing silence from tts output~~ fix: strip leading/trailing silence from tts output Jan 24, 2025

ellipsis-dev bot reviewed Jan 24, 2025

View reviewed changes

ErikBjare merged commit 96c54c4 into master Jan 24, 2025
9 checks passed

ErikBjare mentioned this pull request Jan 24, 2025

Speech-to-Text Transcription #263

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: strip leading/trailing silence from tts output #420

fix: strip leading/trailing silence from tts output #420

ErikBjare commented Jan 24, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

codecov-commenter commented Jan 24, 2025 •

edited

Loading

fix: strip leading/trailing silence from tts output #420

fix: strip leading/trailing silence from tts output #420

Conversation

ErikBjare commented Jan 24, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

codecov-commenter commented Jan 24, 2025 • edited Loading

Codecov Report

ErikBjare commented Jan 24, 2025 •

edited by ellipsis-dev bot

Loading

codecov-commenter commented Jan 24, 2025 •

edited

Loading