@mazy06000

What

This PR extends the predict() function with a new vad_parameters argument, which is forwarded to model.transcribe(...). This gives callers custom control over voice activity detection (VAD) settings such as the minimum silence duration and speech padding.
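
The forwarding described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code. RecordingModel is a hypothetical stand-in that only records the keyword arguments it receives. Passing enable_vad through as vad_filter is an assumption based on faster-whisper's transcribe() keywords (vad_filter, vad_parameters).

```python
from typing import Any, List, Optional, Tuple


class RecordingModel:
    """Hypothetical stand-in for the Whisper model: it records the keyword
    arguments passed to transcribe() so the forwarding can be inspected."""

    def __init__(self) -> None:
        self.last_kwargs: dict = {}

    def transcribe(self, audio: str, **kwargs: Any) -> Tuple[list, dict]:
        self.last_kwargs = kwargs
        return ["<segment>"], {"language": "en"}


def predict(model: Any, audio: str, enable_vad: bool = False,
            vad_parameters: Optional[dict] = None) -> List[Any]:
    # vad_parameters defaults to None, so the model keeps its own VAD
    # defaults unless the caller overrides them.
    segments, _info = model.transcribe(
        audio,
        vad_filter=enable_vad,          # assumed mapping from enable_vad
        vad_parameters=vad_parameters,  # forwarded unchanged
    )
    return list(segments)


model = RecordingModel()
predict(model, "audio.wav", enable_vad=True,
        vad_parameters={"min_silence_duration_ms": 50, "speech_pad_ms": 20})
print(model.last_kwargs)
```

Because the argument defaults to None, existing callers that never pass vad_parameters see no change in behavior.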

Why

  • Gives users fine-grained control over voice activity detection.
  • Lets developers shorten segments and reduce repeated content by tuning VAD settings.
  • Addresses a community-requested enhancement.

How

  • Updated the predict() signature to include vad_parameters=None.
  • Passed vad_parameters through to the model.transcribe(...) call.
  • Updated the handler to parse vad_parameters from the user payload.
  • Updated the INPUT_VALIDATIONS schema to validate vad_parameters as an optional dict with a default of None.
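
The schema and handler changes listed above might look something like the sketch below. It assumes a schema made of per-key type/required/default rules; the exact format the worker uses is not shown in this PR. validate_input and handler are hypothetical helpers written for illustration, and the real worker likely delegates validation to its framework instead.

```python
from typing import Any, Dict

# Hypothetical schema entries; the worker's real INPUT_VALIDATIONS may differ.
INPUT_VALIDATIONS: Dict[str, Dict[str, Any]] = {
    "enable_vad": {"type": bool, "required": False, "default": False},
    "vad_parameters": {"type": dict, "required": False, "default": None},
}


def validate_input(job_input: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in validator: fill in defaults and type-check declared keys."""
    validated: Dict[str, Any] = {}
    for key, rule in INPUT_VALIDATIONS.items():
        if key not in job_input:
            if rule["required"]:
                raise ValueError(f"missing required input: {key}")
            validated[key] = rule["default"]
            continue
        value = job_input[key]
        if value is None and rule["default"] is None:
            # An explicit null is accepted when the default is None.
            validated[key] = None
        elif isinstance(value, rule["type"]):
            validated[key] = value
        else:
            raise TypeError(f"{key} must be a {rule['type'].__name__}")
    return validated


def handler(job: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical handler: parse the payload into predict() arguments."""
    args = validate_input(job["input"])
    return {
        "enable_vad": args["enable_vad"],
        "vad_parameters": args["vad_parameters"],
    }


print(handler({"input": {"enable_vad": True,
                         "vad_parameters": {"min_silence_duration_ms": 50}}}))
print(handler({"input": {}}))
```

The stand-in only checks that vad_parameters is a dict; it does not inspect the keys inside it, so unknown keys would surface later, when the model consumes them.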

Example usage

{
  "input": {
    "enable_vad": true,
    "vad_parameters": {
      "min_silence_duration_ms": 50,
      "speech_pad_ms": 20
    },
    ...
  }
}
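
If the backing model is faster-whisper, our understanding is that transcribe() expands a vad_parameters dict into its VadOptions dataclass via keyword arguments, so every key in the payload has to match a field name. The sketch below checks the example payload (with the ... elision dropped) against VadOptionsStandIn, a hypothetical dataclass that models only the two fields used above. It is not the library's class, and its None defaults simply mean "keep the library's own default".

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class VadOptionsStandIn:
    # Hypothetical stand-in; None means "keep the library's own default".
    min_silence_duration_ms: Optional[int] = None
    speech_pad_ms: Optional[int] = None


payload = json.loads("""
{
  "input": {
    "enable_vad": true,
    "vad_parameters": {"min_silence_duration_ms": 50, "speech_pad_ms": 20}
  }
}
""")

# Keyword expansion mirrors how a dataclass-based options object would be
# built from the vad_parameters dict.
options = VadOptionsStandIn(**payload["input"]["vad_parameters"])
print(options)

# A misspelled key fails fast with TypeError instead of being ignored.
try:
    VadOptionsStandIn(**{"min_silence_ms": 50})
except TypeError as exc:
    print("rejected:", exc)
```

Failing on unknown keys is useful here: a typo in a VAD setting raises an error rather than silently falling back to defaults.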
