Deepgram Speech Segmentation Heuristics

This repository contains reference implementations for robust end-of-speech detection using a combination of Deepgram's real-time transcription API features and custom heuristics. The goal is to demonstrate how low-latency, real-time solutions can be created using the Deepgram API, open-source Voice Activity Detection (VAD), and tailored heuristics.

Overview

The project showcases how to combine various Deepgram API features with local processing to achieve accurate and low-latency utterance segmentation. It utilizes:

Deepgram API features:
- Endpointing
- Utterance End
- Word-level timestamps
Local Voice Activity Detection (VAD) using silero-vad
Custom heuristics for speech detection and endpointing

The system is designed to be modular, allowing for easy addition and modification of different event handlers and heuristics.

Project Structure

The repository is organized as follows:

project_root/
│
├── base_heuristic.py
├── vad.py
├── examples
│ └── vad_implementation
│  ├── heuristic.py
│  ├── terminal_renderer.py
│  ├── main.py
│  └── README.md
├── requirements.txt
└── README.md

base_heuristic.py: Contains the base Heuristic class for implementing custom logic.
vad.py: Implements Voice Activity Detection using silero-vad.
examples/: Contains different implementation examples.
- vad_implementation/: An example implementation using VAD and custom heuristics.

Getting Started

To use any of the reference implementations:

Navigate to the specific example folder (e.g., examples/vad_implementation/).
Follow the README instructions in that folder for setup and execution.

Each example folder contains its own requirements.txt file and specific instructions for running the implementation.

Examples

VAD Implementation

The VAD implementation demonstrates how to combine Deepgram's real-time transcription with local Voice Activity Detection for advanced end-of-speech detection. It showcases the use of silero-vad alongside Deepgram's API features.

For more details, see the README in the examples/vad_implementation/ folder.

Future Examples

While the current VAD implementation represents our recommended approach for robust, low-latency speech detection, we plan to add the following examples:

A web app implementation demonstrating the VAD approach with a simple web frontend
Examples showing heuristic approaches to end-of-speech detection without using a local VAD, relying solely on analysis of transcript results

It's important to note that for the most reliable and low-latency performance, we recommend using a local VAD (such as silero-VAD) as close as possible to where audio enters the application. This forms the cornerstone of a robust heuristic approach. Other examples are provided to demonstrate alternative methods and use cases, but may not achieve the same level of performance as the local VAD-based approach.

Dependencies

The main dependencies for this project are:

Deepgram Python SDK: For interfacing with the Deepgram API
silero-vad: For local Voice Activity Detection

Specific dependencies for each implementation are listed in the respective requirements.txt files.

Note

This is a reference implementation intended for demonstration purposes. It showcases how to leverage Deepgram's API features alongside custom processing for advanced speech endpointing scenarios.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
examples		examples
.gitignore		.gitignore
README.MD		README.MD

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Deepgram Speech Segmentation Heuristics

Overview

Project Structure

Getting Started

Examples

VAD Implementation

Future Examples

Dependencies

Note

About

Uh oh!

Releases

Packages

Uh oh!

deepgram/deepgram-eos-heuristics

Folders and files

Latest commit

History

Repository files navigation

Deepgram Speech Segmentation Heuristics

Overview

Project Structure

Getting Started

Examples

VAD Implementation

Future Examples

Dependencies

Note

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Packages