Locally running AI voice assistant for low-latency conversations, powered by OpenAI Whisper, Ollama, and Piper TTS.
An AI voicebot built on a three-stage pipeline (STT -> LLM -> TTS). The architecture prioritizes low latency, even for CPU-only inference: zero-copy audio buffers, WebRTC VAD endpointing, token-streaming LLM calls, and chunked TTS playback combine under a single event loop with built-in back-pressure to deliver sub-second latency without specialized hardware.
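The single-loop wiring can be pictured as three coroutines joined by bounded queues. Below is a minimal sketch using Python's `asyncio`; `transcribe`, `generate_tokens`, and `synthesize` are hypothetical stubs standing in for Whisper, Ollama, and Piper, and the queue sizes and sentence-boundary heuristic are illustrative, not the project's actual values.

```python
import asyncio

# Hypothetical stage stubs: the real pipeline calls Whisper, Ollama, and
# Piper here; these placeholders just keep the sketch self-contained.
def transcribe(segment: bytes) -> str:
    return "hello there"

async def generate_tokens(prompt: str):
    for tok in ("Hi", "!", " How", " can", " I", " help", "?"):
        yield tok

def synthesize(sentence: str) -> bytes:
    return b"\x00" * 640  # 20 ms of silent 16 kHz 16-bit PCM

async def pipeline(utterances: asyncio.Queue) -> None:
    # Bounded queues are the back-pressure: a full queue suspends the
    # producing stage instead of letting audio/text buffers grow unbounded.
    transcripts: asyncio.Queue = asyncio.Queue(maxsize=2)  # ASR -> LLM
    sentences: asyncio.Queue = asyncio.Queue(maxsize=4)    # LLM -> TTS

    async def asr_stage() -> None:
        while True:
            segment = await utterances.get()
            # Blocking ASR inference runs in a thread, off the event loop.
            await transcripts.put(await asyncio.to_thread(transcribe, segment))

    async def llm_stage() -> None:
        while True:
            prompt = await transcripts.get()
            buf = ""
            async for tok in generate_tokens(prompt):
                buf += tok
                if buf.endswith((".", "!", "?")):  # flush complete sentences
                    await sentences.put(buf.strip())
                    buf = ""
            if buf.strip():
                await sentences.put(buf.strip())

    async def tts_stage() -> None:
        while True:
            pcm = await asyncio.to_thread(synthesize, await sentences.get())
            print(f"play {len(pcm)} bytes")  # would feed the audio OUT buffer

    await asyncio.gather(asr_stage(), llm_stage(), tts_stage())

if __name__ == "__main__":
    async def demo() -> None:
        q: asyncio.Queue = asyncio.Queue(maxsize=2)
        await q.put(b"\x00" * 32_000)  # one fake one-second utterance
        try:
            await asyncio.wait_for(pipeline(q), timeout=0.5)
        except asyncio.TimeoutError:
            pass  # the stages loop forever; the demo just runs briefly

    asyncio.run(demo())
```

Flushing the LLM's token stream to TTS at sentence boundaries is what lets playback start before the full reply is generated.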
```mermaid
flowchart TD
    subgraph LocalVoice[" "]
        VAD[WebRTC VAD] -->|speech segment| BUF[Audio IN buffer]
        BUF -->|audio| ASR[Whisper] -->|transcript| LLM[Ollama]
        LLM -->|token stream| TTS[Piper TTS]
        TTS -->|speech segment| AUDIO_OUT[Audio OUT buffer]
    end
    MIC[fa:fa-microphone Mic] -->|20 ms frames| VAD
    AUDIO_OUT -->|audio| SPK[fa:fa-volume-up Speaker]
```
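The left edge of the diagram (20 ms mic frames in, endpointed speech segments out) maps naturally onto the `webrtcvad` package. Here is a minimal sketch, assuming 16 kHz 16-bit mono capture; the aggressiveness level and the 300 ms silence hangover are illustrative defaults, not necessarily the project's settings.

```python
import webrtcvad

SAMPLE_RATE = 16_000                     # webrtcvad accepts 8/16/32/48 kHz
FRAME_MS = 20
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono samples

def speech_segments(frames, aggressiveness: int = 2, hangover: int = 15):
    """Yield endpointed utterances from an iterable of FRAME_BYTES-sized
    PCM frames (e.g. read off a sounddevice.RawInputStream)."""
    vad = webrtcvad.Vad(aggressiveness)  # 0 (lenient) .. 3 (aggressive)
    voiced: list[bytes] = []
    silent = 0
    for frame in frames:
        if vad.is_speech(frame, SAMPLE_RATE):
            voiced.append(frame)
            silent = 0
        elif voiced:
            voiced.append(frame)         # keep a little trailing silence
            silent += 1
            if silent >= hangover:       # 15 x 20 ms = 300 ms of silence
                yield b"".join(voiced)   # hand the segment to the ASR stage
                voiced, silent = [], 0
    if voiced:
        yield b"".join(voiced)
```

The hangover keeps brief pauses from splitting one utterance into several segments.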
A quick video demo of this project is available at https://www.loom.com/share/55a3c9d3e67c4032abc6924e67afb2c9