local-voice

Locally running AI voice assistant for low-latency conversations, powered by OpenAI Whisper, Ollama, and Piper TTS.

Overview

An AI voicebot built on a three-stage (STT -> LLM -> TTS) pipeline. The architecture prioritizes low latency, even for CPU-only inference: zero-copy audio buffers, WebRTC VAD endpointing, token-streaming LLM calls, and chunked TTS playback combine under a single event loop with built-in back-pressure to drive sub-second latency without specialized hardware.
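To make the back-pressure idea concrete, the sketch below (illustrative, not this project's actual code) wires the LLM and TTS stages together with bounded `asyncio` queues: when a queue is full, `put` suspends the producer, so a slow CPU-bound stage throttles everything upstream instead of letting buffers grow. The Ollama call uses its public streaming HTTP endpoint (`/api/generate`); `synthesize_chunk` is a hypothetical stand-in for Piper, and the model name and queue sizes are illustrative.

```python
# Illustrative back-pressure pipeline: bounded asyncio.Queues link the stages,
# so a slow consumer (e.g. TTS on CPU) blocks its producer instead of growing
# unbounded buffers. The playback stage is omitted for brevity.
import asyncio
import json

import aiohttp

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's streaming endpoint


def synthesize_chunk(phrase: str) -> bytes:
    """Stub for the Piper stage; real code would return PCM audio from Piper."""
    return phrase.encode()


async def llm_stage(text_q: asyncio.Queue, token_q: asyncio.Queue) -> None:
    """Forward transcripts to Ollama and stream tokens back as they arrive."""
    async with aiohttp.ClientSession() as session:
        while True:
            prompt = await text_q.get()
            payload = {"model": "llama3", "prompt": prompt, "stream": True}
            async with session.post(OLLAMA_URL, json=payload) as resp:
                async for line in resp.content:          # newline-delimited JSON
                    part = json.loads(line)
                    if not part.get("done"):
                        # Back-pressure point: if TTS lags, token_q fills and
                        # this await pauses token production upstream.
                        await token_q.put(part["response"])


async def tts_stage(token_q: asyncio.Queue, audio_q: asyncio.Queue) -> None:
    """Batch streamed tokens into short phrases and synthesize chunk by chunk."""
    phrase = ""
    while True:
        phrase += await token_q.get()
        if phrase.endswith((".", "!", "?", ",")):        # crude phrase boundary
            await audio_q.put(synthesize_chunk(phrase))  # hand chunk to playback
            phrase = ""


async def main() -> None:
    text_q = asyncio.Queue(maxsize=4)    # transcripts awaiting the LLM
    token_q = asyncio.Queue(maxsize=64)  # LLM tokens awaiting TTS
    audio_q = asyncio.Queue(maxsize=8)   # synthesized chunks awaiting playback
    await text_q.put("Hello there!")     # seed one transcript for demonstration
    await asyncio.gather(llm_stage(text_q, token_q), tts_stage(token_q, audio_q))


if __name__ == "__main__":
    asyncio.run(main())
```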

Design

```mermaid
flowchart TD
    subgraph Local Voice[" "]
        VAD[WebRTC VAD] -->|speech segment| BUF[Audio IN buffer]
        BUF -->|audio| ASR[Whisper] -->|transcript| llm[Ollama]
        llm -->|token stream| TTS[Piper TTS]
        TTS -->|speech segment| AUDIO_OUT[Audio OUT buffer]
    end

    MIC[Mic <i class='fas fa-microphone'>] -->|20 ms frames| VAD
    SPK[Speaker <i class='fas fa-volume-up'>]
    AUDIO_OUT -->|audio| SPK
```
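The front end of the diagram, 20 ms frames fed to WebRTC VAD, can be approximated with the `webrtcvad` Python package. Below is a minimal endpointing sketch, assuming 16 kHz 16-bit mono PCM input and an illustrative ~300 ms silence cutoff; the frame source is left abstract:

```python
# Minimal VAD endpointing sketch using the webrtcvad package: 20 ms frames of
# 16-bit mono PCM at 16 kHz are classified as speech/silence, and a segment is
# emitted once enough consecutive silent frames follow speech.
import webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 20
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit samples -> 2 bytes
END_SILENCE_FRAMES = 15                           # ~300 ms of silence ends a segment


def segments(frames, aggressiveness: int = 2):
    """Yield speech segments (raw PCM bytes) from an iterable of 20 ms frames."""
    vad = webrtcvad.Vad(aggressiveness)  # 0 (permissive) .. 3 (aggressive)
    voiced, silent_run = bytearray(), 0
    for frame in frames:                 # each frame: FRAME_BYTES of PCM
        if vad.is_speech(frame, SAMPLE_RATE):
            voiced.extend(frame)
            silent_run = 0
        elif voiced:
            silent_run += 1
            if silent_run >= END_SILENCE_FRAMES:
                yield bytes(voiced)      # endpoint reached: hand segment to Whisper
                voiced.clear()
                silent_run = 0
```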

Project Demo

A quick video demo of this project is available at https://www.loom.com/share/55a3c9d3e67c4032abc6924e67afb2c9
