Alexa is a fully local, modular voice assistant that runs on Linux/Windows with your laptop microphone and speakers. It uses OpenWakeWord, Whisper, Llama 3.1 via Ollama, Kokoro TTS, and optional local RAG (LlamaIndex + FAISS) for private, real‑time voice interaction — no cloud calls by default.
- Wake Word via OpenWakeWord (ONNX) — low‑latency, 16 kHz pipeline
- VAD + Dynamic RMS noise gating to reduce false triggers
- STT with OpenAI Whisper (
small.en, GPU‑accelerated if available) - LLM with Ollama (default:
llama3.1:8b-instruct-q4_K_M; aliasalexa-4k) - TTS with Kokoro (hexgrad/Kokoro‑82M) — fast, warm voice
af_heart - Streaming LLM→TTS: speak partial responses immediately (SSE)
- Local file search (RAG) via LlamaIndex + FAISS, opt‑in
- Structured logging for app & perf, rotating STT transcripts
- No cloud APIs needed for voice loop; optional web search integration
.
├── main.py # Application loop (wake word → STT → LLM → TTS)
├── services/
│ ├── microservices\_loader.py # Starts FastAPI microservices (TTS/STT/LLM)
│ ├── service\_manager.py # Spawns & tracks uvicorn processes
│ ├── llm\_streaming\_server.py # LLM SSE server (/chat, /chat/stream)
│ ├── stt\_service\_server.py # STT server (/transcribe)
│ ├── tts\_service\_server.py # TTS server (/speak)
│ ├── stt\_client.py # HTTP client for STT microservice
│ ├── llm\_streaming\_client.py # HTTP/SSE client + StreamingTTSIntegration
│ ├── tts\_client.py # HTTP client for TTS microservice
│ ├── kwd\_service.py # Wake-word detection w/ VAD + RMS checks
│ ├── dynamic\_rms\_service.py # Adaptive noise thresholding
│ ├── llm\_service.py # LLM flow, intents, handlers, dialog logging
│ ├── llama\_indexing\_service.py / llama\_file\_search\_service.py # RAG
│ ├── handlers/ # file\_search, memory, note, web\_search
│ └── logger.py # JSON logs + colored console
├── config/
│ ├── Modelfile # ollama model alias (alexa-4k)
│ ├── system\_prompt.txt # persona + style
│ ├── llm\_responses.json # canned strings for handlers
│ ├── search\_config.json # RAG input folders
│ ├── notes.json, memory.log # local notes & memory
│ └── sounds/kwd\_success.wav # wake chime
├── models/
│ └── alexa\_v0.1.onnx # OpenWakeWord model (required)
└── docs/
└── LOGGING.md, services.md, …
- System deps
sudo apt-get update
sudo apt-get install -y python3-venv python3-dev portaudio19-dev ffmpeg jq- Create env
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip wheel
pip install -r requirements.txt- Ollama + model
- Install Ollama: https://ollama.com/download
- Pull base model and build alias:
ollama pull llama3.1:8b-instruct-q4_K_M
# Optional: create local alias from config/Modelfile
ollama create alexa-4k -f config/Modelfile- Models & audio
- Place wake word model at:
models/alexa_v0.1.onnx - Ensure the wake chime exists:
config/sounds/kwd_success.wav
- Optional web search
Create a .env in the repo root:
# NOTE: Variable name matches current code (typo kept for compatibility)
TRAVILY_API=sk_... # Tavily Search API key
⚠️ Heads‑up: the code expectsTRAVILY_API(with an “R”). If you preferTAVILY_API, updateservices/web_search_service.pyaccordingly.
- Start (microservices mode, recommended)
python3 main.pyThis will:
- Spawn TTS on
:8001, STT on:8002, LLM (SSE) on:8003 - Load OpenWakeWord + VAD locally
- Announce readiness and listen for the wake word
Health endpoints:
http://127.0.0.1:8001/health(TTS),:8002/health(STT),:8003/health(LLM)
- Build local RAG index (optional)
Edit config/search_config.json to list folders to index, then:
python3 main.py --indexThis creates a FAISS index in config/faiss_index/. You can then ask, e.g., “find the project map” or “search docs about logging”.
- Wake — KWD buffers 1s windows, filters with dynamic RMS + VAD, and detects the wake word.
- STT — 16 kHz mono chunks are recorded until silence; Whisper
small.entranscribes. - LLM — Prompt goes to Ollama (default
alexa-4k/llama 3.1). If a handler intent is detected (notes, memory, web/file search), it routes accordingly. - Streaming TTS — The LLM SSE stream is chunked and immediately spoken by Kokoro (
af_heart); any residual text is flushed at completion. Fallback to non‑streaming if SSE fails.
- Persona:
config/system_prompt.txt - Canned strings:
config/llm_responses.json - Notes & memory:
config/notes.json,config/memory.log - RAG:
config/search_config.json, persisted indexconfig/faiss_index/ - Wake chime:
config/sounds/kwd_success.wav
See docs/LOGGING.md for full details. Key files:
logs/app.jsonl— structured app logs (all services)logs/performance.jsonl— durations, token/s, etc.logs/dialog_YYYY-MM-DD_HH-MM-SS.log— per‑session conversationslogs/transcriptions.log(+ daily rotation) — raw STT text
Handy:
tail -f logs/app.jsonl | jq .
tail -f logs/performance.jsonl | jq .- No mic / ALSA warnings: ensure
portaudio19-devinstalled; PulseAudio/JACK noise is suppressed in microservices servers. - Wake word not triggering: confirm
models/alexa_v0.1.onnxexists; reduce background noise or tweak the dynamic RMS multiplier. - No TTS audio: verify default audio sink;
sd.OutputStreamuses system default. - SSE stalls: check firewall (ports 8001–8003), and that
ollama serveis reachable. - Web search empty: verify
.envkey; confirm the env var name (TRAVILY_API) matches the code or update the code.
- Python 3.10+
pyaudio,webrtcvador OpenWakeWord VAD,numpytorch,openai-whisperollama(daemon running)kokoro(hexgrad/Kokoro-82M viaKPipeline)fastapi,uvicorn,sseclient-pyllama-index,faiss-cpu,sentence-transformersrequests,rich,psutil- (Optional)
.envwith Tavily API key
Install via:
pip install -r requirements.txt- Barge‑in (interrupt TTS on user speech)
- Tunable KWD thresholds & per‑environment calibration
- WebSocket/GRPC for lower‑latency streaming
- Unified config file for ports/devices/models
- Cross‑platform audio device selection UI
Alexa speaks in a concise, warm female voice (af_heart), avoids filler, and keeps responses short for voice UX.
---