A production-ready FastAPI service for high-accuracy English speech-to-text using NVIDIA's Parakeet-TDT 0.6B v2 model. It exposes both REST and WebSocket endpoints that follow the OpenAI Audio API specification.
## Features

- **RESTful transcription**
  - `POST /transcribe` with multipart audio uploads
  - Word/character/segment timestamps
  - OpenAI-compatible response schema
- **WebSocket streaming**
  - Real-time voice activity detection via Silero VAD
  - Partial/final transcription delivery
  - Supports 16 kHz mono PCM input
- **Batch processing**
  - Micro-batching for efficient GPU utilization
  - Configurable batch size and processing timeout
- **Production-ready deployment**
  - Docker and Docker Compose support
  - Health-check and configuration endpoints
  - Environment-variable configuration
- **Audio preprocessing**
  - Automatic downmixing and resampling
  - File validation and chunking
## Table of Contents

- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Configuration](#configuration)
- [Running the Server](#running-the-server)
- [Usage](#usage)
- [Architecture Overview](#architecture-overview)
- [Environment Variables](#environment-variables)
- [Contributing](#contributing)
- [License](#license)
## Prerequisites

- Python 3.10+
- NVIDIA GPU with CUDA 12.1+ (recommended)
- Docker Engine 24.0+ (for container deployment)
## Installation

### From source

```bash
git clone https://github.com/your-repo/parakeet-fastapi.git
cd parakeet-fastapi

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### Docker

```bash
docker build -t parakeet-stt .
docker run -d -p 8000:8000 --gpus all parakeet-stt
```

Or with Docker Compose:

```bash
docker-compose up --build
```
## Configuration

All configuration is managed through environment variables. Create a `.env` file with your preferences:

```env
# Model configuration
MODEL_PRECISION=fp16
DEVICE=cuda
BATCH_SIZE=4

# Audio processing
TARGET_SR=16000
MAX_AUDIO_DURATION=30
VAD_THRESHOLD=0.5

# System
LOG_LEVEL=INFO
PROCESSING_TIMEOUT=60
```

## Running the Server

Run directly with Uvicorn:

```bash
uvicorn parakeet_service.main:app --host 0.0.0.0 --port 8000
```

Or via Docker Compose:

```bash
docker-compose up --build -d
```

Verify the service is healthy:

```bash
curl http://localhost:8000/healthz
# {"status":"ok"}
```

## Usage

### REST Transcription

```bash
curl -X POST http://localhost:8000/transcribe \
  -F file="@audio.wav" \
  -F include_timestamps=true \
  -F should_chunk=true
```

Parameters:
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `file` | `audio/*` | required | Audio file (wav, mp3, flac) |
| `include_timestamps` | bool | `false` | Return word/segment timestamps |
| `should_chunk` | bool | `true` | Enable audio chunking for long files |
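The same request can be made from Python; this is a minimal sketch using the `requests` library, with `audio.wav` as a placeholder file name:

```python
# Minimal sketch: POST a local file to /transcribe with `requests`.
import requests

with open("audio.wav", "rb") as f:  # placeholder file name
    resp = requests.post(
        "http://localhost:8000/transcribe",
        files={"file": ("audio.wav", f, "audio/wav")},
        data={"include_timestamps": "true", "should_chunk": "true"},
        timeout=120,
    )
resp.raise_for_status()
print(resp.json()["text"])
```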
Response:

```json
{
  "text": "Transcribed text content",
  "timestamps": {
    "words": [
      {"text": "Hello", "start": 0.2, "end": 0.5},
      {"text": "world", "start": 0.6, "end": 0.9}
    ],
    "segments": [
      {"text": "Hello world", "start": 0.2, "end": 0.9}
    ]
  }
}
```

### WebSocket Streaming

Connect to `ws://localhost:8000/ws` to stream audio:
- Input: 16 kHz mono PCM frames (int16)
- Output: JSON messages with partial/final transcriptions
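For example, a Python client can stream a prerecorded file; this is a sketch assuming the third-party `websockets` package and a 16 kHz mono 16-bit WAV, with `audio.wav` as a placeholder:

```python
# Sketch: stream raw int16 PCM frames over the WebSocket endpoint.
# Assumes `pip install websockets`; audio.wav is a placeholder.
import asyncio
import json
import wave

import websockets

async def stream(path: str) -> None:
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        with wave.open(path, "rb") as wav:
            assert wav.getframerate() == 16000 and wav.getnchannels() == 1
            while frames := wav.readframes(1024):  # raw int16 PCM bytes
                await ws.send(frames)
                try:
                    # Print any partial/final result that has arrived so far
                    msg = await asyncio.wait_for(ws.recv(), timeout=0.01)
                    print(json.loads(msg).get("text"))
                except asyncio.TimeoutError:
                    pass

asyncio.run(stream("audio.wav"))
```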
JavaScript Example:
```javascript
const ws = new WebSocket("ws://localhost:8000/ws");
const audioContext = new AudioContext({ sampleRate: 16000 }); // server expects 16 kHz
const processor = audioContext.createScriptProcessor(1024, 1, 1);

// Convert Float32 samples in [-1, 1] to 16-bit PCM
function convertFloat32ToInt16(float32) {
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return int16;
}

processor.onaudioprocess = e => {
  const pcmData = e.inputBuffer.getChannelData(0);
  const int16Data = convertFloat32ToInt16(pcmData);
  ws.send(int16Data);
};

ws.onmessage = evt => {
  const data = JSON.parse(evt.data);
  console.log("Transcription:", data.text);
};
```

## Architecture Overview

```mermaid
graph LR
A[Client] -->|HTTP| B[REST API]
A -->|WebSocket| C[Streaming API]
B --> D[Batch Worker]
C --> E[VAD Processing]
E --> F[Chunker]
F --> D
D --> G[ASR Model]
G --> H[Response Formatter]
H --> A
```

Components:

- `main.py` - App initialization and lifecycle management
- `routes.py` - REST endpoint implementations
- `stream_routes.py` - WebSocket endpoint handler
- `streaming_vad.py` - Voice activity detection
- `chunker.py` - Audio segmentation
- `batchworker.py` - Micro-batch processing
- `model.py` - ASR model interface
- `audio.py` - Audio preprocessing utilities
- `config.py` - Configuration management
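To make the batching step concrete, here is a simplified sketch of the micro-batching pattern behind `batchworker.py`: wait for one request, collect more until `BATCH_SIZE` is reached or a short window elapses, then make a single model call. The window length and helper names are illustrative assumptions, not the repo's actual code.

```python
# Illustrative micro-batching loop; names and window are assumptions.
import asyncio

BATCH_SIZE = 4
BATCH_WINDOW = 0.05  # seconds to wait for additional requests

async def batch_worker(queue: asyncio.Queue, transcribe_batch) -> None:
    loop = asyncio.get_running_loop()
    while True:
        first = await queue.get()                 # block until a request arrives
        batch = [first]
        deadline = loop.time() + BATCH_WINDOW
        while len(batch) < BATCH_SIZE:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        texts = transcribe_batch([req["audio"] for req in batch])  # one GPU call
        for req, text in zip(batch, texts):
            req["future"].set_result(text)        # wake the waiting HTTP handler
```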
## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODEL_PRECISION` | `fp16` | Model precision (`fp16`/`fp32`) |
| `DEVICE` | `cuda` | Computation device |
| `BATCH_SIZE` | `4` | Processing batch size |
| `TARGET_SR` | `16000` | Target sample rate (Hz) |
| `MAX_AUDIO_DURATION` | `30` | Max audio length in seconds |
| `VAD_THRESHOLD` | `0.5` | Voice activity threshold |
| `LOG_LEVEL` | `INFO` | Logging verbosity |
| `PROCESSING_TIMEOUT` | `60` | Processing timeout in seconds |
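As a reference for how `config.py` might surface these, here is a sketch using `pydantic-settings`; the field names and defaults mirror the table above, but the repo's actual implementation may differ.

```python
# Sketch only; the real config.py may load settings differently.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # protected_namespaces=() lets MODEL_PRECISION map onto a field that
    # starts with "model_", which pydantic otherwise reserves.
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    model_precision: str = "fp16"
    device: str = "cuda"
    batch_size: int = 4
    target_sr: int = 16000
    max_audio_duration: int = 30
    vad_threshold: float = 0.5
    log_level: str = "INFO"
    processing_timeout: int = 60

settings = Settings()  # values come from the environment or .env
```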
## Contributing

- Fork the repository and create your feature branch
- Submit a pull request with a detailed description