A FastAPI wrapper for NVIDIA's Parakeet-TDT 0.6B v2 model, a 600-million-parameter ASR model designed for high-quality English speech recognition.

Parakeet-TDT 0.6B v2 FastAPI STT Service

A production-ready FastAPI service for high-accuracy English speech-to-text using NVIDIA's Parakeet-TDT 0.6B v2 model. It implements both REST and WebSocket endpoints, following the interface of the OpenAI Audio API.

Features

  • RESTful transcription

    • POST /transcribe with multipart audio uploads
    • Word/character/segment timestamps
    • OpenAI-compatible response schema
  • WebSocket streaming

    • Real-time voice activity detection via Silero VAD
    • Partial/final transcription delivery
    • Supports 16kHz mono PCM input
  • Batch processing

    • Micro-batching for efficient GPU utilization
    • Configurable batch size and processing timeout
  • Production-ready deployment

    • Docker and Docker Compose support
    • Health checks and configuration endpoints
    • Environment variable configuration
  • Audio preprocessing

    • Automatic downmixing and resampling
    • File validation and chunking

Table of Contents

  • Prerequisites
  • Installation
  • Configuration
  • Running the Server
  • Usage
  • Architecture Overview
  • Environment Variables
  • Contributing

Prerequisites

  • Python 3.10+
  • NVIDIA GPU with CUDA 12.1+ (recommended)
  • Docker Engine 24.0+ (for container deployment)

Installation

Local Development

git clone https://github.com/Shadowfita/parakeet-tdt-0.6b-v2-fastapi.git
cd parakeet-tdt-0.6b-v2-fastapi

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Docker Deployment

docker build -t parakeet-stt .
docker run -d -p 8000:8000 --gpus all parakeet-stt

Docker Compose

docker-compose up --build

Configuration

All configuration is managed through environment variables. Create a .env file with your preferences:

# Model configuration
MODEL_PRECISION=fp16
DEVICE=cuda
BATCH_SIZE=4

# Audio processing
TARGET_SR=16000
MAX_AUDIO_DURATION=30
VAD_THRESHOLD=0.5

# System
LOG_LEVEL=INFO
PROCESSING_TIMEOUT=60
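
The project's config.py is not reproduced here; as a minimal sketch (assuming pydantic-settings, which may not match the actual implementation), the variables above could be loaded like this:

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # protected_namespaces=() lets the MODEL_PRECISION field start with "model_"
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    model_precision: str = "fp16"
    device: str = "cuda"
    batch_size: int = 4
    target_sr: int = 16000
    max_audio_duration: int = 30
    vad_threshold: float = 0.5
    log_level: str = "INFO"
    processing_timeout: int = 60

settings = Settings()  # reads the environment and .env, falling back to these defaults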

Running the Server

Local Development

uvicorn parakeet_service.main:app --host 0.0.0.0 --port 8000

Production

docker-compose up --build -d

Usage

REST API

Health Check

curl http://localhost:8000/healthz
# {"status":"ok"}

Transcription

curl -X POST http://localhost:8000/transcribe \
  -F file="@audio.wav" \
  -F include_timestamps=true \
  -F should_chunk=true

Parameters:

Name                Type     Default   Description
file                audio/*  Required  Audio file (wav, mp3, flac)
include_timestamps  bool     false     Return word/segment timestamps
should_chunk        bool     true      Enable audio chunking for long files

Response:

{
  "text": "Transcribed text content",
  "timestamps": {
    "words": [
      {"text": "Hello", "start": 0.2, "end": 0.5},
      {"text": "world", "start": 0.6, "end": 0.9}
    ],
    "segments": [
      {"text": "Hello world", "start": 0.2, "end": 0.9}
    ]
  }
}
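
The same request can be issued from Python with the requests library. A minimal client sketch using the documented endpoint and form fields:

import requests

# Send a local audio file to /transcribe with timestamps enabled.
with open("audio.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/transcribe",
        files={"file": ("audio.wav", f, "audio/wav")},
        data={"include_timestamps": "true", "should_chunk": "true"},
        timeout=120,
    )

response.raise_for_status()
result = response.json()
print(result["text"])
for word in result.get("timestamps", {}).get("words", []):
    print(f'{word["text"]}: {word["start"]:.2f}s - {word["end"]:.2f}s')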

WebSocket Streaming

Connect to ws://localhost:8000/ws to stream audio:

  • Input: 16kHz mono PCM frames (int16)
  • Output: JSON messages with partial/final transcriptions

JavaScript Example:

const ws = new WebSocket("ws://localhost:8000/ws");
const audioContext = new AudioContext({ sampleRate: 16000 }); // request 16 kHz where supported
const processor = audioContext.createScriptProcessor(1024, 1, 1);

// Convert Web Audio float samples ([-1, 1]) to 16-bit PCM.
function convertFloat32ToInt16(float32) {
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  return int16;
}

processor.onaudioprocess = e => {
  const pcmData = e.inputBuffer.getChannelData(0);
  const int16Data = convertFloat32ToInt16(pcmData);
  ws.send(int16Data.buffer); // raw int16 PCM frames
};

ws.onmessage = evt => {
  const data = JSON.parse(evt.data);
  console.log("Transcription:", data.text);
};

// Connect a microphone source (e.g. from getUserMedia) to the processor to start streaming.
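
A Python streaming client is sketched below. It assumes the third-party websockets and soundfile packages (not project requirements) and a file that is already 16 kHz mono:

import asyncio
import json

import soundfile as sf
import websockets

async def stream_file(path):
    # Load the file as int16 samples; it must already be 16 kHz mono PCM.
    audio, sample_rate = sf.read(path, dtype="int16")
    assert sample_rate == 16000, "resample to 16 kHz mono before streaming"

    async with websockets.connect("ws://localhost:8000/ws") as ws:
        # Send the audio in small frames, mimicking a live microphone stream.
        frame = 1024
        for start in range(0, len(audio), frame):
            await ws.send(audio[start:start + frame].tobytes())

        # Read partial/final transcription messages until the server goes quiet.
        try:
            while True:
                message = json.loads(await asyncio.wait_for(ws.recv(), timeout=5))
                print(message.get("text", ""))
        except asyncio.TimeoutError:
            pass

asyncio.run(stream_file("audio.wav"))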

Architecture Overview

graph LR
A[Client] -->|HTTP| B[REST API]
A -->|WebSocket| C[Streaming API]
B --> D[Batch Worker]
C --> E[VAD Processing]
E --> F[Chunker]
F --> D
D --> G[ASR Model]
G --> H[Response Formatter]
H --> A

Components:

  1. main.py - App initialization and lifecycle management
  2. routes.py - REST endpoints implementation
  3. stream_routes.py - WebSocket endpoint handler
  4. streaming_vad.py - Voice activity detection
  5. chunker.py - Audio segmentation
  6. batchworker.py - Micro-batch processing (see the sketch after this list)
  7. model.py - ASR model interface
  8. audio.py - Audio preprocessing utilities
  9. config.py - Configuration management
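
batchworker.py appears above only by name; the sketch below illustrates the general micro-batching idea with hypothetical names, not the project's actual code: collect requests until BATCH_SIZE items are queued or a short wait expires, then run a single batched inference call.

import asyncio

# Illustrative micro-batching loop; names and behaviour are hypothetical.
async def batch_worker(queue: asyncio.Queue, transcribe_batch, batch_size=4, max_wait=0.05):
    while True:
        # Wait for the first pending request, then top the batch up for at most max_wait seconds.
        batch = [await queue.get()]
        try:
            while len(batch) < batch_size:
                batch.append(await asyncio.wait_for(queue.get(), timeout=max_wait))
        except asyncio.TimeoutError:
            pass

        # One batched ASR call serves every request collected above.
        texts = transcribe_batch([item["audio"] for item in batch])
        for item, text in zip(batch, texts):
            item["future"].set_result(text)  # hand the result back to the waiting endpoint

In this scheme, each REST request would enqueue a dict holding its audio and an asyncio.Future, then await that future for its transcription.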

Environment Variables

Variable            Default  Description
MODEL_PRECISION     fp16     Model precision (fp16/fp32)
DEVICE              cuda     Computation device
BATCH_SIZE          4        Processing batch size
TARGET_SR           16000    Target sample rate (Hz)
MAX_AUDIO_DURATION  30       Max audio length in seconds
VAD_THRESHOLD       0.5      Voice activity threshold
LOG_LEVEL           INFO     Logging verbosity
PROCESSING_TIMEOUT  60       Processing timeout in seconds

Contributing

  1. Fork the repository and create your feature branch
  2. Submit a pull request with a detailed description of your changes
