A Docker-based speech-to-text (STT) system using faster-whisper, with a FastAPI backend and a Gradio web UI.
## Project Structure

```text
stt-service/           # FastAPI service with faster-whisper + CTranslate2
├── Dockerfile         # Builds CUDA/cuDNN image, installs dependencies, converts model
└── app.py             # FastAPI app: /healthz (instant), /transcribe (lazy model load + auth)
webui-service/         # Gradio front-end
├── Dockerfile         # Builds a slim Python image with Python libs and ffmpeg
├── requirements.txt
└── app.py             # Gradio UI with basic auth, sends X-API-KEY header
docker-compose.yml     # Defines services, ports, and a simple depends_on
README.md              # This documentation
```
## Features

- faster-whisper with CTranslate2 for roughly 2× real-time transcription on GPU
- Lazy loading: the model loads on the first request (20–30 s), then stays in memory
- Token-based auth: `/transcribe` is secured with an `X-API-KEY` header
- UI basic auth: the Gradio interface is protected with a username/password
- Multi-format support: WAV, MP3, OGG/Opus, M4A, FLAC, AMR, etc.
- Health check: `/healthz` returns OK immediately
## Quick Start

```bash
git clone https://github.com/nek1987/stt-service-webui.git
cd stt-service-webui
# (optional) git archive -o backup.tar HEAD
```

## Configuration

In your `.env` file or directly in `docker-compose.yml`:
```yaml
services:
  stt-service:
    environment:
      - MODEL_PATH=/models/islomov_navaistt_v2_medium_ct2
      - NVIDIA_VISIBLE_DEVICES=0
      # Multiple API tokens (comma-separated, no spaces)
      - API_TOKENS=${API_TOKENS}   # e.g., API_TOKENS=token-alpha,token-bravo,token-charlie
      # Single token (backwards compatible)
      - API_TOKEN=${API_TOKEN}     # e.g., API_TOKEN=single-token-key
  webui-service:
    environment:
      - STT_API=http://stt-service:5085/transcribe
      - API_TOKEN=${API_TOKEN}     # the web UI uses a single token
      - UI_USER=${UI_LOGIN}
      - UI_PASS=${UI_PASS}
```

In your `.env` file:
```ini
# API authentication (choose one method)
API_TOKENS=token-alpha,token-bravo,token-charlie   # Multiple tokens
# OR
API_TOKEN=single-token-key                         # Single token

# UI authentication
UI_LOGIN=admin
UI_PASS=s3cret
```

## Build & Run

```bash
docker-compose down --rmi local   # optional: remove old images
docker-compose build              # build both services
docker-compose up -d              # start in detached mode
```
## Verify

```bash
# STT service health
curl http://localhost:5085/healthz
# → {"status":"ok"}

# Gradio UI
open http://localhost:7860
# Will prompt for user/pass (UI_USER/UI_PASS)
```
## STT Service

- Port: 5085
- Endpoints: `GET /healthz` → `{"status":"ok"}`; `POST /transcribe` (multipart field `file`, header `X-API-KEY`)
- Auth: include an `X-API-KEY` header whose value matches one of the configured tokens. Use `API_TOKENS=token1,token2` (comma-separated list) for multiple keys, or `API_TOKEN` for a single legacy token.
- Model path: baked in at `/models/islomov_navaistt_v1_medium_ct2`
- Lazy load: the model initializes on the first `/transcribe` request (see the sketch below)
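A minimal sketch of the lazy-loading pattern (illustrative only; the shipped `app.py` may differ in details such as error handling, auth checks, and model parameters):

```python
import os
import tempfile

from fastapi import FastAPI, File, UploadFile
from faster_whisper import WhisperModel

app = FastAPI()
_model = None  # created lazily on the first /transcribe call


def get_model() -> WhisperModel:
    global _model
    if _model is None:  # the first request pays the 20–30 s load cost
        _model = WhisperModel(
            os.environ.get("MODEL_PATH", "/models/islomov_navaistt_v1_medium_ct2"),
            device="cuda",
            compute_type="float16",
        )
    return _model


@app.get("/healthz")
def healthz():
    # Answers immediately; never touches the model.
    return {"status": "ok"}


@app.post("/transcribe")
async def transcribe(file: UploadFile = File(...)):
    # Persist the upload to a temp file, then run the model on it.
    suffix = os.path.splitext(file.filename or "")[1]
    with tempfile.NamedTemporaryFile(suffix=suffix) as tmp:
        tmp.write(await file.read())
        tmp.flush()
        segments, _info = get_model().transcribe(tmp.name)
        return {"text": " ".join(s.text.strip() for s in segments)}
```

(The `X-API-KEY` check is omitted here for brevity; the real service enforces it.)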
### Token Configuration

- Multiple tokens: set `API_TOKENS` to a comma-separated list without spaces, e.g. `API_TOKENS=service-a-key,service-b-key,service-c-key`.
- Single token (backwards compatible): define `API_TOKEN=service-a-key`. The value is automatically combined with any tokens in `API_TOKENS`.
- No tokens: omit both variables to leave the `/transcribe` endpoint open (not recommended for production deployments).
Token validation:

- Minimum length: 8 characters (shorter tokens are rejected with a warning)
- Whitespace is automatically trimmed
- Duplicate tokens are automatically removed
- On startup, all configured tokens are logged (masked for security, e.g., `token-***pha`)
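The sketch below shows one way the configuration and validation rules above could fit together. The masking format is inferred from the log examples in this README; everything else (function names, exact log messages) is illustrative, not the service's actual code:

```python
import logging
import os

logger = logging.getLogger("stt-service")


def mask(token: str) -> str:
    # e.g. "token-alpha" -> "token-***pha" (format assumed from the examples above)
    return f"{token[:6]}***{token[-3:]}" if len(token) > 9 else "***"


def load_tokens() -> set[str]:
    raw = os.environ.get("API_TOKENS", "").split(",")
    raw.append(os.environ.get("API_TOKEN", ""))  # legacy single token is merged in
    tokens = set()
    for t in (t.strip() for t in raw):           # whitespace is trimmed
        if not t:
            continue
        if len(t) < 8:                           # short tokens are rejected
            logger.warning("Rejected too-short token %s", mask(t))
            continue
        tokens.add(t)                            # set membership removes duplicates
    for t in sorted(tokens):
        logger.info("Configured API token %s", mask(t))  # masked startup log
    return tokens
```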
Token usage logging:

- Each authenticated request logs the masked token used (e.g., `token-***pha`)
- Failed authentication attempts are logged with the (masked) invalid token
- Requests without an API key are logged as "No API key provided"
## WebUI Service

- Port: 7860
- Auth: basic HTTP auth (username `UI_USER`, password `UI_PASS`)
- UI → service: the `X-API-KEY` header is sent automatically (see the sketch below)
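A minimal sketch of the web UI wiring (illustrative; the shipped `webui-service/app.py` may differ):

```python
import os

import gradio as gr
import requests

STT_API = os.environ.get("STT_API", "http://stt-service:5085/transcribe")
API_TOKEN = os.environ.get("API_TOKEN", "")


def transcribe(audio_path: str) -> str:
    # Forward the recorded/uploaded file to the STT service with the
    # X-API-KEY header the backend expects.
    with open(audio_path, "rb") as f:
        resp = requests.post(
            STT_API,
            headers={"X-API-KEY": API_TOKEN},
            files={"file": f},
            timeout=300,
        )
    resp.raise_for_status()
    return resp.json()["text"]


demo = gr.Interface(fn=transcribe, inputs=gr.Audio(type="filepath"), outputs="text")
demo.launch(
    server_name="0.0.0.0",
    server_port=7860,
    auth=(os.environ["UI_USER"], os.environ["UI_PASS"]),  # basic HTTP auth
)
```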
## API Reference

```text
GET http://<host>:5085/healthz
→ 200 OK  {"status":"ok"}
```

```text
POST http://<host>:5085/transcribe
Headers:
  X-API-KEY: token-alpha
  Accept: application/json
Body:
  multipart/form-data, field "file" = audio file
Response 200 OK:
  {"text": "transcribed text here"}
Errors:
  401 Unauthorized          if the token is missing or invalid
  400 Bad Request           if no file is supplied
  500 Internal Server Error on model load/inference failures
```
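An equivalent call from Python (assumes the service is reachable on localhost, `token-alpha` is one of the configured tokens, and `sample.wav` is a placeholder file name):

```python
import requests

with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:5085/transcribe",
        headers={"X-API-KEY": "token-alpha", "Accept": "application/json"},
        files={"file": ("sample.wav", f, "audio/wav")},
        timeout=300,
    )
resp.raise_for_status()  # raises on 401/400/500
print(resp.json()["text"])
```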
## Extending

- Increase GPU concurrency by running multiple instances behind a load balancer.
- To stream partial results, iterate over the segment generator that `model.transcribe(...)` returns and push segments to the client as they are produced (faster-whisper yields segments lazily rather than taking a `stream=True` flag).
- Tune `beam_size` and `compute_type` in `app.py` to trade quality against speed (see the sketch below).
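A sketch of those knobs in isolation (example values, not the project's defaults):

```python
from faster_whisper import WhisperModel

model = WhisperModel(
    "/models/islomov_navaistt_v1_medium_ct2",
    device="cuda",
    compute_type="int8_float16",  # faster and smaller than float16, slightly lower quality
)
segments, info = model.transcribe(
    "sample.wav",
    beam_size=5,      # larger beams improve accuracy at the cost of speed
    vad_filter=True,  # optional: skip long stretches of silence
)
for segment in segments:  # the generator yields segments lazily -> partial results
    print(f"[{segment.start:6.2f} -> {segment.end:6.2f}] {segment.text}")
```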
## Author

Jamshid Radjabov, telecom expert and AI enthusiast.

Pull requests and issues are welcome!