Automatically deploy MLflow models to KServe when tagged for deployment.
This service listens for MLflow model version tag events and automatically creates or deletes KServe InferenceServices based on the `deploy` tag. When a model version is tagged with `deploy=true`, the service deploys it to KServe. When the tag is removed, the service deletes the deployment.
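Versions are typically tagged with `MlflowClient.set_model_version_tag(name, version, "deploy", "true")`. The tag-to-action mapping the service applies can be sketched as follows (a simplification; the real service also tracks which versions are currently deployed):

```python
def desired_action(tags: dict) -> str:
    """Map a model version's tags to a deployment action.

    Assumption: the service keys off a single `deploy` tag whose
    value is the string "true"; anything else means "not deployed".
    """
    if tags.get("deploy") == "true":
        return "create"  # ensure an InferenceService exists
    return "delete"      # remove any existing InferenceService


print(desired_action({"deploy": "true"}))   # deploy this version
print(desired_action({"deploy": "false"}))  # tear it down
print(desired_action({}))                   # no tag: tear it down
```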
- Event-driven deployments: Webhook-based real-time model deployments
- Polling fallback: Automatic fallback to polling mode if webhooks are unavailable
- Automatic cleanup: Removes InferenceServices when deploy tag is removed
- Cloud-agnostic: Works with GCS, S3, Azure Blob Storage, and more
- Customizable: Configurable resource limits, namespaces, and deployment parameters
- KServe v2 protocol: Uses the v2 inference protocol for broad model framework support
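Because the deployed services speak the KServe v2 (Open Inference Protocol) REST API, clients send a standard `inputs` tensor payload. A minimal sketch of building such a request body (the model name, tensor name, and shape below are placeholders):

```python
import json

def v2_infer_request(data, shape, datatype="FP32", name="input-0"):
    """Build a KServe v2 inference request body.

    The tensor name and datatype are illustrative; they must match
    what the deployed model actually expects.
    """
    return {
        "inputs": [
            {"name": name, "shape": shape, "datatype": datatype, "data": data}
        ]
    }

body = v2_infer_request([1.0, 2.0, 3.0, 4.0], shape=[1, 4])
# POST this JSON to http://<host>/v2/models/<model-name>/infer
print(json.dumps(body))
```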
See k8s/README.md for Kubernetes deployment instructions.
- MLflow triggers a webhook when a model version is tagged with `deploy=true`
- The webhook listener receives the event and fetches model details from MLflow
- An InferenceService is created in the configured Kubernetes namespace
- KServe deploys the model with the MLflow serving runtime (using the v2 protocol)
- When the `deploy` tag is removed, the InferenceService is deleted
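The resource created in step 3 is a standard KServe `InferenceService`. A minimal manifest of the kind the service generates might look like this (the name, namespace, and storage URI are illustrative placeholders):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model-3        # placeholder: derived from model name and version
  namespace: models       # placeholder: the configured namespace
spec:
  predictor:
    model:
      modelFormat:
        name: mlflow
      protocolVersion: v2
      storageUri: gs://mlflow-artifacts/path/to/model  # placeholder artifact URI
```

The `storageUri` points at the model version's artifact location, which is why the service works with GCS, S3, Azure Blob Storage, and other artifact stores.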
All configuration is done via environment variables. See k8s/README.md for details.
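As an illustration of the pattern, configuration might be read at startup roughly like this. `MLFLOW_TRACKING_URI` is the standard MLflow variable; the other names and defaults here are assumptions, and the authoritative list is in k8s/README.md:

```python
import os

# Illustrative only: see k8s/README.md for the actual variable names.
config = {
    "tracking_uri": os.environ.get("MLFLOW_TRACKING_URI", "http://mlflow:5000"),
    "namespace": os.environ.get("KSERVE_NAMESPACE", "default"),       # assumed name
    "poll_interval_s": int(os.environ.get("POLL_INTERVAL_SECONDS", "60")),  # assumed name
}
print(config)
```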
```shell
# Install dependencies with pixi
pixi install

# Run tests
pixi run pytest

# Run the service locally
pixi run webhook-listener
```

Apache-2.0