This application provides an interactive dashboard and chatbot interface to analyze AI model performance metrics collected from Prometheus and generate human-like summaries using a Llama model deployed on OpenShift AI.
It helps teams understand what's going well and what's going wrong in their vLLM deployments, and provides actionable recommendations, all automatically.
- Visualize core vLLM metrics (GPU usage, latency, request volume, etc.)
- Generate summaries using a fine-tuned Llama model
- Chat with an MLOps assistant based on real metrics
- Fully configurable via environment variables and Helm-based deployment (see the sketch below)
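
As a minimal sketch of the environment-variable configuration, the app could be pointed at its backends like this; the variable names and entrypoint here are hypothetical, so check the Helm chart values and app source for the actual keys:

```bash
# Hypothetical variable names and entrypoint -- consult the chart and
# app source for the real configuration keys.
export PROMETHEUS_URL="http://prometheus:9090"
export LLM_URL="https://llama-3-2-3b-instruct.apps.cluster.local"
streamlit run app.py
```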
- Prometheus: Collects and exposes AI model metrics
- Streamlit App: Renders dashboard, handles summarization and chat
- LLM (Llama 3.x): Deployed on OpenShift AI and queried via the `/v1/completions` API (see the example request below)
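
For reference, a completion request against the model endpoint looks roughly like the following; the URL and model name are placeholders for whatever your deployment exposes:

```bash
# Placeholder URL and model name -- substitute the values extracted by
# the Helm install for your deployment.
curl -s https://llama-3-2-3b-instruct.apps.cluster.local/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama-3-2-3b-instruct",
        "prompt": "Summarize these vLLM metrics: ...",
        "max_tokens": 256,
        "temperature": 0.2
      }'
```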
- Kubernetes or OpenShift cluster
- `oc` or `kubectl` CLI configured
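
Before installing, it is worth confirming the CLI is logged in to the intended cluster, for example:

```bash
# Confirm which cluster and user the CLI is currently using
oc whoami --show-server
# or, with kubectl:
kubectl config current-context
```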
Use the included `Makefile` to install everything:
```bash
cd deploy/helm
make install NAMESPACE=llama-stack-summarize \
  LLM=llama-3-2-3b-instruct \
  LLM_TOLERATION="nvidia.com/gpu" \
  SAFETY=llama-guard-3-8b \
  SAFETY_TOLERATION="nvidia.com/gpu"
```
This will:
- Deploy Prometheus
- Deploy Llama models
- Extract their URLs
- Create a ConfigMap with available models
- Deploy the Streamlit dashboard connected to the LLM
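
To sanity-check the result, you can inspect what landed in the namespace; exact resource names depend on the chart versions:

```bash
# List the pods, routes, and ConfigMaps created by the install
oc get pods,routes,configmaps -n llama-stack-summarize
```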
To uninstall:
```bash
make uninstall NAMESPACE=llama-namespace
```
- Open the route exposed by the `metric-ui` Helm chart (e.g., `https://metrics-ui.apps.cluster.local`)
- Select the AI model whose metrics you want to analyze (a raw Prometheus query example follows this list)
- Click **Analyze Metrics** to generate a summary
- Use the **Chat Assistant** tab to ask follow-up questions
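
If you want to see the raw series behind the dashboard, you can query Prometheus directly; the service URL below is illustrative, and the metric name follows vLLM's naming conventions, so adjust both to what your deployment actually exposes:

```bash
# Illustrative PromQL query via the Prometheus HTTP API -- adjust the
# service URL and metric name to your deployment.
curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=rate(vllm:request_success_total[5m])'
```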
We welcome contributions and feedback!
Please open issues or submit PRs to improve this dashboard or expand model compatibility.