End-to-end example integrating MLflow and Seldon Core, with A/B testing of the models. The slides accompanying this demo can be found here.
The training part of the example assumes that you are able to run MLflow in your local environment. To set it up, you can run:
!pip install -r requirements.txt
The serving side of the example assumes that you've got access to a Kubernetes cluster where Seldon Core is installed.
If you don't have access to a cluster, feel free to use kind to create a local one.
For instructions on how to install Seldon Core, please check their setup docs.
Additionally, after we deploy the models, we will compare their performance using Seldon Core's integration with Prometheus and Grafana. For that part to work, we will need to install Prometheus and Grafana.
To speed things up, we can do this through the seldon-core-analytics chart.
!helm install seldon-core-analytics \
seldon-core-analytics \
--namespace seldon-system \
--repo https://storage.googleapis.com/seldon-charts \
--set grafana.adminPassword=password \
--set grafana.adminUser=admin
This first section will cover how to train models using MLflow.
The MLproject file defines:
- The environment where the training runs.
- The hyperparameters that can be tweaked. In our case, these are $\alpha$ and the $l_{1}$ ratio.
- The interface to train the model.
%%writefile ./training/MLproject
name: mlflow-talk
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"
This allows us to have a single command to train the model.
$ mlflow run ./training -P alpha=... -P l1_ratio=...
For our example, we will train two versions of the model, which we'll later compare using A/B testing.
- $M_{1}$ with $\alpha = 0.1$
- $M_{2}$ with $\alpha = 1.0$
!mlflow run ./training -P alpha=0.1
!mlflow run ./training -P alpha=1.0
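If you prefer to launch these runs from Python rather than the CLI, MLflow's projects API offers an equivalent entry point. This is just a sketch mirroring the two commands above:

import mlflow.projects

# Launch the same two training runs programmatically; equivalent to the
# `mlflow run` CLI calls above.
for alpha in (0.1, 1.0):
    mlflow.projects.run(
        uri="./training",
        parameters={"alpha": alpha},
    )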
The train.py script uses the mlflow.log_param() and mlflow.log_metric() functions to track each experiment. These are part of the MLflow Tracking API, which records the parameters and results of every run. These records can be stored on a remote tracking server and shared across the entire team; however, in our example we will store them locally in the mlruns folder.
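For reference, this is roughly what a train.py like the one in this project could look like. It is a minimal sketch rather than the actual script in ./training; the quality target column and the rmse metric are assumptions based on the wine-quality dataset used later on.

import sys

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hyperparameters come in as positional arguments, as defined in the MLproject file
alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.1

data = pd.read_csv("wine-quality.csv")
train, test = train_test_split(data)
train_x, test_x = train.drop(["quality"], axis=1), test.drop(["quality"], axis=1)
train_y, test_y = train["quality"], test["quality"]

with mlflow.start_run():
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    model.fit(train_x, train_y)

    rmse = mean_squared_error(test_y, model.predict(test_x)) ** 0.5

    # Track the hyperparameters and results of this run
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)

    # Store the trained model under mlruns/<experiment>/<run_id>/artifacts/model
    mlflow.sklearn.log_model(model, "model")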
!ls mlruns/0
We can also run mlflow ui to explore these runs visually. This will start the MLflow UI at http://localhost:5000.
$ mlflow ui
The MLmodel file allows us to version and share models easily. Below we can see an example.
!cat ./mlruns/0/5a6be5a1ef844783a50a6577745dbdc3/artifacts/model/MLmodel
As we can see above, the MLmodel file keeps track of, among other things:
- The run ID, 5a6be5a1ef844783a50a6577745dbdc3
- The date of the run
- The version of sklearn used
- How the model was stored
As we shall see shortly, Seldon's pre-packaged MLflow model server will use this file to serve the model.
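Before handing the models over to Seldon, we can sanity-check one of them locally by loading it back through MLflow's pyfunc API. A quick sketch, using the run ID shown above (substitute one from your own mlruns folder):

import pandas as pd
import mlflow.pyfunc

# Load the logged model back from its local artifacts folder
model = mlflow.pyfunc.load_model(
    "./mlruns/0/5a6be5a1ef844783a50a6577745dbdc3/artifacts/model")

# Score a single sample, using the same feature names as the dataset
columns = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol"]
X = pd.DataFrame(
    [[7, 0.27, 0.36, 20.7, 0.045, 45, 170, 1.001, 3, 0.45, 8.8]],
    columns=columns)

print(model.predict(X))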
As a last step, we will persist the models we have just trained with MLflow. For that, we will upload them to Google Cloud Storage. Note that to run these commands you need write access to the gs://seldon-models bucket and you need to have gsutil set up.
Note that in a production setting, MLflow would be configured to log models against a persistent data store (e.g. GCS, Minio, etc.). In that case, this manual step wouldn't be needed.
We will upload both versions of the model to:
gs://seldon-models/mlflow/model-a
gs://seldon-models/mlflow/model-b
!gsutil cp -r mlruns/0/c047eddcdc2d4a08963f08516fd18d74/artifacts/model/* gs://seldon-models/mlflow/model-a
!gsutil cp -r mlruns/0/1aab766b2d9246ed85f8dbfec4e8743d/artifacts/model/* gs://seldon-models/mlflow/model-b
To serve these models we will use Seldon. Now that both models live in a common repository (the GCS bucket above), the next step will be to deploy a SeldonDeployment spec to k8s with two predictors, one per model. As we can see below, we will route 50% of the traffic to each of them.
%%writefile ./serving/model-a-b.yaml
---
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: wines-classifier
spec:
  name: wines-classifier
  predictors:
  - graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: gs://seldon-models/mlflow/model-a
      name: wines-classifier
    name: model-a
    replicas: 1
    traffic: 50
    componentSpecs:
    - spec:
        # We are setting a high failureThreshold as installing the conda
        # dependencies can take a long time and we want to avoid k8s killing
        # the container prematurely
        containers:
        - name: wines-classifier
          livenessProbe:
            initialDelaySeconds: 60
            failureThreshold: 100
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
          readinessProbe:
            initialDelaySeconds: 60
            failureThreshold: 100
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
  - graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: gs://seldon-models/mlflow/model-b
      name: wines-classifier
    name: model-b
    replicas: 1
    traffic: 50
    componentSpecs:
    - spec:
        # We are setting a high failureThreshold as installing the conda
        # dependencies can take a long time and we want to avoid k8s killing
        # the container prematurely
        containers:
        - name: wines-classifier
          livenessProbe:
            initialDelaySeconds: 60
            failureThreshold: 100
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
          readinessProbe:
            initialDelaySeconds: 60
            failureThreshold: 100
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
!kubectl apply -f ./serving/model-a-b.yaml
We can verify these have been deployed by checking the pods and SeldonDeployment resources in the cluster.
!kubectl get pods
!kubectl get sdep
We will now run a sample query to test that the inference graph is working.
!curl \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
      "data": {
        "names": ["fixed acidity","volatile acidity","citric acid","residual sugar","chlorides","free sulfur dioxide","total sulfur dioxide","density","pH","sulphates","alcohol"],
        "ndarray": [[7,0.27,0.36,20.7,0.045,45,170,1.001,3,0.45,8.8]]
      }
    }' \
    http://localhost:8083/seldon/default/wines-classifier/api/v1.0/predictions
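The same request can also be sent from Python using the seldon-core client, just like the feedback script further below does. A sketch, assuming the same port-forward exposing the gateway on localhost:8083:

import numpy as np
from seldon_core.seldon_client import SeldonClient

# Client pointing at the gateway we port-forwarded to localhost:8083
sc = SeldonClient(
    gateway="istio",
    namespace="default",
    gateway_endpoint="localhost:8083",
    deployment_name="wines-classifier")

X = np.array([[7, 0.27, 0.36, 20.7, 0.045, 45, 170, 1.001, 3, 0.45, 8.8]])
names = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol"]

r = sc.predict(data=X, names=names)
print(r.response)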
To access Grafana, we need to port-forward to the respective pod, as we did previously to access the Seldon Core deployment. The credentials are admin / password (the values we set when installing seldon-core-analytics).
This command needs to keep running in the background, so please make sure you run it in a separate terminal.
$ kubectl port-forward \
    --namespace seldon-system \
    $(kubectl get pods \
        --namespace seldon-system \
        -l app=grafana-prom-server \
        -o jsonpath='{.items[0].metadata.name}') \
    3000:3000
Now that we have both models running in production, we can analyse their performance using Seldon Core's integration with Prometheus and Grafana.
To do so, we will iterate over the training set (which can be found in ./training/wine-quality.csv), making a prediction request for each row and then sending feedback on that prediction.
Since the /feedback endpoint requires a reward signal (i.e. the higher the better), we will simulate one as $R = \frac{1}{(y - \hat{y})^{2}}$, where $y$ is the actual value and $\hat{y}$ is the model's prediction (with a large fixed reward when they match exactly, to avoid dividing by zero).
%%writefile feedback.py
import numpy as np
import pandas as pd

from seldon_core.seldon_client import SeldonClient

# Client pointing at the gateway we port-forwarded earlier
sc = SeldonClient(
    gateway="istio",
    namespace="default",
    gateway_endpoint="localhost:8083",
    deployment_name="wines-classifier")

df = pd.read_csv("./training/wine-quality.csv")


def _get_reward(y, y_pred):
    # Large fixed reward for an exact match; otherwise the inverse squared error
    if y == y_pred:
        return 500

    return 1 / np.square(y - y_pred)


def _test_row(row):
    input_features = row.iloc[:-1]
    feature_names = input_features.index.to_list()
    X = input_features.values.reshape(1, -1)
    y = row.iloc[-1].reshape(1, -1)

    # Send the prediction request through the gateway
    r = sc.predict(
        data=X,
        names=feature_names)

    y_pred = r.response['data']['tensor']['values']

    # Compute a scalar reward and send it back through the /feedback endpoint
    reward = float(_get_reward(y, y_pred))
    sc.feedback(
        prediction_request=r.request,
        prediction_response=r.response,
        reward=reward)

    return reward


df.apply(_test_row, axis=1)
!python feedback.py
We can now access the Grafana dashboard at http://localhost:3000 (credentials are admin / password). Inside the portal, we will go to the Prediction Analytics dashboard.
We can see a snapshot below.