Use TensorFlow Serving with Kubernetes

This tutorial shows how to use TensorFlow Serving components running in Docker containers to serve the TensorFlow ResNet model and how to deploy the serving cluster with Kubernetes.

To learn more about TensorFlow Serving, we recommend the TensorFlow Serving basic tutorial and the TensorFlow Serving advanced tutorial.

To learn more about the TensorFlow ResNet model, we recommend reading ResNet in TensorFlow.

Part 1 sets up your environment.
Part 2 shows how to run the local Docker serving image.
Part 3 shows how to deploy in Kubernetes.

Part 1: Setup

Before getting started, first install Docker.

Download the ResNet SavedModel

Let's clear our local models directory in case we already have one:

rm -rf /tmp/resnet

Deep residual networks, or ResNets for short, provided the breakthrough idea of identity mappings in order to enable training of very deep convolutional neural networks. For our example, we will download a TensorFlow SavedModel of ResNet for the ImageNet dataset.

mkdir /tmp/resnet
curl -s http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | \
tar --strip-components=2 -C /tmp/resnet -xvz

We can verify we have the SavedModel:

$ ls /tmp/resnet/*
saved_model.pb  variables
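The listing above shows the files one level below /tmp/resnet, because TensorFlow Serving expects each model version in its own numeric subdirectory. As a minimal sketch of that check, the helper below (the name `find_model_versions` is ours, not part of TensorFlow Serving) lists the version directories that contain both saved_model.pb and variables:

```python
from pathlib import Path


def find_model_versions(model_dir):
    """Return the sorted numeric versions under model_dir that hold a SavedModel.

    Hypothetical helper for illustration: a version counts only if its
    directory name is numeric and it contains both `saved_model.pb` and a
    `variables` directory, as in the listing above.
    """
    versions = []
    for child in Path(model_dir).iterdir():
        # Skip anything that is not a numerically named directory.
        if not (child.is_dir() and child.name.isdecimal()):
            continue
        has_graph = (child / "saved_model.pb").is_file()
        has_variables = (child / "variables").is_dir()
        if has_graph and has_variables:
            versions.append(int(child.name))
    return sorted(versions)
```

Calling `find_model_versions("/tmp/resnet")` after the download should return a one-element list. An empty list suggests the archive was extracted with a different directory depth.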

Part 2: Running in Docker

Commit image for deployment

Now we want to take a serving image and commit all changes to a new image $USER/resnet_serving for Kubernetes deployment.

First we run a serving image as a daemon:

docker run -d --name serving_base tensorflow/serving

Next, we copy the ResNet model data to the container's model folder:

docker cp /tmp/resnet serving_base:/models/resnet

Finally, we commit the container so that it serves the ResNet model by default:

docker commit --change "ENV MODEL_NAME resnet" serving_base \
  $USER/resnet_serving

Now let's stop and remove the serving base container:

docker kill serving_base
docker rm serving_base

Start the server

Now let's start the container with the ResNet model so it's ready for serving, exposing the gRPC port 8500:

docker run -p 8500:8500 -t $USER/resnet_serving &
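Because the container starts in the background, the port may not accept connections right away. The following is a minimal sketch of a readiness wait, assuming it is enough to see the TCP port open. The helper name `wait_for_port` and its defaults are ours. An open port does not by itself prove that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds or timeout expires.

    Hypothetical helper: returns True once a connection is accepted and
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the container started above, `wait_for_port("localhost", 8500)` would be the matching call before running the client.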

Query the server

For the client, we will need to clone the TensorFlow Serving GitHub repo:

git clone https://github.com/tensorflow/serving
cd serving

Query the server with resnet_client_grpc.py. The client downloads an image and sends it over gRPC for classification into ImageNet categories.

tools/run_in_docker.sh python tensorflow_serving/example/resnet_client_grpc.py

This should result in output like:

outputs {
  key: "classes"
  value {
    dtype: DT_INT64
    tensor_shape {
      dim {
        size: 1
      }
    }
    int64_val: 286
  }
}
outputs {
  key: "probabilities"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 1001
      }
    }
    float_val: 2.41628322328e-06
    float_val: 1.90121829746e-06
    float_val: 2.72477100225e-05
    float_val: 4.42638565801e-07
    float_val: 8.98362372936e-07
    float_val: 6.84421956976e-06
    float_val: 1.66555237229e-05
...
    float_val: 1.59407863976e-06
    float_val: 1.2315689446e-06
    float_val: 1.17812135159e-06
    float_val: 1.46365800902e-05
    float_val: 5.81210713335e-07
    float_val: 6.59980651108e-05
    float_val: 0.00129527016543
  }
}
model_spec {
  name: "resnet"
  version {
    value: 1538687457
  }
  signature_name: "serving_default"
}

It works! The server successfully classifies a cat image!
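The response carries a single class index (286 here) and a vector of 1001 probabilities. As a rough sketch of how one might inspect that vector, assuming the float_val entries have been copied into a plain Python list, the helper below (`top_k` is our name, not part of the client) returns the highest-scoring indices. We also assume, without confirmation from the model's documentation, that the reported class is the index with the highest probability.

```python
def top_k(probabilities, k=5):
    """Return the k (index, probability) pairs with the highest probability.

    Hypothetical helper: `probabilities` is a flat list of floats such as
    the float_val entries in the response above.
    """
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Tiny made-up vector, not real model output.
example = [0.05, 0.10, 0.70, 0.15]
print(top_k(example, k=2))  # [(2, 0.7), (3, 0.15)]
```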

Part 3: Deploy in Kubernetes

In this section we use the container image built in Part 2 to deploy a serving cluster with Kubernetes on Google Cloud Platform.

GCloud project login

Here we assume you have created a gcloud project named tensorflow-serving. Log in to it:

gcloud auth login --project tensorflow-serving

Create a container cluster

First we create a Google Kubernetes Engine cluster for service deployment.

$ gcloud container clusters create resnet-serving-cluster --num-nodes 5

This should output something like:

Creating cluster resnet-serving-cluster...done.
Created [https://container.googleapis.com/v1/projects/tensorflow-serving/zones/us-central1-f/clusters/resnet-serving-cluster].
kubeconfig entry generated for resnet-serving-cluster.
NAME                    ZONE           MASTER_VERSION  MASTER_IP        MACHINE_TYPE   NODE_VERSION  NUM_NODES  STATUS
resnet-serving-cluster  us-central1-f  1.1.8           104.197.163.119  n1-standard-1  1.1.8         5          RUNNING

Set the default cluster for the gcloud container command and pass the cluster credentials to kubectl:

gcloud config set container/cluster resnet-serving-cluster
gcloud container clusters get-credentials resnet-serving-cluster

This should result in:

Fetching cluster endpoint and auth data.
kubeconfig entry generated for resnet-serving-cluster.

Upload the Docker image

Let's now push our image to the Google Container Registry so that we can run it on Google Cloud Platform.

First, we tag the $USER/resnet_serving image using the Container Registry format and our project name:

docker tag $USER/resnet_serving gcr.io/tensorflow-serving/resnet

Next, we configure Docker to use gcloud as a credential helper:

gcloud auth configure-docker

Finally, we push the image to the Registry:

docker push gcr.io/tensorflow-serving/resnet

Create Kubernetes Deployment and Service

The deployment consists of 3 replicas of the resnet_inference server, controlled by a Kubernetes Deployment. The replicas are exposed externally by a Kubernetes Service along with an External Load Balancer.

We create them using the example Kubernetes config resnet_k8s.yaml.

kubectl create -f tensorflow_serving/example/resnet_k8s.yaml

With output:

deployment "resnet-deployment" created
service "resnet-service" created
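To make the Deployment-plus-Service structure concrete, here is a hedged sketch that builds a comparable manifest as JSON, which kubectl accepts alongside YAML. It is not the content of resnet_k8s.yaml. The `app: resnet-server` label and the container name are placeholders we chose, while the resource names, image, replica count and port are taken from this tutorial.

```python
import json


def build_manifest(image="gcr.io/tensorflow-serving/resnet", replicas=3, port=8500):
    """Build an illustrative Deployment and Service as a Kubernetes List object."""
    labels = {"app": "resnet-server"}  # hypothetical label, chosen for this sketch
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "resnet-deployment"},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": "resnet-container",  # hypothetical name
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "resnet-service"},
        "spec": {
            # LoadBalancer asks the cloud provider for an external IP.
            "type": "LoadBalancer",
            "selector": dict(labels),
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


print(json.dumps(build_manifest(), indent=2))
```

The Service routes traffic to pods by label, so its selector has to match the labels on the Deployment's pod template. That is why both are derived from the same dictionary here.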

To view the status of the deployment and pods:

$ kubectl get deployments
NAME                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
resnet-deployment   3         3         3            3           5s

$ kubectl get pods
NAME                      READY     STATUS    RESTARTS   AGE
resnet-deployment-bbcbc   1/1       Running   0          10s
resnet-deployment-cj6l2   1/1       Running   0          10s
resnet-deployment-t1uep   1/1       Running   0          10s

To view the status of the service:

$ kubectl get services
NAME             CLUSTER-IP       EXTERNAL-IP       PORT(S)     AGE
resnet-service   10.239.240.227   104.155.184.157   8500/TCP    1m

It can take a while for everything to be up and running.

$ kubectl describe service resnet-service
Name:                   resnet-service
Namespace:              default
Labels:                 run=resnet-service
Selector:               run=resnet-service
Type:                   LoadBalancer
IP:                     10.239.240.227
LoadBalancer Ingress:   104.155.184.157
Port:                   <unset> 8500/TCP
NodePort:               <unset> 30334/TCP
Endpoints:              <none>
Session Affinity:       None
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  1m        1m      1   {service-controller }           Normal      CreatingLoadBalancer    Creating load balancer
  1m        1m      1   {service-controller }           Normal      CreatedLoadBalancer Created load balancer

The service external IP address is listed next to LoadBalancer Ingress.
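If you would rather read the address programmatically, the output of `kubectl get service resnet-service -o json` exposes it under status.loadBalancer.ingress. The sketch below parses that JSON. The helper name `external_address` is ours, and it assumes the JSON text has already been captured into a string.

```python
import json


def external_address(service_json):
    """Return the first load balancer IP or hostname, or None if not yet assigned.

    Hypothetical helper: `service_json` is the text printed by
    `kubectl get service <name> -o json`.
    """
    service = json.loads(service_json)
    # The ingress list is missing or empty while the load balancer is pending.
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return address
    return None
```

A return value of None means the load balancer is still being provisioned, which matches the earlier note that it can take a while for everything to be up and running.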

Query the model

We can now query the service at its external address from our local host.

$ tools/run_in_docker.sh python \
  tensorflow_serving/example/resnet_client_grpc.py \
  --server=104.155.184.157:8500
outputs {
  key: "classes"
  value {
    dtype: DT_INT64
    tensor_shape {
      dim {
        size: 1
      }
    }
    int64_val: 286
  }
}
outputs {
  key: "probabilities"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 1001
      }
    }
    float_val: 2.41628322328e-06
    float_val: 1.90121829746e-06
    float_val: 2.72477100225e-05
    float_val: 4.42638565801e-07
    float_val: 8.98362372936e-07
    float_val: 6.84421956976e-06
    float_val: 1.66555237229e-05
...
    float_val: 1.59407863976e-06
    float_val: 1.2315689446e-06
    float_val: 1.17812135159e-06
    float_val: 1.46365800902e-05
    float_val: 5.81210713335e-07
    float_val: 6.59980651108e-05
    float_val: 0.00129527016543
  }
}
model_spec {
  name: "resnet"
  version {
    value: 1538687457
  }
  signature_name: "serving_default"
}

You have successfully deployed the ResNet model as a service in Kubernetes!
