Overview

This project uses the RunPod platform to deploy a serverless endpoint that runs a machine learning model. The endpoint receives a request and generates a corresponding response.

runPod_case_study

RunPod is a powerful and cost-effective cloud platform designed for AI and machine learning workloads. It offers on-demand GPU instances, allowing users to access high-performance compute resources when needed. With its serverless computing capabilities, RunPod enables autoscaling API endpoints, making it easy to scale inference for AI models efficiently.

The platform provides AI endpoints tailored for inference workloads, ensuring seamless deployment and execution of machine learning models. Additionally, RunPod allows users to manage software on third-party compute resources while benefiting from dynamic scaling to meet computational demands effectively.

Objectives

Write a handler.py file that can handle serverless requests for the given model.
Build a Docker image that includes your serverless handler and the model.
Deploy a serverless endpoint on the RunPod platform.
Test that the serverless endpoint can receive a text prompt as input and return an appropriate image for display.

Solution

For ease of sharing the code and integrating with the RunPod platform, I created this GitHub repository to store my Dockerfile and handler file.
The Dockerfile uses pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime as its base image. This image provides a pre-built environment with PyTorch, CUDA, and cuDNN, suited to running GPU-accelerated deep learning models such as text-to-image pipelines. Starting from it simplifies setup and helps keep the library versions compatible.

The Dockerfile then installs the Python packages needed to download the model and run the text-to-image pipeline. The huggingface_hub package is used to download the stabilityai/stable-diffusion-2-1-base model into a dedicated directory, /app/model.
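
The download step could look roughly like the sketch below. This is not the repository's actual script: the function name `download_model` and the injectable `downloader` argument are hypothetical and exist only to keep the example easy to test. It assumes the `snapshot_download` function from huggingface_hub is what performs the download.

```python
MODEL_ID = "stabilityai/stable-diffusion-2-1-base"
MODEL_DIR = "/app/model"


def download_model(repo_id=MODEL_ID, local_dir=MODEL_DIR, downloader=None):
    """Download a model snapshot into local_dir and return the local path.

    `downloader` is a hypothetical hook for testing; by default the
    huggingface_hub `snapshot_download` function is used.
    """
    if downloader is None:
        # Imported lazily so this module can be imported without the package.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    return downloader(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `download_model()` while the image is being built would place the model files under /app/model, so workers would not need to fetch them when they start.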

The handler file is also copied into the working directory, /app, and set as the entry point.
The handler file loads the downloaded model from /app/model and uses it to build the pipeline with the diffusers package. The model is thus loaded ahead of time and is ready for any subsequent requests to the handler.
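
A minimal sketch of the loading step, assuming the pipeline class is `StableDiffusionPipeline` from diffusers. The helper names and the choice of half precision on GPU are illustrative assumptions, not taken from the repository.

```python
MODEL_DIR = "/app/model"


def select_device_and_dtype(cuda_available):
    """Return (device, dtype name): half precision on GPU, full precision on CPU."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def load_pipeline(model_dir=MODEL_DIR):
    """Build the text-to-image pipeline from the locally stored model files."""
    # Heavy imports are kept local so the module can be imported without them.
    import torch
    from diffusers import StableDiffusionPipeline

    device, dtype_name = select_device_and_dtype(torch.cuda.is_available())
    pipeline = StableDiffusionPipeline.from_pretrained(
        model_dir, torch_dtype=getattr(torch, dtype_name)
    )
    return pipeline.to(device)
```

Calling `load_pipeline()` once at module level, outside the request function, is what would keep the model in memory between requests.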

The event function of the handler is invoked whenever an API request is made to the serverless endpoint. It expects a request of the form:
```
{
  "input": {
    "prompt": "Earth as a 3D cartoon animation"
  }
}
```
The event function extracts the prompt and passes it to the pipeline to generate images. Only the first image is used; it is saved as a PNG file and returned to the caller as a base64 string.
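
The request handling described above can be sketched as follows. This is an illustration rather than the repository's handler: `make_handler`, `image_to_base64_png`, and `start_worker` are hypothetical names, the error response shape is an assumption, and the image is encoded in memory instead of being written to a PNG file first. The `body` key mirrors the `output.body` field seen in the endpoint's responses.

```python
import base64
import io


def image_to_base64_png(image):
    """Encode an image object (anything with a PIL-style save()) as base64 PNG text."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def make_handler(pipeline):
    """Return a handler function bound to an already loaded pipeline."""

    def handler(event):
        prompt = (event.get("input") or {}).get("prompt")
        if not prompt:
            # Assumed error shape; the original handler may respond differently.
            return {"error": "Missing 'prompt' in input"}
        # Only the first generated image is returned.
        image = pipeline(prompt).images[0]
        return {"body": image_to_base64_png(image)}

    return handler


def start_worker(pipeline):
    """Register the handler with the RunPod serverless runtime."""
    import runpod  # imported lazily; only needed inside the worker

    runpod.serverless.start({"handler": make_handler(pipeline)})
```

Binding the pipeline through `make_handler` keeps the request logic independent of the model, so it can be exercised with a stub pipeline without a GPU.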
Deploying the serverless endpoint: To deploy, I connected this GitHub repo as the source for the RunPod endpoint. Any update to the main branch triggers an automatic build from the Dockerfile, and the platform takes care of rolling out the update incrementally to all workers.

To quickly check whether the serverless endpoint is working, I use the Requests tab to send a request with a prompt and verify the result in the worker logs.

The Telemetry option is also handy for seeing how the GPU is used during requests.

A sample request and its response are shown below. Note that output.body contains the base64 image data.

Finally, the Metrics tab shows an overview of the number of requests over various time frames, along with other metrics such as 'Execution Time'.

Testing the Endpoint: With the serverless endpoint up and running, I created a small React app, hosted on GitHub, that accepts a text prompt, calls the endpoint deployed on RunPod, and displays the image returned in the response.

-- The React App has a simple UI that accepts a text prompt.

-- It sends the prompt in a POST request to the serverless endpoint to trigger image generation.

-- It receives a Job ID in response, which it uses to poll the API until the generated image is returned as a base64 string.

-- Once the API returns the image string, it decodes the string into an image and displays it in the UI.
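
The same submit-and-poll flow can be sketched in Python. This is an analogue of what the React app does, not its source code. It assumes RunPod's `/run` and `/status/{job_id}` endpoint routes with bearer-token authentication; `endpoint_id` and `api_key` are placeholders you would supply, and the `request_json` parameter is a hypothetical hook that lets the HTTP layer be replaced in tests.

```python
import base64
import json
import time
import urllib.request

API_BASE = "https://api.runpod.ai/v2"


def _request_json(url, api_key, payload=None):
    """POST the payload as JSON if one is given, otherwise GET; return parsed JSON."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST" if payload is not None else "GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def generate_image(prompt, endpoint_id, api_key,
                   request_json=_request_json, poll_seconds=2.0, max_polls=150):
    """Submit a prompt, poll the job until it finishes, and return raw PNG bytes."""
    job = request_json(f"{API_BASE}/{endpoint_id}/run", api_key,
                       {"input": {"prompt": prompt}})
    status_url = f"{API_BASE}/{endpoint_id}/status/{job['id']}"
    for _ in range(max_polls):
        status = request_json(status_url, api_key)
        if status["status"] == "COMPLETED":
            # output.body holds the base64 image data.
            return base64.b64decode(status["output"]["body"])
        if status["status"] in ("FAILED", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(f"Job ended with status {status['status']}")
        time.sleep(poll_seconds)
    raise TimeoutError("Job did not complete within the polling budget")
```

The returned bytes can be written to a `.png` file or passed to an image library for display.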

The React App is hosted at https://anamika8.github.io/runpod_frontend/. Below is a sample result:
