This sample project demonstrates how to deploy serverless generative AI on AWS at very low cost.
The main idea is to deploy a container image to a Lambda function and interact with the model through an HTTP endpoint.
You need to have installed:
- Docker;
- AWS CLI;
- Make.
Make sure you have created:
- an ECR repository.
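If you have not created the repository yet, it can be created with the AWS CLI; the repository name below is only an example:

```sh
# Create an ECR repository (the name "serverless-llm" is an example)
aws ecr create-repository --repository-name serverless-llm
```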
Create the .env file from the .env.dist file and update it with:
- ECR: the ECR registry;
- REPOSITORY: the ECR repository;
- MODEL_URL: the download URL of a model in GGUF format (https://huggingface.co/models?library=gguf).
Note that the model should be a little smaller than the memory limit of the Lambda, which is about 10 GB at most.
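For reference, a filled-in .env might look like the following; the account ID, region, repository name, and model path are placeholders:

```sh
# Example .env (all values are placeholders)
ECR=123456789012.dkr.ecr.us-east-1.amazonaws.com
REPOSITORY=serverless-llm
# Hugging Face serves raw model files via the resolve/<revision> path
MODEL_URL=https://huggingface.co/<org>/<repo>/resolve/main/<model>.gguf
```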
---
Build and push the image to the registry.
Download the model:
```sh
make download
```
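This target is expected to fetch the file referenced by MODEL_URL; a minimal sketch of what it likely does, assuming the variables from .env are exported:

```sh
# Hypothetical equivalent of `make download`: fetch the GGUF model locally
curl -L "$MODEL_URL" -o model.gguf
```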
Build the container image and tag it:
```sh
make build
make tag
```
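Both targets presumably wrap standard Docker commands; a plausible sketch, with the image name and registry taken from .env:

```sh
# Hypothetical equivalents of `make build` and `make tag`
docker build -t "$REPOSITORY" .
docker tag "$REPOSITORY:latest" "$ECR/$REPOSITORY:latest"
```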
Log in to ECR:
```sh
make ecr-login
```
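This target most likely wraps the standard ECR login flow; the region is an example:

```sh
# Hypothetical equivalent of `make ecr-login`
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin "$ECR"
```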
Push the image:
```sh
make push
```
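The equivalent Docker command would be:

```sh
# Hypothetical equivalent of `make push`
docker push "$ECR/$REPOSITORY:latest"
```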
---
Create a Lambda function from your container image; a CLI sketch follows the checklist below.
Make sure to:
- set the maximum available memory;
- enable the function URL;
- increase the timeout if necessary.
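If you prefer the CLI to the console, a hedged sketch of the same steps; the function name, role ARN, and timeout are placeholders, and 10240 MB is the current Lambda maximum:

```sh
# Hypothetical CLI equivalent of the console setup above
aws lambda create-function \
  --function-name serverless-llm \
  --package-type Image \
  --code ImageUri="$ECR/$REPOSITORY:latest" \
  --role arn:aws:iam::123456789012:role/lambda-exec-role \
  --memory-size 10240 \
  --timeout 300

# Expose a public function URL (AuthType NONE means no IAM auth)
aws lambda create-function-url-config \
  --function-name serverless-llm \
  --auth-type NONE

# AuthType NONE also requires a public invoke permission
aws lambda add-permission \
  --function-name serverless-llm \
  --statement-id FunctionURLAllowPublicAccess \
  --action lambda:InvokeFunctionUrl \
  --principal "*" \
  --function-url-auth-type NONE
```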
Make a request to the function endpoint to get the model response:
curl "https://{LAMBDA_FUNCTION_URL}/prompt?text=hello"