This repo is a minimal example showing how to load balance two separate LLM services behind a Kong API Gateway. It is a slightly less trivial implementation of the tutorial found in the Kong docs here.
- Run ollama on port 11434:
```bash
docker run --network compose_kong-net -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
- Run a second ollama service on host port 11435:
```bash
docker run --network compose_kong-net -d -v ollama:/root/.ollama -p 11435:11434 --name ollama_2 ollama/ollama
```
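The chat examples further down assume the `moondream` model has already been pulled. Both containers mount the same `ollama` volume, so pulling it once should make it available to both; a minimal sketch, assuming the container name `ollama` used above:

```bash
# Pull the moondream model; the shared ollama volume makes it visible to both containers
docker exec ollama ollama pull moondream
```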
- Start Kong inside a Docker container:
```bash
cd ./compose && KONG_DATABASE=postgres docker-compose --profile database up -d
```
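Before creating any Kong entities, it can help to confirm that the Admin API is reachable:

```bash
# The Admin API listens on port 8001 and returns node information as JSON
curl -i http://localhost:8001/
```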
- Create an upstream named `ai_upstream`:
```bash
curl -X POST http://localhost:8001/upstreams --data name=ai_upstream
```
- Create two targets for `ai_upstream` that point at the two ollama Docker containers you just created:
```bash
curl -X POST http://localhost:8001/upstreams/ai_upstream/targets --data target=host.docker.internal:11434
curl -X POST http://localhost:8001/upstreams/ai_upstream/targets --data target=host.docker.internal:11435
```
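To confirm both targets were registered, you can list them and check their health via the Admin API (health reports `HEALTHCHECKS_OFF` unless active or passive health checks have been configured on the upstream):

```bash
# List the targets attached to the upstream
curl -s http://localhost:8001/upstreams/ai_upstream/targets

# Show per-target health status
curl -s http://localhost:8001/upstreams/ai_upstream/health
```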
- Create an `ai_load_balancing_service` service with `ai_upstream` as its host:
```bash
curl -i -s -X POST http://localhost:8001/services \
  --data name=ai_load_balancing_service \
  --data path=/api/chat \
  --data host='ai_upstream'
```
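To verify the service, fetch its definition; `host` should be `ai_upstream` rather than a concrete address:

```bash
# "host" should be "ai_upstream" and "path" should be "/api/chat"
curl -s http://localhost:8001/services/ai_load_balancing_service
```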
- Add a route to the `ai_load_balancing_service`:
```bash
curl -i -X POST http://localhost:8001/services/ai_load_balancing_service/routes \
  --data 'paths[]=/api/chat' \
  --data name=ai_load_balancing_route
```
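To double-check that the route is attached to the right service:

```bash
# Expect ai_load_balancing_route with path /api/chat
curl -s http://localhost:8001/services/ai_load_balancing_service/routes
```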
- Follow your docker logs in order to track which of the 2 containers is being called:
```bash
docker logs -f CONTAINER_ID_HERE
```
- Test your load balancer:
```bash
curl -X POST 'http://localhost:8000/api/chat' \
  --header 'Content-Type: application/json' \
  --data '{ "messages": [ { "role": "system", "content": "You are a mathematician" }, { "role": "user", "content": "Why is the sky blue"} ], "model": "moondream", "stream": false }'
```
We use the `moondream` model because it is very small; larger models caused OOM errors on this maintainer's local machine.
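To see the load balancing more clearly, you can fire a handful of requests in a row while following both containers' logs; a small sketch, assuming the setup above (Kong's default algorithm is round-robin, so requests should alternate between the two targets):

```bash
# Send several chat requests through Kong and discard the responses;
# the docker logs of ollama and ollama_2 should each show roughly half of them.
for i in $(seq 1 6); do
  curl -s -X POST 'http://localhost:8000/api/chat' \
    --header 'Content-Type: application/json' \
    --data '{ "messages": [ { "role": "user", "content": "Say hello" } ], "model": "moondream", "stream": false }' > /dev/null
done
```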
What follows is a list of commands that, while not strictly necessary, were helpful when developing this load balancer.
- Call an ollama container directly, bypassing Kong:
```bash
curl http://localhost:11434/api/chat -d '{ "model": "llama3.1", "messages": [ { "role": "user", "content": "why is the sky blue?" } ], "stream": false }'
```
- Create service:
```bash
curl -i -s -X POST http://localhost:8001/services --data name=example_service --data url='http://httpbin.org'
```
- Create route:
```bash
curl -i -X POST http://localhost:8001/services/example_service/routes --data 'paths[]=/mock' --data name=example_route
```
- Test:
GET:
```bash
curl -X GET http://localhost:8000/mock/anything
```
POST:
```bash
curl -X POST http://localhost:8000/mock/anything --header 'Content-Type: application/json' --data '{ "messages": []}'
```
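Once the mock service has served its purpose, it can be removed again; Kong requires the route to be deleted before the service that owns it:

```bash
# Remove the route first, then the service
curl -i -X DELETE http://localhost:8001/routes/example_route
curl -i -X DELETE http://localhost:8001/services/example_service
```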
- Create an `ai_service` service:
```bash
curl -i -s -X POST http://localhost:8001/services \
  --data name=ai_service \
  --data url='http://host.docker.internal:11434/api/chat'
```
- Create route:
```bash
curl -i -X POST http://localhost:8001/services/ai_service/routes \
  --data 'paths[]=/api/chat' \
  --data name=ai_route
```
- Test:
```bash
curl --location 'http://localhost:8000/api/chat' \
  --header 'Content-Type: application/json' \
  --data '{ "messages": [ { "role": "system", "content": "You are a mathematician" }, { "role": "user", "content": "What is 1+1?"} ], "model": "llama3.1", "stream": false }'
```