
Commit 3d22ef1

Add successful commands to README.
1 parent: c35beab

File tree: 1 file changed

README.md (+71 −51 lines)
@@ -1,119 +1,139 @@
-# COMMANDS:
+# Summary:

-## TEST OLLAMA DIRECTLY:
+This repo is a minimal example. It shows how to load balance 2 separate LLM services behind a Kong API Gateway. It is a less trivial implementation of the tutorial found in the Kong docs [here](https://docs.konghq.com/gateway/latest/get-started/load-balancing/).

-curl http://localhost:11434/api/chat -d '{
-"model": "llama3.1",
-"messages": [
-{ "role": "user", "content": "why is the sky blue?" }
-],
-"stream": false
-}'
-
-## CREATE & TEST SIMPLE SERVICE:
+## Creating & Running the Load Balancer:

-1. Create service:
+1. Run ollama on port 11434:

 ```
-curl -i -s -X POST http://localhost:8001/services --data name=example_service --data url='http://httpbin.org'
+docker run --network compose_kong-net -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
 ```

-2. Create route:
+2. Run a second ollama service on host port 11435:

 ```
-curl -i -X POST http://localhost:8001/services/example_service/routes --data 'paths[]=/mock' --data name=example_route
+docker run --network compose_kong-net -d -v ollama:/root/.ollama -p 11435:11434 --name ollama_2 ollama/ollama
+
 ```
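
The chat request used later in the README asks for the `llama3.1` model, which a freshly started container will not have. A minimal sketch (not part of the commit) of pulling it and checking both instances, assuming the container names `ollama` and `ollama_2` and the shared `ollama` volume from the two `docker run` commands above:

```
# Pull the model once; both containers mount the same "ollama" volume, so one pull should be visible to both.
docker exec ollama ollama pull llama3.1

# Confirm each instance answers on its own host port and lists the model.
curl http://localhost:11434/api/tags
curl http://localhost:11435/api/tags
```
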

-3. Test:
+3. Start Kong inside a Docker container:

-GET:
+```
+cd ./compose && KONG_DATABASE=postgres docker-compose --profile database up -d
+```
+
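
A quick way (not part of the commit) to confirm the gateway is actually up before creating any entities, assuming the compose file publishes the Admin API on port 8001 and the proxy on port 8000 as the commands below expect:

```
# Admin API status (database connectivity and server info).
curl -i http://localhost:8001/status

# Proxy port; with no routes configured yet Kong still answers here.
curl -i http://localhost:8000
```
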
+4. Create an upstream named `ai_upstream`:

 ```
-curl -X GET http://localhost:8000/mock/anything
+curl -X POST http://localhost:8001/upstreams --data name=ai_upstream
 ```
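
Optionally (not part of the commit), confirm the upstream was created before adding targets to it:

```
# Fetch the upstream object just created via the Admin API.
curl -s http://localhost:8001/upstreams/ai_upstream
```
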

-POST:
+5. Create two targets for `ai_upstream` that point at the 2 ollama Docker containers you just created:

 ```
-curl -X POST http://localhost:8000/mock/anything --header 'Content-Type: application/json' --data '{ "messages": []}'
+curl -X POST http://localhost:8001/upstreams/ai_upstream/targets --data target=host.docker.internal:11434
 ```

-## CREATE AN AI SERVICE NOT USING AI PROXY:
+```
+curl -X POST http://localhost:8001/upstreams/ai_upstream/targets --data target=host.docker.internal:11435
+```
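
To verify both targets registered (not part of the commit), the Admin API can list them; the health view simply reports that health checks are off unless they have been configured:

```
# List the targets attached to the upstream.
curl -s http://localhost:8001/upstreams/ai_upstream/targets

# Per-target health view (expect HEALTHCHECKS_OFF with the default configuration).
curl -s http://localhost:8001/upstreams/ai_upstream/health
```
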

-1. Create `ai_service` service:
+6. Create an `ai_load_balancing_service` service with `ai_upstream` as its host:

 ```
 curl -i -s -X POST http://localhost:8001/services \
---data name=ai_service \
---data url='http://host.docker.internal:11434/api/chat'
+--data name=ai_load_balancing_service \
+--data path=/api/chat \
+--data host='ai_upstream'
 ```

-2. Create route:
+7. Add a route to the `ai_load_balancing_service`:

 ```
-curl -i -X POST http://localhost:8001/services/ai_service/routes \
+curl -i -X POST http://localhost:8001/services/ai_load_balancing_service/routes \
 --data 'paths[]=/api/chat' \
---data name=ai_route
+--data name=ai_load_balancing_route
 ```
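
As a sanity check (not part of the commit), the route can be read back through the service it was attached to:

```
# List routes owned by the load-balancing service.
curl -s http://localhost:8001/services/ai_load_balancing_service/routes
```
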

-3. Test:
+8. Follow your docker logs in order to track which of the 2 containers is being called:

 ```
-curl --location 'http://localhost:8000/api/chat' \
+docker logs -f CONTAINER_ID_HERE
+```
+
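
`CONTAINER_ID_HERE` is a placeholder; with the names used in steps 1 and 2 the containers can be addressed directly (one terminal per log stream), for example:

```
# Show both ollama containers and their IDs.
docker ps --filter name=ollama

# Follow each instance in its own terminal.
docker logs -f ollama
docker logs -f ollama_2
```
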
+9. Test your load balancer:
+
+```
+curl -X POST 'http://localhost:8000/api/chat' \
 --header 'Content-Type: application/json' \
 --data '{ "messages": [ { "role": "system", "content": "You are a mathematician" }, { "role": "user", "content": "What is 1+1?"} ], "model": "llama3.1", "stream": false }'
 ```
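
A small sketch (not part of the commit) of driving several requests through the proxy so the two log streams show the traffic alternating between containers; it assumes the `llama3.1` model has already been pulled and relies on the upstream's default round-robin balancing:

```
# Send six identical chat requests through Kong; watch the docker logs from step 8.
for i in 1 2 3 4 5 6; do
  curl -s -X POST 'http://localhost:8000/api/chat' \
    --header 'Content-Type: application/json' \
    --data '{ "messages": [ { "role": "user", "content": "What is 1+1?" } ], "model": "llama3.1", "stream": false }' \
    > /dev/null
done
```
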

-## LOAD BALANCE LLMS USING AN UPSTREAM & 2 TARGETS:
+### Other Commands:

-1. Run ollama on port 11434:
+What follows is a list of commands that, while not strictly necessary, were helpful when developing this load balancer.
+
+## Test Ollama Directly:
+
+curl http://localhost:11434/api/chat -d '{
+"model": "llama3.1",
+"messages": [
+{ "role": "user", "content": "why is the sky blue?" }
+],
+"stream": false
+}'
+
+## Create & Test Trivial Kong Service:
+
+1. Create service:

 ```
-docker run --network compose_kong-net -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+curl -i -s -X POST http://localhost:8001/services --data name=example_service --data url='http://httpbin.org'
 ```

-2. Run a second ollama service on port 11435:
+2. Create route:

 ```
-docker run --network compose_kong-net -d -v ollama:/root/.ollama -p 11435:11435 --name ollama_2 ollama/ollama
-
+curl -i -X POST http://localhost:8001/services/example_service/routes --data 'paths[]=/mock' --data name=example_route
 ```

-3. Start Kong inside a Docker container:
+3. Test:
+
+GET:

 ```
-cd ./compose && KONG_DATABASE=postgres docker-compose --profile database up -d
+curl -X GET http://localhost:8000/mock/anything
 ```

-4. Create an `ai_service_2` service with an `ai_upstream` host:
+POST:

 ```
-curl -i -s -X POST http://localhost:8001/services \
---data name=ai_service_2 \
---data host='ai_upstream'
+curl -X POST http://localhost:8000/mock/anything --header 'Content-Type: application/json' --data '{ "messages": []}'
 ```
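
If the httpbin example is no longer needed, it can be cleaned up through the Admin API (not part of the commit); the route must be deleted before the service that owns it:

```
# Remove the example route, then the example service.
curl -i -X DELETE http://localhost:8001/routes/example_route
curl -i -X DELETE http://localhost:8001/services/example_service
```
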

-5. Create two targets for `ai_upstream`:
+## Create an LLM Service (not using Kong's AI Proxy Plugin):
+
+1. Create `ai_service` service:

 ```
-curl -X POST http://localhost:8001/upstreams/example_upstream/targets \
---data target=host.docker.internal:11434' && \
-curl -X POST http://localhost:8001/upstreams/example_upstream/targets \
---data target=host.docker.internal:11435'
+curl -i -s -X POST http://localhost:8001/services \
+--data name=ai_service \
+--data url='http://host.docker.internal:11434/api/chat'
 ```

-6. Add route:
+2. Create route:

 ```
-curl -i -X POST http://localhost:8001/services/ai_service_2/routes \
+curl -i -X POST http://localhost:8001/services/ai_service/routes \
 --data 'paths[]=/api/chat' \
---data name=ai_load_balancing_route
+--data name=ai_route
 ```

-7. Test ollama via Kong:
+3. Test:

 ```
-curl -X POST 'http://localhost:8000/api/chat' \
+curl --location 'http://localhost:8000/api/chat' \
 --header 'Content-Type: application/json' \
 --data '{ "messages": [ { "role": "system", "content": "You are a mathematician" }, { "role": "user", "content": "What is 1+1?"} ], "model": "llama3.1", "stream": false }'
 ```
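
Note that `ai_route` above matches the same `/api/chat` path as `ai_load_balancing_route`, so leaving the single-backend `ai_service` registered can compete with the load-balanced setup for incoming requests. A sketch (not part of the commit) of removing it once it has served its purpose:

```
# Delete the single-backend route first, then its service.
curl -i -X DELETE http://localhost:8001/routes/ai_route
curl -i -X DELETE http://localhost:8001/services/ai_service
```
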
