# Summary:

This repo is a minimal example of load balancing two separate LLM services behind a Kong API Gateway. It is a less trivial implementation of the tutorial found in the Kong docs [here](https://docs.konghq.com/gateway/latest/get-started/load-balancing/).

## Creating & Running the Load Balancer:

1. Run ollama on port 11434:

```
docker run --network compose_kong-net -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

2. Run a second ollama service on host port 11435:

```
docker run --network compose_kong-net -d -v ollama:/root/.ollama -p 11435:11434 --name ollama_2 ollama/ollama
```
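
The chat request in step 9 assumes the `llama3.1` model has already been downloaded. Both containers mount the same `ollama` volume, so pulling the model once should make it available to both; a quick sketch using the container name from step 1:

```
# Pull the llama3.1 model inside the first ollama container.
# Because ollama_2 shares the same volume, it should see the model too.
docker exec -it ollama ollama pull llama3.1
```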

3. Start Kong inside a Docker container:

```
cd ./compose && KONG_DATABASE=postgres docker-compose --profile database up -d
```
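
Before wiring anything up, it can be worth confirming that the gateway started and its Admin API is reachable. The commands below all talk to the Admin API on port 8001, so a simple request against it is a quick sanity check:

```
# Should return Kong's node information if the gateway is up.
curl -i http://localhost:8001/
```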

4. Create an upstream named `ai_upstream`:

```
curl -X POST http://localhost:8001/upstreams --data name=ai_upstream
```

5. Create two targets for `ai_upstream` that point at the two ollama Docker containers you just created:

```
curl -X POST http://localhost:8001/upstreams/ai_upstream/targets --data target=host.docker.internal:11434
```

```
curl -X POST http://localhost:8001/upstreams/ai_upstream/targets --data target=host.docker.internal:11435
```
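
To confirm both targets registered, you can list them back from the Admin API (an optional verification step):

```
# Both host.docker.internal targets should appear in the response.
curl -s http://localhost:8001/upstreams/ai_upstream/targets
```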

6. Create an `ai_load_balancing_service` service with `ai_upstream` as its host:

```
curl -i -s -X POST http://localhost:8001/services \
  --data name=ai_load_balancing_service \
  --data path=/api/chat \
  --data host='ai_upstream'
```

7. Add a route to the `ai_load_balancing_service`:

```
curl -i -X POST http://localhost:8001/services/ai_load_balancing_service/routes \
  --data 'paths[]=/api/chat' \
  --data name=ai_load_balancing_route
```

8. Follow your docker logs to track which of the two containers is being called:

```
docker logs -f CONTAINER_ID_HERE
```
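
If you don't have the container IDs handy, `docker logs` also accepts container names, so you can look the containers up and follow each one in its own terminal (assuming the names used in steps 1 and 2):

```
# Find the two ollama containers.
docker ps --filter name=ollama

# Follow each container's logs (run in separate terminals).
docker logs -f ollama
docker logs -f ollama_2
```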

9. Test your load balancer:

```
curl -X POST 'http://localhost:8000/api/chat' \
  --header 'Content-Type: application/json' \
  --data '{ "messages": [ { "role": "system", "content": "You are a mathematician" }, { "role": "user", "content": "What is 1+1?"} ], "model": "llama3.1", "stream": false }'
```
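
A single request only hits one target, so to actually watch the balancing it helps to send several requests in a row while both log streams are open. Kong's default load-balancing algorithm is round-robin, so the activity should alternate between the two containers. A quick sketch:

```
# Fire a handful of identical chat requests through the gateway.
for i in 1 2 3 4; do
  curl -s -X POST 'http://localhost:8000/api/chat' \
    --header 'Content-Type: application/json' \
    --data '{ "messages": [ { "role": "user", "content": "What is 1+1?" } ], "model": "llama3.1", "stream": false }' > /dev/null
done
```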

## Other Commands:

The following commands are not strictly necessary, but they were helpful while developing this load balancer.

### Test Ollama Directly:

```
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ],
  "stream": false
}'
```

### Create & Test Trivial Kong Service:

1. Create service:

```
curl -i -s -X POST http://localhost:8001/services --data name=example_service --data url='http://httpbin.org'
```

2. Create route:

```
curl -i -X POST http://localhost:8001/services/example_service/routes --data 'paths[]=/mock' --data name=example_route
```

3. Test:

GET:

```
curl -X GET http://localhost:8000/mock/anything
```

POST:

```
curl -X POST http://localhost:8000/mock/anything --header 'Content-Type: application/json' --data '{ "messages": []}'
```
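
When you're done experimenting, the mock service can be removed so it doesn't linger in the gateway config. Kong requires deleting a service's routes before the service itself; an optional cleanup sketch:

```
# Delete the route first, then the service that owned it.
curl -i -X DELETE http://localhost:8001/routes/example_route
curl -i -X DELETE http://localhost:8001/services/example_service
```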

### Create an LLM Service (not using Kong's AI Proxy Plugin):

1. Create `ai_service` service:

```
curl -i -s -X POST http://localhost:8001/services \
  --data name=ai_service \
  --data url='http://host.docker.internal:11434/api/chat'
```

2. Create route:

```
curl -i -X POST http://localhost:8001/services/ai_service/routes \
  --data 'paths[]=/api/chat' \
  --data name=ai_route
```

3. Test:

```
curl --location 'http://localhost:8000/api/chat' \
  --header 'Content-Type: application/json' \
  --data '{ "messages": [ { "role": "system", "content": "You are a mathematician" }, { "role": "user", "content": "What is 1+1?"} ], "model": "llama3.1", "stream": false }'
```