Commit 3a3f765

[Docs] Add PrisKV & AIBrix integration example (#17)

1 parent 1d82362 commit 3a3f765

File tree

3 files changed: +466 -0 lines changed

samples/cluster/README.md

Lines changed: 240 additions & 0 deletions
# PrisKV KVCache Cluster × AIBrix Integration Example

This README describes how to deploy:

- **KVCache Controller** (Kubernetes control plane for KVCache clusters)
- A **PrisKV-based KVCache cluster** (HPKV data nodes + Redis-compatible metadata + Watcher)
- An **AIBrix-enabled vLLM inference service** that uses PrisKV as a remote KV cache backend

This example is intended as a step-by-step tutorial you can run on any compatible Kubernetes cluster (e.g., Volcengine VKE) with GPU nodes.

---
## 1. Architecture Overview

This example sets up the following components:

1. **KVCache Controller**
   - Watches `KVCache` custom resources
   - Creates and manages:
     - Redis-compatible metadata service
     - Watcher Pod (for node discovery and registration)
     - PrisKV data nodes

2. **PrisKV KVCache Cluster**
   - Distributed KV cache backend
   - Uses Redis-compatible metadata and PrisKV data nodes for KV storage

3. **AIBrix-enabled vLLM Service**
   - vLLM image extended with:
     - AIBrix KVCache Offloading connector
     - PrisKV client SDK
   - Uses `AIBrixOffloadingConnectorV1Type3` to offload KV tensors to PrisKV
   - Treats PrisKV as an L2 cache backend (L1 DRAM cache is optional)

---
## 2. Prerequisites

### 2.1 Kubernetes Cluster with GPUs

You need a Kubernetes cluster with GPU nodes. For example, on Volcengine VKE:

- Create a VKE cluster with GPU instances such as **H20** / **A800** that support RDMA.
- Official docs (examples):
  - Cluster creation: `https://www.volcengine.com/docs/6460/100936?LibVersion=2.27.0`
  - `kubectl` configuration: `https://www.volcengine.com/docs/6460/1374028-`

Other Kubernetes providers are also fine as long as:

- GPU nodes are available
- Networking is sufficient for PrisKV and inference pods (RDMA recommended but not strictly required for functional validation)

### 2.2 kubectl Access

Make sure `kubectl` can talk to the cluster:

```bash
kubectl get nodes
```

You should see your GPU nodes listed.
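Optionally, you can also confirm that GPUs are actually schedulable. The column below assumes NVIDIA GPUs exposed through the standard `nvidia.com/gpu` resource name; adjust it if your device plugin advertises a different resource:

```bash
# List allocatable GPUs per node (the resource name assumes the NVIDIA
# device plugin; change it for other GPU vendors)
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```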
---
## 3. Install AIBrix

The KVCache Controller is part of AIBrix; it runs in its own namespace and reconciles `KVCache` CRs.

```bash
# Install Envoy Gateway. This is not an AIBrix component; you can also install it from a local Helm package.
helm install eg oci://docker.io/envoyproxy/gateway-helm --version v1.2.8 -n envoy-gateway-system --create-namespace

# Patch the configuration to enable EnvoyPatchPolicy. This step is required!
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-gateway-config
  namespace: envoy-gateway-system
data:
  envoy-gateway.yaml: |
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyGateway
    provider:
      type: Kubernetes
    gateway:
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
    extensionApis:
      enableEnvoyPatchPolicy: true
EOF
```
```bash
# Install AIBrix CRDs. `--install-crds` is not available when installing from a local chart.
kubectl apply -f dist/chart/crds/

# Install AIBrix with the pinned release version:
helm install aibrix dist/chart -f dist/chart/stable.yaml -n aibrix-system --create-namespace
```

> At the moment, the controller is assumed to run in `aibrix-system`. Future versions may support custom namespaces.

Verify the controller is running:

```bash
kubectl get pods -n aibrix-system
kubectl get deployments -n aibrix-system
```

You should see controller-related Pods in the `Running` state.
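If you prefer a scriptable check over eyeballing pod state, a `kubectl wait` one-liner such as the sketch below blocks until every Deployment in `aibrix-system` reports `Available` (the timeout value is arbitrary):

```bash
# Block until all AIBrix deployments are Available (up to 5 minutes)
kubectl wait deployment --all -n aibrix-system --for=condition=Available --timeout=300s
```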
---
## 4. Deploy a PrisKV KVCache Cluster

With the controller up, define a `KVCache` custom resource and let the controller create the cluster.

### 4.1 Apply KVCache CR

```bash
kubectl apply -f kvcache.yaml
```

After the controller reconciles the resource, you should see pods similar to:

```bash
kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
debug                                 1/1     Running   0          23h
kvcache-cluster-0                     1/1     Running   0          8h
kvcache-cluster-1                     1/1     Running   0          8h
kvcache-cluster-2                     1/1     Running   0          8h
kvcache-cluster-kvcache-watcher-pod   1/1     Running   0          8h
kvcache-cluster-redis                 1/1     Running   0          8h
```
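Reconciliation can take a moment on first deploy; watching the namespace is an easy way to follow pod creation:

```bash
# Follow pods as the controller reconciles the KVCache CR (Ctrl-C to stop)
kubectl get pods -w
```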
Roles:

- `kvcache-cluster-0/1/2` – PrisKV data nodes
- `kvcache-cluster-redis` – Redis-compatible metadata service
- `kvcache-cluster-kvcache-watcher-pod` – Watcher that discovers and registers data nodes into metadata

### 4.2 Verify Redis Metadata

To confirm the cluster is writing metadata correctly:

```bash
kubectl exec -it kvcache-cluster-redis -- bash
```
Inside the container:

```bash
redis-cli -a kvcache_nodes
KEYS *
# Inspect keys according to your schema
```

If you can see keys representing nodes, shards, or sessions, the KVCache cluster is healthy.
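The same check also works non-interactively, which is handy for scripts. A minimal sketch, assuming the metadata service keeps the default password `kvcache_nodes` used above:

```bash
# One-shot variant of the interactive check above
kubectl exec kvcache-cluster-redis -- redis-cli -a kvcache_nodes KEYS '*'
```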
---
## 5. Deploy an AIBrix-Enabled vLLM Service Using PrisKV

Next, we deploy a vLLM service that:

- Uses the **AIBrix KV Offloading connector** (`AIBrixOffloadingConnectorV1Type3`)
- Configures **PrisKV** as the L2 KV cache backend (through Redis metadata)

```bash
kubectl apply -f vllm.yaml
```
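Before validating end to end, it can help to wait for the inference Deployment to finish rolling out. The Deployment name below is an assumption based on the pod and Service names used in Section 6:

```bash
# Wait for the vLLM deployment to become ready (name assumed from Section 6)
kubectl rollout status deployment/deepseek-r1-distill-llama-8b --timeout=600s
```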
---
## 6. End-to-End Validation

### 6.1 Check All Pods

```bash
kubectl get pods
```

You should see:

- `kvcache-cluster-*` pods from the PrisKV cluster
- `deepseek-r1-distill-llama-8b-*` inference pod

All should be `Running`.
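A quick way to spot stragglers is to list only pods that are not yet `Running`; empty output means everything is up:

```bash
# Show any pod that is not in the Running phase; no output is a good sign
kubectl get pods --field-selector=status.phase!=Running
```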
### 6.2 Send Test Requests

Port-forward the service:

```bash
kubectl port-forward svc/deepseek-r1-distill-llama-8b 8000:8000
```

Then call the OpenAI-compatible endpoint:

```bash
curl http://127.0.0.1:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "deepseek-r1-distill-llama-8b",
  "messages": [
    {"role": "user", "content": "Hello, PrisKV and AIBrix!"}
  ]
}'
```

These requests will generate KVCache traffic.
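To exercise the KV cache rather than just the model, it helps to send the same prompt more than once; the repeated prefix is what the offloading connector can reuse. A minimal sketch:

```bash
# Send the same request twice; the shared prompt prefix is what the
# KV cache connector can reuse on the second call
for i in 1 2; do
  curl -s http://127.0.0.1:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "deepseek-r1-distill-llama-8b", "messages": [{"role": "user", "content": "Explain KV cache offloading in three sentences."}]}'
  echo
done
```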
### 6.3 Check Redis Metadata

Inspect Redis again:

```bash
kubectl exec -it kvcache-cluster-redis -- bash
redis-cli -a kvcache_nodes

KEYS *
# Inspect keys to confirm entries related to sessions / nodes / chunks have been created
```

If new keys appear after you send requests, it means:

> The AIBrix-enabled vLLM instance is successfully using the PrisKV cluster as its remote KV cache backend.
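For a before-and-after comparison, Redis's `DBSIZE` command gives a single key count you can record before sending requests and again afterwards:

```bash
# Key count in the metadata store; an increase after traffic confirms KV writes
kubectl exec kvcache-cluster-redis -- redis-cli -a kvcache_nodes DBSIZE
```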
## 7. Next Steps

From here, you can extend this example to:

- Share a single PrisKV cluster across multiple engines (vLLM, SGLang, etc.).
- Combine L1 DRAM cache with L2 PrisKV (multi-tier KV caching).
- Run benchmarks to evaluate performance and cost efficiency under real workloads.
- Integrate with your existing autoscaling and routing stack for production use.

If you have feedback or want to contribute improvements to the controller, cluster layout, or AIBrix integration, feel free to open an issue or pull request in this repository.

samples/cluster/kvcache.yaml

Lines changed: 120 additions & 0 deletions
```yaml
apiVersion: orchestration.aibrix.ai/v1alpha1
kind: KVCache
metadata:
  name: kvcache-cluster
  namespace: default
  annotations:
    kvcache.orchestration.aibrix.ai/backend: hpkv
    hpkv.kvcache.orchestration.aibrix.ai/rdma-port: "18512"
    hpkv.kvcache.orchestration.aibrix.ai/admin-port: "9100"
    hpkv.kvcache.orchestration.aibrix.ai/block-size-bytes: "4096"
    hpkv.kvcache.orchestration.aibrix.ai/block-count: "1048576"
    hpkv.kvcache.orchestration.aibrix.ai/total-slots: "4096"
    hpkv.kvcache.orchestration.aibrix.ai/virtual-node-count: "100"
spec:
  metadata:
    redis:
      runtime:
        image: kvcache-image-container-cn-shanghai.cr.volces.com/kvcache/redis:7.4.2
        replicas: 1
        resources:
          requests:
            cpu: 1000m
            memory: 1Gi
          limits:
            cpu: 1000m
            memory: 1Gi
  service:
    type: ClusterIP
    ports:
      - name: service
        port: 18512
        targetPort: 18512
        protocol: TCP
      - name: admin
        port: 9100
        targetPort: 9100
        protocol: TCP
  watcher:
    image: kvcache-image-container-cn-shanghai.cr.volces.com/kvcache/kvcache-watcher:nightly
    imagePullPolicy: Always
    resources:
      requests:
        cpu: "200m"
        memory: "256Mi"
        vke.volcengine.com/rdma: "1"
      limits:
        cpu: "200m"
        memory: "256Mi"
        vke.volcengine.com/rdma: "1"
  cache:
    replicas: 3
    template:
      metadata:
        annotations:
          prometheus.io/path: /metrics
          prometheus.io/port: "2112"
          prometheus.io/scrape: "true"
          k8s.volcengine.com/pod-networks: |
            [
              {
                "cniConf":{
                  "name":"rdma"
                }
              }
            ]
      spec:
        hostIPC: true
        containers:
          - name: kvcache-server
            image: kvcache-container-image-hb2-cn-beijing.cr.volces.com/aibrix/priskv:v0.0.2
            command:
              - "/bin/bash"
              - "-c"
            args:
              - |
                AIBRIX_KVCACHE_RDMA_IP=$(ip addr show dev eth1 | grep 'inet ' | awk '{print $2}' | awk -F/ '{print $1}')
                echo "Binding to RDMA IP: $AIBRIX_KVCACHE_RDMA_IP"
                ./hpkv-server -a $AIBRIX_KVCACHE_RDMA_IP -p 18512 -v 4096 -b 1048576 --acl any -A $AIBRIX_KVCACHE_RDMA_IP -P 9100
            ports:
              - name: service
                containerPort: 18512
                protocol: TCP
              - name: manage
                containerPort: 9100
                protocol: TCP
            env:
              - name: AIBRIX_KVCACHE_UID
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.uid
              - name: AIBRIX_KVCACHE_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.name
              - name: AIBRIX_KVCACHE_NAMESPACE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
              - name: AIBRIX_KVCACHE_RDMA_PORT
                value: "18512"
              - name: AIBRIX_KVCACHE_ADMIN_PORT
                value: "9100"
              - name: AIBRIX_KVCACHE_BLOCK_SIZE_IN_BYTES
                value: "4096"
              - name: AIBRIX_KVCACHE_BLOCK_COUNT
                value: "1048576"
            securityContext:
              capabilities:
                add:
                  - IPC_LOCK
                  - SYS_RESOURCE
            resources:
              requests:
                cpu: "6000m"
                memory: "30Gi"
                vke.volcengine.com/rdma: "1"
              limits:
                cpu: "6000m"
                memory: "30Gi"
                vke.volcengine.com/rdma: "1"
```
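After applying this manifest, you can ask the API server to echo back the reconciled object. The plural resource name `kvcaches` is an assumption following standard CRD naming conventions:

```bash
# Inspect the reconciled KVCache object (plural resource name is assumed)
kubectl get kvcaches.orchestration.aibrix.ai kvcache-cluster -n default -o yaml
```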
