[Docs] Add PrisKV & AIBrix integration example (#17)

Jeffwan · web-flow · commit 3a3f7657b84b · 2025-11-25T23:45:42.000-08:00
diff --git a/samples/cluster/README.md b/samples/cluster/README.md
@@ -0,0 +1,240 @@
+# PrisKV KVCache Cluster × AIBrix Integration Example
+
+This README describes how to deploy:
+
+- **KVCache Controller** (Kubernetes control plane for KVCache clusters)
+- A **PrisKV-based KVCache cluster** (HPKV data nodes + Redis-compatible metadata + Watcher)
+- An **AIBrix-enabled vLLM inference service** that uses PrisKV as a remote KV cache backend
+
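+This example is intended as a step-by-step tutorial you can run on any Kubernetes cluster with GPU nodes. The sample manifests target Volcengine VKE: `kvcache.yaml` requests the `vke.volcengine.com/rdma` resource and uses the `k8s.volcengine.com/pod-networks` annotation, so adapt those fields for other providers.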
+
+---
+
+## 1. Architecture Overview
+
+This example sets up the following components:
+
+1. **KVCache Controller**  
+   - Watches `KVCache` custom resources  
+   - Creates and manages:
+     - Redis-compatible metadata service  
+     - Watcher Pod (for node discovery and registration)  
+     - PrisKV data nodes
+
+2. **PrisKV KVCache Cluster**  
+   - Distributed KV cache backend
+   - Uses Redis-compatible metadata and PrisKV data nodes for KV storage
+
+3. **AIBrix-enabled vLLM Service**  
+   - vLLM image extended with:
+     - AIBrix KVCache Offloading connector  
+     - PrisKV client SDK  
+   - Uses `AIBrixOffloadingConnectorV1Type3` to offload KV tensors to PrisKV
+   - Treats PrisKV as an L2 cache backend (L1 DRAM cache is optional)
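To make the L1/L2 split concrete, here is a minimal sketch of a two-tier read-through lookup. It assumes a bounded in-process dict stands in for the optional local DRAM (L1) cache and that any object with `get`/`put` stands in for the remote PrisKV (L2) backend. The names `TwoTierKVCache`, `InMemoryBackend`, and `L2Backend` are hypothetical. This is a generic caching pattern for illustration, not the AIBrix connector's actual implementation.

```python
from collections import OrderedDict
from typing import Optional, Protocol


class L2Backend(Protocol):
    """Hypothetical interface for a remote KV store such as PrisKV."""

    def get(self, key: str) -> Optional[bytes]: ...

    def put(self, key: str, value: bytes) -> None: ...


class InMemoryBackend:
    """Stand-in for the remote L2 store, used only for illustration."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class TwoTierKVCache:
    """Optional LRU L1 in local DRAM in front of a shared remote L2."""

    def __init__(self, l2: L2Backend, l1_capacity: int = 0) -> None:
        self.l2 = l2
        self.l1_capacity = l1_capacity  # 0 disables L1, mirroring "L1 is optional"
        self.l1 = OrderedDict()

    def _remember(self, key: str, value: bytes) -> None:
        if self.l1_capacity == 0:
            return
        self.l1[key] = value
        self.l1.move_to_end(key)
        while len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)  # evict the least recently used entry

    def put(self, key: str, value: bytes) -> None:
        self._remember(key, value)
        self.l2.put(key, value)  # write through so other engines can reuse it

    def get(self, key: str) -> Optional[bytes]:
        if key in self.l1:
            self.l1.move_to_end(key)
            return self.l1[key]
        value = self.l2.get(key)
        if value is not None:
            self._remember(key, value)  # promote an L2 hit into L1
        return value
```

Writing through to L2 on every `put` is what lets a second engine instance hit data it never computed itself. The L1 only shortens the path for the local engine.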
+
+---
+
+## 2. Prerequisites
+
+### 2.1 Kubernetes Cluster with GPUs
+
+You need a Kubernetes cluster with GPU nodes. For example, on Volcengine VKE:
+
+- Create a VKE cluster with GPU instances such as **H20** / **A800** that support RDMA.
+- Official docs (examples):
+  - Cluster creation: `https://www.volcengine.com/docs/6460/100936?LibVersion=2.27.0`
+  - `kubectl` configuration: `https://www.volcengine.com/docs/6460/1374028-`
+
+Other Kubernetes providers are also fine as long as:
+
+- GPU nodes are available
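+- Networking is sufficient for PrisKV and inference pods. RDMA is recommended but not strictly required for functional validation; without it, remove the `vke.volcengine.com/rdma` requests from `kvcache.yaml` and point `hpkv-server` at an interface that exists in your pods instead of `eth1`.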
+
+### 2.2 kubectl Access
+
+Make sure `kubectl` can talk to the cluster:
+
+```bash
+kubectl get nodes
+```
+
+You should see your GPU nodes listed.
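To check GPU and RDMA capacity rather than just node presence, you can inspect each node's allocatable resources from `kubectl get nodes -o json`. The sketch below assumes the NVIDIA device plugin's `nvidia.com/gpu` resource name and the `vke.volcengine.com/rdma` resource requested in `kvcache.yaml`. On other providers, pass different resource names. `summarize_nodes` and `fetch_node_summary` are hypothetical helper names.

```python
from __future__ import annotations

import json
import subprocess


def summarize_nodes(nodes_json: dict,
                    gpu_resource: str = "nvidia.com/gpu",
                    rdma_resource: str = "vke.volcengine.com/rdma") -> list[tuple[str, int, int]]:
    """Return (node name, allocatable GPUs, allocatable RDMA devices) for each node."""
    summary = []
    for node in nodes_json.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Extended resources are reported as integer strings, e.g. "8".
        gpus = int(allocatable.get(gpu_resource, "0"))
        rdma = int(allocatable.get(rdma_resource, "0"))
        summary.append((name, gpus, rdma))
    return summary


def fetch_node_summary() -> list[tuple[str, int, int]]:
    """Query the live cluster; requires kubectl on PATH and a working kubeconfig."""
    raw = subprocess.run(["kubectl", "get", "nodes", "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return summarize_nodes(json.loads(raw))
```

Nodes that report `0` GPUs or `0` RDMA devices will not be able to schedule the inference pod or the PrisKV data nodes from this sample as written.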
+
+---
+
+## 3. Install AIBrix
+
+The KVCache Controller ships with AIBrix; it runs in its own namespace and reconciles `KVCache` CRs. Install Envoy Gateway first, then AIBrix itself.
+
+```bash
+# Install Envoy Gateway. It is an AIBrix dependency, not an AIBrix component.
+helm install eg oci://docker.io/envoyproxy/gateway-helm --version v1.2.8 -n envoy-gateway-system --create-namespace
+
+# Enable EnvoyPatchPolicy in the Envoy Gateway configuration. This step is required; do not skip it.
+kubectl apply -f - <<EOF
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: envoy-gateway-config
+  namespace: envoy-gateway-system
+data:
+  envoy-gateway.yaml: |
+    apiVersion: gateway.envoyproxy.io/v1alpha1
+    kind: EnvoyGateway
+    provider:
+      type: Kubernetes
+    gateway:
+      controllerName: gateway.envoyproxy.io/gatewayclass-controller
+    extensionApis:
+      enableEnvoyPatchPolicy: true
+EOF
+```
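+
+The next commands install AIBrix from the local chart in `dist/chart`, so run them from the root of an AIBrix source checkout:
+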
+```bash
+# Install the AIBrix CRDs first; `--install-crds` is not available when installing from a local chart.
+kubectl apply -f dist/chart/crds/
+
+# Install AIBrix with the pinned release version:
+helm install aibrix dist/chart -f dist/chart/stable.yaml -n aibrix-system --create-namespace
+```
+
+> At the moment, the controller is assumed to run in `aibrix-system`. Future versions may support custom namespaces.
+
+Verify the controller is running:
+
+```bash
+kubectl get pods -n aibrix-system
+kubectl get deployments -n aibrix-system
+```
+
+You should see controller-related Pods in `Running` state.
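For a scriptable version of this check, a small helper can read `kubectl get deployments -n aibrix-system -o json` and report deployments whose ready replica count is below the desired count. This is a sketch; `unready_deployments` and `check_namespace` are hypothetical helper names.

```python
from __future__ import annotations

import json
import subprocess


def unready_deployments(deployments_json: dict) -> list[str]:
    """Return names of deployments whose ready replicas lag the desired replicas."""
    lagging = []
    for dep in deployments_json.get("items", []):
        desired = dep.get("spec", {}).get("replicas", 1)  # Kubernetes defaults replicas to 1
        # status.readyReplicas is omitted entirely while no replica is ready.
        ready = dep.get("status", {}).get("readyReplicas", 0)
        if ready < desired:
            lagging.append(dep["metadata"]["name"])
    return lagging


def check_namespace(namespace: str = "aibrix-system") -> list[str]:
    """Query the cluster (needs kubectl and a kubeconfig) and return unready deployments."""
    raw = subprocess.run(
        ["kubectl", "get", "deployments", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return unready_deployments(json.loads(raw))
```

An empty list from `check_namespace()` corresponds to every deployment in the namespace reporting all replicas ready.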
+
+---
+
+## 4. Deploy a PrisKV KVCache Cluster
+
+With the controller up, define a `KVCache` custom resource and let the controller create the cluster.
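The PrisKV annotations in `kvcache.yaml` size each data node: `block-size-bytes` is `4096` and `block-count` is `1048576`, and the `hpkv-server` arguments and env vars repeat the same values. Assuming each data node allocates `block-count` blocks of `block-size-bytes`, the arithmetic below gives the per-node and cluster-wide capacity. The cluster total additionally assumes keys are spread across the `cache.replicas: 3` nodes without being replicated.

```python
from __future__ import annotations

GIB = 1024 ** 3

# Values copied from the kvcache.yaml annotations and cache.replicas.
BLOCK_SIZE_BYTES = 4096
BLOCK_COUNT = 1_048_576
DATA_NODES = 3


def kvcache_capacity(block_size_bytes: int, block_count: int, replicas: int) -> tuple[int, int]:
    """Return (bytes per data node, bytes across all data nodes), assuming no replication."""
    per_node = block_size_bytes * block_count
    return per_node, per_node * replicas


per_node, total = kvcache_capacity(BLOCK_SIZE_BYTES, BLOCK_COUNT, DATA_NODES)
print(f"per node: {per_node / GIB:.1f} GiB, cluster: {total / GIB:.1f} GiB")
# prints: per node: 4.0 GiB, cluster: 12.0 GiB
```

If you raise `block-count` or `block-size-bytes`, make sure each data node's memory request (`30Gi` in this sample) still leaves room for the larger pool.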
+
+### 4.1 Apply KVCache CR
+
+```bash
+kubectl apply -f kvcache.yaml
+```
+
+After the controller reconciles the resource, you should see pods similar to:
+
+```bash
+kubectl get pods
+NAME                                  READY   STATUS    RESTARTS   AGE
+debug                                 1/1     Running   0          23h
+kvcache-cluster-0                     1/1     Running   0          8h
+kvcache-cluster-1                     1/1     Running   0          8h
+kvcache-cluster-2                     1/1     Running   0          8h
+kvcache-cluster-kvcache-watcher-pod   1/1     Running   0          8h
+kvcache-cluster-redis                 1/1     Running   0          8h
+```
+
+Roles:
+
+- `kvcache-cluster-0/1/2` – PrisKV data nodes  
+- `kvcache-cluster-redis` – Redis-compatible metadata service  
+- `kvcache-cluster-kvcache-watcher-pod` – Watcher that discovers and registers data nodes into metadata
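If you script against this layout, the pod names follow a recognizable pattern. The `debug` pod in the sample output is not one of the roles listed above. The sketch below maps a pod name to its role based only on the naming shown for a `KVCache` named `kvcache-cluster`. `pod_role` is a hypothetical helper, and the controller's naming scheme could change between versions.

```python
import re


def pod_role(pod_name: str, cluster: str = "kvcache-cluster") -> str:
    """Classify a pod using the naming convention observed in `kubectl get pods`."""
    if pod_name == f"{cluster}-redis":
        return "metadata"
    if pod_name == f"{cluster}-kvcache-watcher-pod":
        return "watcher"
    # Data nodes are numbered like StatefulSet pods: <cluster>-0, <cluster>-1, ...
    if re.fullmatch(rf"{re.escape(cluster)}-\d+", pod_name):
        return "data-node"
    return "other"
```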
+
+### 4.2 Verify Redis Metadata
+
+To confirm the cluster is writing metadata correctly:
+
+```bash
+kubectl exec -it kvcache-cluster-redis -- bash
+```
+
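+Inside the container, connect with `redis-cli` (`-a` supplies the password, `kvcache_nodes` in this sample) and list the keys:
+
+```bash
+redis-cli -a kvcache_nodes
+# Then, at the redis-cli prompt:
+KEYS *
+# Inspect keys according to your schema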
+```
+
+If keys representing the data nodes (and, depending on the schema, shards or sessions) are present, the Watcher has registered the nodes and the KVCache cluster is healthy.
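If `kubectl exec` is inconvenient, the same check can be scripted from your workstation. The sketch below makes three assumptions:

- You have port-forwarded the Redis pod, for example with `kubectl port-forward pod/kvcache-cluster-redis 6379:6379`.
- Redis listens on its default port 6379.
- `kvcache_nodes`, the value passed to `redis-cli -a`, is the password.

It speaks the Redis RESP2 protocol directly using only the standard library. It uses `SCAN` instead of `KEYS *` so that large keyspaces are iterated in batches rather than in one blocking call. All function names are hypothetical.

```python
from __future__ import annotations

import io
import socket
from typing import BinaryIO, Union

Reply = Union[str, int, None, list]


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP2 array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


def read_reply(stream: BinaryIO) -> Reply:
    """Parse a single RESP2 reply from a buffered binary stream."""
    line = stream.readline()
    if not line:
        raise ConnectionError("connection closed by server")
    kind, payload = line[:1], line[1:-2]  # split type byte, drop trailing CRLF
    if kind == b"+":
        return payload.decode()
    if kind == b"-":
        raise RuntimeError(payload.decode())
    if kind == b":":
        return int(payload)
    if kind == b"$":
        length = int(payload)
        if length == -1:
            return None
        return stream.read(length + 2)[:-2].decode(errors="replace")
    if kind == b"*":
        length = int(payload)
        if length == -1:
            return None
        return [read_reply(stream) for _ in range(length)]
    raise ValueError(f"unexpected RESP type byte: {kind!r}")


def parse_reply(raw: bytes) -> Reply:
    """Parse a reply that is already held in memory."""
    return read_reply(io.BytesIO(raw))


def scan_keys(host: str = "127.0.0.1", port: int = 6379,
              password: str = "kvcache_nodes") -> list[str]:
    """Authenticate, then iterate SCAN until the cursor returns to "0"."""
    with socket.create_connection((host, port), timeout=5) as sock, sock.makefile("rb") as stream:
        sock.sendall(encode_command("AUTH", password))
        read_reply(stream)  # raises RuntimeError if authentication fails
        keys: list[str] = []
        cursor = "0"
        while True:
            sock.sendall(encode_command("SCAN", cursor, "COUNT", "100"))
            cursor, batch = read_reply(stream)
            keys.extend(batch)
            if cursor == "0":
                return sorted(keys)
```

`SCAN` may return the same key more than once across iterations. Deduplicate with `set(scan_keys())` if you compare counts.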
+
+---
+
+## 5. Deploy an AIBrix-Enabled vLLM Service Using PrisKV
+
+Next, we deploy a vLLM service that:
+
+- Uses the **AIBrix KV Offloading connector** (`AIBrixOffloadingConnectorV1Type3`)  
+- Configures **PrisKV** as the L2 KV cache backend (through Redis metadata)
+
+Apply the `vllm.yaml` manifest from this directory:
+
+```bash
+kubectl apply -f vllm.yaml
+```
+
+---
+
+## 6. End-to-End Validation
+
+### 6.1 Check All Pods
+
+```bash
+kubectl get pods
+```
+
+You should see:
+
+- `kvcache-cluster-*` pods from the PrisKV cluster
+- `deepseek-r1-distill-llama-8b-*` inference pod
+
+All should be `Running`.
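To automate this check, the sketch below reads `kubectl get pods -o json`. It verifies that at least one pod exists for each expected name prefix and that every matching pod is in the `Running` phase. The prefixes come from this example; `missing_or_not_running` and `fetch_pods` are hypothetical helper names. `Running` is the pod phase, not readiness: a vLLM pod can be `Running` while still loading the model.

```python
from __future__ import annotations

import json
import subprocess

# Name prefixes taken from this example's KVCache CR and vLLM service.
EXPECTED_PREFIXES = ["kvcache-cluster-", "deepseek-r1-distill-llama-8b-"]


def missing_or_not_running(pods_json: dict, prefixes: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    pods = pods_json.get("items", [])
    for prefix in prefixes:
        matching = [p for p in pods if p["metadata"]["name"].startswith(prefix)]
        if not matching:
            problems.append(f"no pod found with prefix {prefix!r}")
        for pod in matching:
            phase = pod.get("status", {}).get("phase", "Unknown")
            if phase != "Running":
                problems.append(f"{pod['metadata']['name']} is {phase}")
    return problems


def fetch_pods(namespace: str = "default") -> dict:
    """Query the cluster; requires kubectl on PATH and a working kubeconfig."""
    raw = subprocess.run(["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(raw)
```

Against a live cluster, run `missing_or_not_running(fetch_pods(), EXPECTED_PREFIXES)`.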
+
+### 6.2 Send Test Requests
+
+Port-forward the service (the command stays in the foreground until you stop it):
+
+```bash
+kubectl port-forward svc/deepseek-r1-distill-llama-8b 8000:8000
+```
+
+From a second terminal, call the OpenAI-compatible endpoint:
+
+```bash
+curl http://127.0.0.1:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "deepseek-r1-distill-llama-8b",
+    "messages": [
+      {"role": "user", "content": "Hello, PrisKV and AIBrix!"}
+    ]
+  }'
+```
+
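+These requests should generate KVCache traffic to PrisKV. A very short prompt may not fill a complete KV cache block, so if no new keys appear in the next step, retry with a longer prompt.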
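The same request can be sent from Python using only the standard library, which is convenient for looping over many or longer prompts. This sketch assumes two things, both taken from the `curl` example: the port-forward is running on `127.0.0.1:8000`, and the served model name is `deepseek-r1-distill-llama-8b`. `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"
MODEL = "deepseek-r1-distill-llama-8b"


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       model: str = MODEL) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

For example, `chat("Hello, PrisKV and AIBrix! " * 50)` sends a prompt long enough to span several KV blocks.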
+
+### 6.3 Check Redis Metadata
+
+Inspect Redis again:
+
+```bash
+kubectl exec -it kvcache-cluster-redis -- bash
+# Inside the container:
+redis-cli -a kvcache_nodes
+# Then, at the redis-cli prompt:
+KEYS *
+# Inspect keys to confirm entries related to sessions / nodes / chunks have been created
+```
+
+If new keys appear after you send requests, it means:
+
+> The AIBrix-enabled vLLM instance is successfully using the PrisKV cluster as its remote KV cache backend.
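To make the before/after comparison explicit, capture the key list once before sending requests and once after, then diff the two snapshots. Each snapshot can be saved with `redis-cli --scan`, for example `kubectl exec kvcache-cluster-redis -- redis-cli -a kvcache_nodes --scan > before.txt`. The sketch below groups new keys by the text before the first `:`. That separator is an assumption about the key schema, so adjust `prefix_of` to match the keys you actually see. All function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


def prefix_of(key: str, sep: str = ":") -> str:
    """Assumed schema: a key's namespace is everything before the first separator."""
    return key.split(sep, 1)[0]


def new_keys_by_prefix(before: set[str], after: set[str]) -> Counter:
    """Count keys present only in the later snapshot, grouped by prefix."""
    return Counter(prefix_of(k) for k in after - before)


def load_snapshot(path: str) -> set[str]:
    """Read one key per line, as written by `redis-cli --scan`."""
    return {line.strip() for line in Path(path).read_text().splitlines() if line.strip()}
```

Calling `new_keys_by_prefix(load_snapshot("before.txt"), load_snapshot("after.txt"))` returns an empty `Counter` if the requests produced no new metadata.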
+
+---
+
+## 7. Next Steps
+
+From here, you can extend this example to:
+
+- Share a single PrisKV cluster across multiple engines (vLLM, SGLang, etc.).
+- Combine L1 DRAM cache with L2 PrisKV (multi-tier KV caching).
+- Run benchmarks to evaluate performance and cost efficiency under real workloads.
+- Integrate with your existing autoscaling and routing stack for production use.
+
+If you have feedback or want to contribute improvements to the controller, cluster layout, or AIBrix integration, feel free to open an issue or pull request in this repository.
diff --git a/samples/cluster/kvcache.yaml b/samples/cluster/kvcache.yaml
@@ -0,0 +1,120 @@
+apiVersion: orchestration.aibrix.ai/v1alpha1
+kind: KVCache
+metadata:
+  name: kvcache-cluster
+  namespace: default
+  annotations:
+    kvcache.orchestration.aibrix.ai/backend: hpkv
+    hpkv.kvcache.orchestration.aibrix.ai/rdma-port: "18512"
+    hpkv.kvcache.orchestration.aibrix.ai/admin-port: "9100"
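+    # 4096-byte blocks x 1048576 blocks = 4 GiB of cache per data node.
+    # The hpkv-server arguments and env vars below repeat these values; keep them in sync.
+    hpkv.kvcache.orchestration.aibrix.ai/block-size-bytes: "4096"
+    hpkv.kvcache.orchestration.aibrix.ai/block-count: "1048576"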
+    hpkv.kvcache.orchestration.aibrix.ai/total-slots: "4096"
+    hpkv.kvcache.orchestration.aibrix.ai/virtual-node-count: "100"
+spec:
+  metadata:
+    redis:
+      runtime:
+        image: kvcache-image-container-cn-shanghai.cr.volces.com/kvcache/redis:7.4.2
+        replicas: 1
+        resources:
+          requests:
+            cpu: 1000m
+            memory: 1Gi
+          limits:
+            cpu: 1000m
+            memory: 1Gi
+  service:
+    type: ClusterIP
+    ports:
+      - name: service
+        port: 18512
+        targetPort: 18512
+        protocol: TCP
+      - name: admin
+        port: 9100
+        targetPort: 9100
+        protocol: TCP
+  watcher:
+    image: kvcache-image-container-cn-shanghai.cr.volces.com/kvcache/kvcache-watcher:nightly
+    imagePullPolicy: Always
+    resources:
+      requests:
+        cpu: "200m"
+        memory: "256Mi"
+        vke.volcengine.com/rdma: "1"
+      limits:
+        cpu: "200m"
+        memory: "256Mi"
+        vke.volcengine.com/rdma: "1"
+  cache:
+    replicas: 3
+    template:
+      metadata:
+        annotations:
+          prometheus.io/path: /metrics
+          prometheus.io/port: "2112"
+          prometheus.io/scrape: "true"
+          k8s.volcengine.com/pod-networks: |
+            [
+              {
+                "cniConf":{
+                    "name":"rdma"
+                }
+              }
+            ]
+      spec:
+        hostIPC: true
+        containers:
+          - name: kvcache-server
+            image: kvcache-container-image-hb2-cn-beijing.cr.volces.com/aibrix/priskv:v0.0.2
+            command:
+              - "/bin/bash"
+              - "-c"
+            args:
+              - |
+                # Bind PrisKV to the IPv4 address of eth1, the RDMA interface in this sample.
+                AIBRIX_KVCACHE_RDMA_IP=$(ip addr show dev eth1 | grep 'inet ' | awk '{print $2}' | awk -F/ '{print $1}')
+                echo "Binding to RDMA IP: $AIBRIX_KVCACHE_RDMA_IP"
+                ./hpkv-server -a $AIBRIX_KVCACHE_RDMA_IP -p 18512 -v 4096 -b 1048576 --acl any -A $AIBRIX_KVCACHE_RDMA_IP -P 9100
+            ports:
+              - name: service
+                containerPort: 18512
+                protocol: TCP
+              - name: manage
+                containerPort: 9100
+                protocol: TCP
+            env:
+              - name: AIBRIX_KVCACHE_UID
+                valueFrom:
+                  fieldRef:
+                    fieldPath: metadata.uid
+              - name: AIBRIX_KVCACHE_NAME
+                valueFrom:
+                  fieldRef:
+                    fieldPath: metadata.name
+              - name: AIBRIX_KVCACHE_NAMESPACE
+                valueFrom:
+                  fieldRef:
+                    fieldPath: metadata.namespace
+              - name: AIBRIX_KVCACHE_RDMA_PORT
+                value: "18512"
+              - name: AIBRIX_KVCACHE_ADMIN_PORT
+                value: "9100"
+              - name: AIBRIX_KVCACHE_BLOCK_SIZE_IN_BYTES
+                value: "4096"
+              - name: AIBRIX_KVCACHE_BLOCK_COUNT
+                value: "1048576"
+            securityContext:
+              capabilities:
+                add:
+                  - IPC_LOCK
+                  - SYS_RESOURCE
+            resources:
+              requests:
+                cpu: "6000m"
+                memory: "30Gi"
+                vke.volcengine.com/rdma: "1"
+              limits:
+                cpu: "6000m"
+                memory: "30Gi"
+                vke.volcengine.com/rdma: "1"
diff --git a/samples/cluster/vllm.yaml b/samples/cluster/vllm.yaml