Commit 1d82362: [Doc] kvcache offloading samples (#14)
Authored by DwyaneShi (Haiyang Shi)
Signed-off-by: Haiyang Shi <[email protected]>
1 parent ac55e6c

4 files changed: +560 -0 lines changed
## KVCache Offloading with PrisKV — Quickstart

This guide helps you launch a complete KVCache offloading setup with PrisKV, an inference engine (vLLM or SGLang), and a benchmark container using Docker Compose.

What you'll do:

- Prepare the PrisKV server image (prebuilt or built from source)
- Prepare an inference engine image (vLLM or SGLang) that already includes aibrix_kvcache
- Optionally install the PrisKV Python client SDK (needed if you build custom images)
- Launch everything with Docker Compose and verify it works
### Prerequisites

- A Linux host with Docker and Docker Compose installed
- NVIDIA GPU drivers and runtime set up (required for vLLM/SGLang GPU inference)
- Privileged mode enabled for Docker (required for the PrisKV server and engine containers)
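As a quick sanity check, the prerequisites above can be probed with a small script. This is an illustrative sketch, not part of the samples; it only reports what it finds, and anything marked `missing` should be fixed before continuing.

```shell
#!/usr/bin/env sh
# Preflight sketch for the prerequisites above (illustrative, not part of the samples).
preflight() {
  for cmd in docker nvidia-smi; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok: $cmd"
    else
      echo "missing: $cmd"
    fi
  done
  # The compose v2 plugin is invoked as `docker compose` throughout this guide.
  if docker compose version >/dev/null 2>&1; then
    echo "ok: docker compose"
  else
    echo "missing: docker compose"
  fi
}
preflight
```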
## PrisKV Server Image

### Use a prebuilt image

- kvcache-container-image-hb2-cn-beijing.cr.volces.com/aibrix/priskv:v0.0.2
### Build from source

```bash
# Clone the repo
git clone https://github.com/aibrix/PrisKV
cd PrisKV

# Build a server image from source (uses the Ubuntu 22.04 Dockerfile in the repo)
TAG="aibrix/priskv:v0.0.2"
docker build . -t ${TAG} --network=host -f ./docker/Dockerfile_ubuntu2204
```
## Inference Engine Image

You need an inference engine image (vLLM or SGLang) that bundles aibrix_kvcache and PrisKV. Use one of the prebuilt images below, or build your own following the linked instructions.

Prebuilt images:

- vLLM + aibrix_kvcache + nixl + PrisKV: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/vllm-openai:v0.10.2-aibrix0.5.1-nixl0.7.1-priskv0.0.2-20251121
- SGLang + aibrix_kvcache + nixl + PrisKV: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/sglang:v0.5.5.post3-aibrix0.5.1-nixl0.7.1-priskv0.0.2-20251121

If you need to build the engine image yourself, see https://github.com/vllm-project/aibrix/tree/main/python/aibrix_kvcache/integration/ for building base images with aibrix_kvcache, then follow the steps below to install the PrisKV Python client SDK.
### PrisKV Python Client Installation (optional)

If you're building a custom engine image, you may need the PrisKV Python SDK.

#### Install dependencies

```bash
apt update && apt install -y \
    git gcc make cmake librdmacm-dev rdma-core libibverbs-dev \
    libncurses5-dev libmount-dev libevent-dev libssl-dev \
    libhiredis-dev liburing-dev
```

#### Option A: Install with pip (prebuilt wheel or from PyPI)

```bash
# From PyPI (simplest)
pip install pypriskv
```

#### Option B: Build the Python client from source

```bash
git clone https://github.com/aibrix/PrisKV
cd PrisKV
make pyclient
# Install the built wheel (adjust the version tag to the one you built)
pip install pypriskv/dist/priskv-0.0.2-cp312-cp312-manylinux2014_x86_64.whl
```
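Rather than hard-coding the version tag in the wheel filename, a small helper can install whichever wheel the build actually produced. This is a hedged sketch; only the `pypriskv/dist/` layout is taken from the steps above.

```shell
#!/usr/bin/env sh
# Install whatever wheel `make pyclient` produced (assumes the pypriskv/dist/ layout above).
WHEEL=$(ls pypriskv/dist/*.whl 2>/dev/null | head -n 1)
if [ -n "$WHEEL" ]; then
  pip install "$WHEEL"
else
  echo "no wheel found under pypriskv/dist/ -- run 'make pyclient' first"
fi
```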
## Benchmark and Deployment with Docker Compose

- Use any deploy.yaml under kvcache-offloading/xxx/ (e.g., the vLLM or SGLang directories). These compose files typically start the PrisKV server, the engine, and a benchmark container.

### Customize settings

#### PrisKV Configuration

Please refer to https://aibrix.readthedocs.io/latest/designs/aibrix-kvcache-offloading-framework.html#priskv-connector-configuration for connector configuration.
Modify `PRISKV_CLUSTER_META` to describe the consistent hashing topology of your PrisKV cluster. For a single-server setup:

```json
{
    "version": 1,
    "nodes": [
        {
            "name": "node0",
            "addr": "<REPLACE_WITH_SERVER_IP>",
            "port": 9000,
            "slots": [
                { "start": 0, "end": 4095 }
            ]
        }
    ]
}
```
Tips:

- Replace `<REPLACE_WITH_SERVER_IP>` with the reachable IP of your PrisKV server and update `-a` in the PrisKV server command accordingly.
- If port 9000 is not available, change it to a free port in `PRISKV_CLUSTER_META` and update `-p` in the PrisKV server command.
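If you template the compose file, the single-node meta above can be generated instead of hand-edited. A minimal sketch; the `gen_cluster_meta` helper, its defaults, and the example IP are this guide's own, not part of PrisKV:

```shell
#!/usr/bin/env sh
# Emit a single-node PRISKV_CLUSTER_META matching the JSON example above.
# Usage: gen_cluster_meta <server_ip> [port]   (helper name and defaults are illustrative)
gen_cluster_meta() {
  ip="$1"
  port="${2:-9000}"
  printf '{"version":1,"nodes":[{"name":"node0","addr":"%s","port":%s,"slots":[{"start":0,"end":4095}]}]}\n' "$ip" "$port"
}
gen_cluster_meta 192.0.2.10 9000
```

Substitute the output into `deploy.yaml` (or export it as `PRISKV_CLUSTER_META`) after replacing the example IP with your server's.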
#### PrisKV Server Arguments

Please refer to https://github.com/aibrix/PrisKV/blob/main/README.md#server-command-line-arguments for all available server command-line arguments.
#### Engine Configuration

- `ENGINE_PORT`: port for the engine to listen on (default: `18000`).
- `MODEL`: folder name under `/data01/models` on the host.
- `TP`: tensor parallelism size; must match the number of GPUs in `CUDA_VISIBLE_DEVICES`.
- `VLLM_KV_CONFIG`: set to an empty string to disable the KV connector.
- `SGLANG_HICACHE_STORAGE_BACKEND`: storage backend for SGLang HiCache.
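For reference, a matching set of engine settings might look like the following (values are illustrative; `Qwen3-32B` is just the example model used later in this guide). The final check mirrors the rule that `TP` must equal the number of visible GPUs:

```shell
#!/usr/bin/env sh
# Illustrative engine settings; adjust to your host and model.
export ENGINE_PORT=18000
export MODEL=Qwen3-32B                # folder name under /data01/models
export TP=4
export CUDA_VISIBLE_DEVICES=0,1,2,3
export VLLM_KV_CONFIG=""              # empty string disables the KV connector

# Consistency check: TP must match the number of GPUs listed above.
gpu_count=$(echo "$CUDA_VISIBLE_DEVICES" | awk -F, '{print NF}')
if [ "$TP" -eq "$gpu_count" ]; then
  echo "TP matches GPU count ($gpu_count)"
else
  echo "warning: TP=$TP but $gpu_count GPUs listed"
fi
```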
### Deploy

- Run `docker compose -f deploy.yaml up -d` to start all services
- Run `docker compose -f deploy.yaml ps` to list running containers
- Run `docker compose -f deploy.yaml logs -f` to stream logs from all services
- Or run `docker compose -f deploy.yaml logs -f engine` and `docker compose -f deploy.yaml logs -f bench` to view engine and benchmark logs separately
- Run `docker compose -f deploy.yaml stop` to stop all services
### Step-by-step: vLLM example

1) Navigate to the vLLM sample directory: `cd samples/kvcache-offloading/vllm`.
2) Ensure your model files exist on the host at `/data01/models/<MODEL>`. If not, download the model or place it there; alternatively, update the `volumes:` and `--model` path in the compose file to match your host path.
3) Open `deploy.yaml` and adjust:
   - `PRISKV_CLUSTER_META`: use the single-node example above and replace `<REPLACE_WITH_SERVER_IP>` with your PrisKV server IP.
   - `ENGINE_PORT`: set to the port you want the engine to listen on (default: `18000`).
   - `MODEL`: set to the folder name under `/data01/models` (e.g., `Qwen3-32B`).
   - `TP`: set to the number of GPUs you will use (e.g., `4`).
   - `CUDA_VISIBLE_DEVICES`: list the GPU IDs (e.g., `0,1,2,3`).
   - `priskv` service command: update `-a` to your host IP, or `0.0.0.0` to bind all interfaces; keep `-p 9000` unless you need a different port.
   - Optionally change `AIBRIX_KV_CACHE_OL_PRISKV_PASSWORD` (keep the engine and Redis settings consistent).
4) Start services: `docker compose -f deploy.yaml up -d`.
5) Verify services:
   - Verify Redis initialization: `docker compose -f deploy.yaml logs -f init`.
   - Check PrisKV logs: `docker compose -f deploy.yaml logs -f priskv` (ensure it reports listening on your chosen address/port).
   - Check engine logs: `docker compose -f deploy.yaml logs -f engine`.
6) Benchmark: the `bench` container automatically runs two rounds. View output: `docker compose -f deploy.yaml logs -f bench`.
7) Stop services: `docker compose -f deploy.yaml stop`.
8) Cleanup (optional): `docker compose -f deploy.yaml down -v` to remove containers and Redis data.
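Once the engine logs show it is ready, a quick smoke test can confirm it is serving. The sketch below only builds the command so it can run anywhere; `/v1/models` is the standard model-listing endpoint of the OpenAI-compatible API that vllm-openai exposes (assumption: the engine runs on the same host).

```shell
#!/usr/bin/env sh
# Build a smoke-test command for the engine's OpenAI-compatible API.
# Run the printed command once the engine logs show it is ready.
ENGINE_PORT=${ENGINE_PORT:-18000}
SMOKE_CMD="curl -s http://localhost:${ENGINE_PORT}/v1/models"
echo "$SMOKE_CMD"
```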
### Common issues

- Engine startup errors about the model path: ensure `/data01/models/<MODEL>` exists and is mounted; otherwise adjust `volumes:` and the `--model` path.
- GPU visibility: ensure NVIDIA drivers and runtime are installed; test with `nvidia-smi` inside the engine container.
- PrisKV bind error: update `-a` to your host IP or `0.0.0.0`; ensure port `9000` is not in use.
- Redis authentication errors: confirm the password matches `AIBRIX_KV_CACHE_OL_PRISKV_PASSWORD` everywhere.
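For the bind-error case above, checking whether the port is already taken before starting PrisKV can save a restart loop. A sketch assuming `ss` from iproute2 is available on the host:

```shell
#!/usr/bin/env sh
# Check whether a TCP port already has a listener (assumes `ss` is available).
port_in_use() {
  port="$1"
  ss -ltn 2>/dev/null | grep -q ":${port} "
}

if port_in_use 9000; then
  echo "port 9000 is in use -- pick another and update PRISKV_CLUSTER_META and -p"
else
  echo "port 9000 looks free"
fi
```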
