## KVCache Offloading with PrisKV — Quickstart

This guide walks you through launching a complete KVCache offloading setup with PrisKV, an inference engine (vLLM or SGLang), and a benchmark container using Docker Compose.

What you’ll do:
- Prepare the PrisKV server image (prebuilt or built from source)
- Prepare an inference engine image (vLLM or SGLang) that already includes aibrix_kvcache
- Optionally install the PrisKV Python client SDK (if you build custom images)
- Launch everything with Docker Compose and verify it works

## Prerequisites
- A Linux host with Docker and Docker Compose installed
- NVIDIA GPU drivers and container runtime set up (required for vLLM/SGLang GPU inference)
- Privileged mode enabled for Docker (required for the PrisKV server and engine containers)
## PrisKV Server Image
### Use a prebuilt image
- kvcache-container-image-hb2-cn-beijing.cr.volces.com/aibrix/priskv:v0.0.2

### Build from source
```bash
# Clone the repo
git clone https://github.com/aibrix/PrisKV
cd PrisKV

# Build a server image from source (uses the Ubuntu 22.04 Dockerfile in the repo)
TAG="aibrix/priskv:v0.0.2"
docker build . -t ${TAG} --network=host -f ./docker/Dockerfile_ubuntu2204
```

## Inference Engine Image
You need an inference engine image (vLLM or SGLang) that includes aibrix_kvcache and PrisKV. Use one of the prebuilt images below, or build your own following the linked instructions.

Prebuilt images:
- vLLM + aibrix_kvcache + nixl + PrisKV: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/vllm-openai:v0.10.2-aibrix0.5.1-nixl0.7.1-priskv0.0.2-20251121
- SGLang + aibrix_kvcache + nixl + PrisKV: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/sglang:v0.5.5.post3-aibrix0.5.1-nixl0.7.1-priskv0.0.2-20251121

To build the inference engine image yourself, refer to https://github.com/vllm-project/aibrix/tree/main/python/aibrix_kvcache/integration/ for building base images with aibrix_kvcache, then follow the steps below to install the PrisKV Python client SDK.
### PrisKV Python Client Installation (optional)
If you’re building a custom engine image, you may need the PrisKV Python SDK.

#### Install dependencies
```bash
apt update && apt install -y \
  git gcc make cmake librdmacm-dev rdma-core libibverbs-dev \
  libncurses5-dev libmount-dev libevent-dev libssl-dev \
  libhiredis-dev liburing-dev
```

#### Option A: Install with pip (prebuilt wheel or from PyPI)
```bash
# From PyPI (simplest)
pip install pypriskv
```

#### Option B: Build the Python client from source
```bash
git clone https://github.com/aibrix/PrisKV
cd PrisKV
make pyclient
# Install the built wheel (adjust the version tag to the one you built)
pip install pypriskv/dist/priskv-0.0.2-cp312-cp312-manylinux2014_x86_64.whl
```
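The wheel filename encodes the package version and the Python ABI, so hardcoding it is brittle. As a hedged sketch (the sample filename is simply the one from the install step above), you can pull the version out of whatever `make pyclient` produced:

```shell
# Hypothetical helper: derive the version from a wheel filename instead of
# hardcoding it in the pip install step. The sample name matches the example above.
wheel="priskv-0.0.2-cp312-cp312-manylinux2014_x86_64.whl"
ver="${wheel#priskv-}"   # drop the "priskv-" package prefix
ver="${ver%%-*}"         # keep everything up to the next dash: the version
echo "priskv version: ${ver}"
```

In practice you could glob `pypriskv/dist/priskv-*.whl` and pass the match straight to `pip install`.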

## Benchmark and Deployment with Docker Compose
Use any `deploy.yaml` under `samples/kvcache-offloading/<engine>/` (e.g., the vLLM or SGLang directories). These compose files typically start three services: the PrisKV server, the engine, and a benchmark container.

### Customize settings
#### PrisKV Configuration
Refer to https://aibrix.readthedocs.io/latest/designs/aibrix-kvcache-offloading-framework.html#priskv-connector-configuration for connector configuration.

Modify `PRISKV_CLUSTER_META` to describe the consistent hashing topology of your PrisKV cluster. For a single-server setup:
```json
{
  "version": 1,
  "nodes": [
    {
      "name": "node0",
      "addr": "<REPLACE_WITH_SERVER_IP>",
      "port": 9000,
      "slots": [
        { "start": 0, "end": 4095 }
      ]
    }
  ]
}
```
Tips:
- Replace `<REPLACE_WITH_SERVER_IP>` with the reachable IP of your PrisKV server and update `-a` in the PrisKV server command.
- If port 9000 is not available, change it to a free port in `PRISKV_CLUSTER_META` and update `-p` in the PrisKV server command.
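The single-node example above assigns the full slot keyspace (0–4095) to `node0`. With multiple servers, each node gets a contiguous, non-overlapping slot range that together covers the whole keyspace. A hedged sketch of an even split (the 4096-slot total is taken from the example above; the node count and names are hypothetical):

```shell
# Evenly partition a 4096-slot keyspace across N nodes (assumption: 4096 slots
# as in the single-node example; adjust if your deployment differs).
total=4096
nodes=2
per=$(( total / nodes ))
for i in $(seq 0 $(( nodes - 1 ))); do
  start=$(( i * per ))
  end=$(( start + per - 1 ))
  echo "node${i}: slots ${start}-${end}"
done
```

With two nodes this yields `0-2047` and `2048-4095`; each range becomes one entry in the `slots` array of the corresponding node in `PRISKV_CLUSTER_META`.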

#### PrisKV Server Arguments
Refer to https://github.com/aibrix/PrisKV/blob/main/README.md#server-command-line-arguments for all available server command-line arguments.

#### Engine Configuration
- `ENGINE_PORT`: port for the engine to listen on (default: `18000`).
- `MODEL`: folder name under `/data01/models` on the host.
- `TP`: tensor parallelism size; must match the number of GPUs in `CUDA_VISIBLE_DEVICES`.
- `VLLM_KV_CONFIG`: set to an empty string to disable the KV connector.
- `SGLANG_HICACHE_STORAGE_BACKEND`: storage backend for SGLang HiCache.
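For illustration, these settings typically appear as environment variables on the engine service in `deploy.yaml`. A hedged fragment (the service name and the values shown are examples, not the shipped file — compare against the compose file you are editing):

```yaml
# Hypothetical excerpt of the engine service in deploy.yaml.
services:
  engine:
    environment:
      - ENGINE_PORT=18000
      - MODEL=Qwen3-32B
      - TP=4
      - CUDA_VISIBLE_DEVICES=0,1,2,3
```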

### Deploy
- Run `docker compose -f deploy.yaml up -d` to start all services.
- Run `docker compose -f deploy.yaml ps` to list running containers.
- Run `docker compose -f deploy.yaml logs -f` to stream logs from all services.
- Or run `docker compose -f deploy.yaml logs -f engine` and `docker compose -f deploy.yaml logs -f bench` to view the engine and benchmark logs separately.
- Run `docker compose -f deploy.yaml stop` to stop all services.
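All of these commands share the `docker compose -f deploy.yaml` prefix. Purely as a convenience (an assumption, not something the sample provides), you can wrap it in a shell function:

```shell
# Convenience wrapper for the repeated compose invocation.
dc() { docker compose -f deploy.yaml "$@"; }
# Then, for example: dc up -d; dc logs -f engine; dc stop
```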

### Step-by-step: vLLM example
1) Navigate to the vLLM sample directory: `cd samples/kvcache-offloading/vllm`.
2) Ensure your model files exist on the host at `/data01/models/<MODEL>`. If not, download or place your model there, or update the `volumes:` and `--model` path in the compose file to match your host path.
3) Open `deploy.yaml` and adjust:
   - `PRISKV_CLUSTER_META`: use the single-node example above and replace `<REPLACE_WITH_SERVER_IP>` with your PrisKV server IP.
   - `ENGINE_PORT`: set to the port you want the engine to listen on (default: `18000`).
   - `MODEL`: set to the folder name under `/data01/models` (e.g., `Qwen3-32B`).
   - `TP`: set to the number of GPUs you will use (e.g., `4`).
   - `CUDA_VISIBLE_DEVICES`: list the GPU IDs (e.g., `0,1,2,3`).
   - `priskv` service command: update `-a` to your host IP, or `0.0.0.0` to bind all interfaces; keep `-p 9000` unless you need a different port.
   - Optionally change `AIBRIX_KV_CACHE_OL_PRISKV_PASSWORD` (keep the engine and Redis settings consistent).
4) Start services: `docker compose -f deploy.yaml up -d`.
5) Verify services:
   - Verify Redis initialization: `docker compose -f deploy.yaml logs -f init`.
   - Check PrisKV logs: `docker compose -f deploy.yaml logs -f priskv` (ensure it reports listening on your chosen address/port).
   - Check engine logs: `docker compose -f deploy.yaml logs -f engine`.
6) Benchmark: the `bench` container automatically runs two rounds. View the output: `docker compose -f deploy.yaml logs -f bench`.
7) Stop services: `docker compose -f deploy.yaml stop`.
8) Cleanup (optional): `docker compose -f deploy.yaml down -v` to remove containers and Redis data.
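To sanity-check the engine yourself beyond the automated benchmark, you can POST an OpenAI-style completion request with curl to `http://<host>:18000/v1/completions`, assuming the default `ENGINE_PORT` and `Qwen3-32B` as the served model name (both are assumptions from the example settings above). A hedged example request body:

```json
{
  "model": "Qwen3-32B",
  "prompt": "San Francisco is a",
  "max_tokens": 16,
  "temperature": 0
}
```

A successful response indicates the engine is serving; repeated requests with a shared prompt prefix should then exercise the KVCache offloading path.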

### Common issues
- Engine startup errors about the model path: ensure `/data01/models/<MODEL>` exists and is mounted; otherwise adjust the `volumes:` and `--model` path.
- GPU visibility: ensure NVIDIA drivers and runtime are installed; test with `nvidia-smi` inside the engine container.
- PrisKV bind error: update `-a` to your host IP or `0.0.0.0`; ensure port `9000` is not in use.
- Redis authentication errors: confirm the password matches `AIBRIX_KV_CACHE_OL_PRISKV_PASSWORD` everywhere.
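If the bind error is caused by port `9000` already being taken, one hedged way to find a free TCP port before updating `-p` and `PRISKV_CLUSTER_META` (assumes `python3` is available on the host):

```shell
# Ask the OS for an unused TCP port by binding to port 0.
FREE_PORT=$(python3 -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
echo "free port: ${FREE_PORT}"
```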