Commit 1fc3fd7

feat: added vLLM
1 parent 3254de3 commit 1fc3fd7

7 files changed: +87 −39 lines changed

Dockerfile

Lines changed: 8 additions & 0 deletions

@@ -10,6 +10,14 @@ RUN --mount=type=cache,target=/root/.cache/pip \
 
 RUN rm -rf /root/.cache/pip
 
+# Install uv for faster package management
+RUN pip install uv
+
+# Create separate venv for vLLM using uv to avoid flash-attn conflicts with Axolotl
+# Match CUDA version with base image (CUDA 12.6)
+RUN uv venv /opt/vllm-venv && \
+    uv pip install --python /opt/vllm-venv/bin/python vllm --torch-backend=cu126
+
 # Expose vLLM port (not started automatically)
 EXPOSE 8000
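The image now carries two Python environments: the base one that Axolotl installed, and `/opt/vllm-venv` holding vLLM with its own dependency stack. A minimal sketch of checking that separation after a build; the image tag is taken from the docker-compose.yml added in this commit, and `--gpus all` assumes the NVIDIA container toolkit is available:

```bash
# Confirm vLLM resolves from the dedicated venv rather than the base environment
docker run --rm --gpus all runpod/llm-finetuning-axolotl:dev \
    /opt/vllm-venv/bin/python -c "import vllm; print(vllm.__version__)"

# The base interpreter keeps Axolotl's stack; per the Dockerfile comment,
# its torch build should report CUDA 12.6
docker run --rm --gpus all runpod/llm-finetuning-axolotl:dev \
    python -c "import torch; print(torch.version.cuda)"
```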

README.md

Lines changed: 6 additions & 25 deletions

@@ -41,13 +41,10 @@ axolotl train config.yaml
 4. **Optional - Start vLLM server** (after training):
 
 ```bash
-# Option A: Using YAML config (recommended)
-cp vllm_config_example.yaml my_config.yaml
-# Edit my_config.yaml with your model path
-./start_vllm.sh my_config.yaml
-
-# Option B: Command line
-./start_vllm.sh ./outputs/lora-out --lora-modules lora_name=./outputs/lora-out
+# Create your vLLM config based on the example
+cp vllm_config_example.yaml my_vllm_config.yaml
+# Edit my_vllm_config.yaml with your trained model path and settings
+./start_vllm.sh my_vllm_config.yaml
 ```
 
 ## 🏗️ Local Development

@@ -150,30 +147,14 @@ After training, you can serve your model using the built-in vLLM server:
 
 ### Quick Start vLLM
 
-#### Option A: Using YAML Config (Recommended)
-
 ```bash
 # 1. Copy and customize the example config
 cp vllm_config_example.yaml my_vllm_config.yaml
-# Edit my_vllm_config.yaml with your model path and settings
-
-# 2. Start vLLM with config
+# 2. Edit my_vllm_config.yaml with your trained model path and settings
+# 3. Start vLLM with your config
 ./start_vllm.sh my_vllm_config.yaml
 ```
 
-#### Option B: Command Line Arguments
-
-```bash
-# For LoRA models
-./start_vllm.sh ./outputs/lora-out --lora-modules lora_name=./outputs/lora-out
-
-# For merged/full fine-tuned models
-./start_vllm.sh ./outputs/merged-model
-
-# With custom settings
-./start_vllm.sh ./outputs/my-model --max-model-len 4096 --gpu-memory-utilization 0.8
-```
-
 ### vLLM Features
 
 - **OpenAI-compatible API** at `http://localhost:8000`
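Since the server exposes the standard OpenAI-compatible routes, here is a hedged sketch of exercising it once `./start_vllm.sh` is running; the served model name depends on your config, so `my-model` below is a placeholder:

```bash
# List what the server is currently serving
curl http://localhost:8000/v1/models

# Send a chat completion; replace "my-model" with your served model name
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 64}'
```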

docker-compose.yml

Lines changed: 56 additions & 0 deletions

@@ -0,0 +1,56 @@
+services:
+  llm-finetuning:
+    image: runpod/llm-finetuning-axolotl:dev
+    platform: linux/amd64
+
+    # GPU access
+    # deploy:
+    #   resources:
+    #     reservations:
+    #       devices:
+    #         - driver: nvidia
+    #           count: all
+    #           capabilities: [gpu]
+
+    # Port mapping for vLLM
+    ports:
+      - "8000:8000"  # vLLM API server
+      - "8888:8888"  # Jupyter Lab (from base image)
+      - "2222:22"    # SSH access (from base image)
+
+    # Environment variables for training configuration
+    environment:
+      # Required credentials
+      - HF_TOKEN=${HF_TOKEN}
+      # - WANDB_API_KEY=${WANDB_API_KEY}
+
+      # Training configuration (examples - customize as needed)
+      - AXOLOTL_BASE_MODEL=TinyLlama/TinyLlama_v1.1
+      - AXOLOTL_DATASETS=[{"path":"mhenrichsen/alpaca_2k_test","type":"alpaca"}]
+      - AXOLOTL_OUTPUT_DIR=./outputs/my_training
+      - AXOLOTL_ADAPTER=lora
+      - AXOLOTL_LORA_R=8
+      - AXOLOTL_LORA_ALPHA=16
+      - AXOLOTL_NUM_EPOCHS=1
+      - AXOLOTL_MICRO_BATCH_SIZE=2
+      - AXOLOTL_GRADIENT_ACCUMULATION_STEPS=1
+      - AXOLOTL_LEARNING_RATE=0.0002
+      - AXOLOTL_LOAD_IN_8BIT=true
+
+      # Optional: Disable Jupyter if not needed
+      # - JUPYTER_DISABLE=1
+
+      # Optional: SSH key for access
+      # - PUBLIC_KEY=${PUBLIC_KEY}
+
+    # Volume mounts for persistent data
+    volumes:
+      - ./outputs:/workspace/data/axolotl-artifacts
+      - ./configs:/workspace/fine-tuning/configs
+
+    # Keep container running
+    tty: true
+    stdin_open: true
+
+    # Optional: Override command for debugging
+    # command: ["sleep", "infinity"]
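For reference, a sketch of how this compose file is meant to be driven; the service name and env.example both come from this commit, while the credentials are values you supply yourself:

```bash
# Provide credentials, then bring the service up
cp env.example .env        # edit .env with your real tokens
docker compose up -d

# Watch startup, or open a shell to run training / start_vllm.sh manually
docker compose logs -f llm-finetuning
docker compose exec llm-finetuning bash
```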

env.example

Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@
+# Copy this file to .env and fill in your values
+# cp env.example .env
+
+# Required credentials
+HF_TOKEN=your-huggingface-token-here
+WANDB_API_KEY=your-wandb-api-key-here
+
+# Optional: SSH public key for container access
+# PUBLIC_KEY=ssh-rsa AAAAB3NzaC1yc2E... [email protected]
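docker compose reads `.env` automatically; if the image is run without compose, the same file can be passed explicitly. A hedged sketch, reusing the image tag from this commit's docker-compose.yml:

```bash
# Run the container standalone, injecting the same credentials from .env
docker run --rm --gpus all --env-file .env -p 8000:8000 \
    runpod/llm-finetuning-axolotl:dev
```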

requirements.txt

Lines changed: 1 addition & 6 deletions

@@ -1,6 +1 @@
-runpod~=1.7.0
-
-# vLLM for inference serving
-# The base image already has flash-attn from axolotl installation
-# vLLM will use the existing flash-attn if it's compatible
-vllm
+runpod~=1.7.0

scripts/WELCOME

Lines changed: 3 additions & 4 deletions

@@ -8,10 +8,9 @@ You've successfully configured your training environment! 🎉
 1️⃣ Familiarize yourself with the examples/ and outputs/ directories.
 2️⃣ Carefully review your config.yaml settings, verifying both format and values. As a best practice, ensure that all hyperparameters are tuned according to your specific use case to prevent potential errors.
 3️⃣ Start fine-tuning when you're ready with `axolotl train config.yaml`
-4️⃣ After training, serve your model with `./start_vllm.sh ./outputs/your-model`
+4️⃣ After training, serve your model with `./start_vllm.sh your_vllm_config.yaml`
 
 ────────────────────────────────────
-✨ POWERED BY AXOLOTL 🦎 + vLLM 🚀
+✨ POWERED BY AXOLOTL 🦎
 ────────────────────────────────────
-📄 Axolotl Docs: https://axolotl-ai-cloud.github.io/axolotl/docs/config.html
-🌐 vLLM Server: http://localhost:8000 (after starting)
+📄 Axolotl Docs: https://axolotl-ai-cloud.github.io/axolotl/docs/config.html

scripts/start_vllm.sh

Lines changed: 4 additions & 4 deletions

@@ -80,8 +80,8 @@ except:
 echo "🔧 Additional args: $*"
 echo ""
 
-# Start vLLM with config file
-python -m vllm.entrypoints.openai.api_server \
+# Start vLLM with config file (using dedicated venv)
+/opt/vllm-venv/bin/python -m vllm.entrypoints.openai.api_server \
     --config "$INPUT" \
     "$@"
 
@@ -95,8 +95,8 @@ else
 echo "🌐 Server will be available at: http://0.0.0.0:8000"
 echo ""
 
-# Start vLLM with the provided model and any additional arguments
-python -m vllm.entrypoints.openai.api_server \
+# Start vLLM with the provided model and any additional arguments (using dedicated venv)
+/opt/vllm-venv/bin/python -m vllm.entrypoints.openai.api_server \
     --model "$MODEL_PATH" \
     --host 0.0.0.0 \
     --port 8000 \
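With the interpreter pinned to the venv, running the wrapper is equivalent to invoking that Python directly; a sketch of the manual form plus a liveness probe (vLLM's OpenAI server exposes a `/health` endpoint):

```bash
# Equivalent to ./start_vllm.sh my_vllm_config.yaml
/opt/vllm-venv/bin/python -m vllm.entrypoints.openai.api_server \
    --config my_vllm_config.yaml

# Probe the server once it is listening
curl http://localhost:8000/health
```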
