
Commit ec910bd

Authored by DifferentialityDevelopment and Tim Pietrusky (RunPod)
feat: enable tool calling support (#25)
* Update worker-config.json
* Update engine.py
* Update README.md
* chore: added HF_TOKEN; use meta-llama/Llama-3.2-1B-Instruct for testing
* feat: added TOOL_CALL_PARSER

---------

Co-authored-by: Tim Pietrusky <[email protected]>
Co-authored-by: NERDDISCO <[email protected]>
Co-authored-by: Tim Pietrusky <[email protected]>
1 parent: 7467328

8 files changed, +76 −8 lines changed

.github/CONTRIBUTING.md

Lines changed: 33 additions & 4 deletions
````diff
@@ -27,14 +27,40 @@ Welcome! This guide explains how to develop and deploy the SGLang Worker for RunPod
 git clone <repo-url>
 cd worker-sglang
 
+# Create .env file for Hugging Face token (required for gated models)
+echo "HF_TOKEN=your_huggingface_token_here" > .env
+
 # Build locally for testing (optional - will be built in CI)
 docker build --platform linux/amd64 -t worker-sglang-local .
 
-# Test with docker-compose
+# Test with docker-compose (will automatically use .env file)
 docker-compose up
 ```
 
-### 3. Making Changes
+### 3. Environment Configuration
+
+The project uses a `.env` file for local development. Docker Compose automatically reads this file.
+
+**Required for local testing:**
+
+```bash
+# .env file (create in project root)
+HF_TOKEN=your_huggingface_token_here
+```
+
+**Getting your HF_TOKEN:**
+
+1. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
+2. Create a new token with "Read" permissions
+3. Copy the token to your `.env` file
+
+**⚠️ Security Note:**
+
+- Never commit the `.env` file to git
+- The `.env` file is already in `.gitignore`
+- Use environment variables in production/CI
+
+### 4. Making Changes
 
 1. **Create feature branch:**
 
@@ -44,7 +70,7 @@ docker-compose up
 
 2. **Make your changes** to:
 
-   - Core files in `.runpod/` directory
+   - Core files in project root
    - Configuration files
    - Documentation
 
@@ -54,8 +80,11 @@ docker-compose up
 # Test Docker build
 docker build --platform linux/amd64 -t test-build .
 
-# Test with sample input
+# Test with sample input (ensure .env file exists first)
 docker run --rm test-build python3 -c "import handler; print('Import successful')"
+
+# Test with docker-compose (uses .env automatically)
+docker-compose up
 ```
 
 4. **Commit following conventions:**
````
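
As an aside (not part of the commit): before building the image, you can sanity-check that the `HF_TOKEN` in `.env` actually authenticates against the Hugging Face Hub. A minimal Python sketch, assuming the `python-dotenv` and `huggingface_hub` packages are installed:

```python
import os

from dotenv import load_dotenv
from huggingface_hub import whoami

load_dotenv()  # reads HF_TOKEN from the .env file in the project root
token = os.environ.get("HF_TOKEN")
if not token:
    raise SystemExit("HF_TOKEN missing - create the .env file described above")

# whoami() raises an HTTP error if the token is invalid or expired
info = whoami(token=token)
print(f"Token OK, authenticated as: {info['name']}")
```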

.gitignore

Lines changed: 2 additions & 1 deletion
```diff
@@ -1,2 +1,3 @@
-!*.png
+!*.png
+.env
 .DS_Store
```

.runpod/hub.json

Lines changed: 18 additions & 0 deletions
```diff
@@ -30,6 +30,24 @@
         "required": false
       }
     },
+    {
+      "key": "TOOL_CALL_PARSER",
+      "input": {
+        "name": "Tool Call Parser",
+        "type": "string",
+        "description": "Defines the parser used to interpret tool call responses",
+        "default": "",
+        "required": false,
+        "advanced": true,
+        "options": [
+          { "value": "llama3", "label": "llama3" },
+          { "value": "llama4", "label": "llama4" },
+          { "value": "mistral", "label": "mistral" },
+          { "value": "qwen25", "label": "qwen25" },
+          { "value": "deepseekv3", "label": "deepseekv3" }
+        ]
+      }
+    },
     {
       "key": "TOKENIZER_PATH",
       "input": {
```

README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -51,6 +51,7 @@ All behaviour is controlled through environment variables:
 | `ENABLE_P2P_CHECK` | Enable P2P check for GPU access | false | boolean (true or false) |
 | `ENABLE_FLASHINFER_MLA` | Enable FlashInfer MLA optimization | false | boolean (true or false) |
 | `TRITON_ATTENTION_REDUCE_IN_FP32` | Cast Triton attention reduce op to FP32 | false | boolean (true or false) |
+| `TOOL_CALL_PARSER` | Defines the parser used to interpret responses | qwen25 | "llama3", "llama4", "mistral", "qwen25", "deepseekv3" |
 
 ## API Usage
```
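
The parser should match the tool-call format of the served model family; the commit itself pairs `meta-llama/Llama-3.2-1B-Instruct` with `TOOL_CALL_PARSER=llama3` in `docker-compose.yml`. A hypothetical helper (not in the repo) that picks one of the documented options from a model path might look like:

```python
# Hypothetical helper: map a model path to one of the documented
# TOOL_CALL_PARSER options. Substring keys are assumptions based on
# common Hugging Face naming conventions.
PARSER_BY_FAMILY = {
    "llama-4": "llama4",
    "llama-3": "llama3",
    "mistral": "mistral",
    "qwen2.5": "qwen25",
    "deepseek-v3": "deepseekv3",
}

def pick_parser(model_path: str) -> str:
    """Return a parser name for model_path, falling back to the default."""
    name = model_path.lower()
    for family, parser in PARSER_BY_FAMILY.items():
        if family in name:
            return parser
    return "qwen25"  # documented default in the table above

# The commit's own pairing from docker-compose.yml:
assert pick_parser("meta-llama/Llama-3.2-1B-Instruct") == "llama3"
```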

docker-compose.yml

Lines changed: 3 additions & 1 deletion
```diff
@@ -14,10 +14,12 @@ services:
     environment:
       - HOST=0.0.0.0
       - PORT=30000
-      - MODEL_PATH=HuggingFaceTB/SmolLM2-1.7B-Instruct
+      - MODEL_PATH=meta-llama/Llama-3.2-1B-Instruct
       - TRUST_REMOTE_CODE=true
       - ATTENTION_BACKEND=flashinfer
       - SAMPLING_BACKEND=flashinfer
+      - TOOL_CALL_PARSER=llama3
+      - HF_TOKEN=${HF_TOKEN}
 
       # make it work locally with <= 8 GB VRAM
       - MEM_FRACTION_STATIC=0.5
```

engine.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -60,6 +60,7 @@ def start_server(self):
             "LOAD_BALANCE_METHOD": "--load-balance-method",
             "ATTENTION_BACKEND": "--attention-backend",
             "SAMPLING_BACKEND": "--sampling-backend",
+            "TOOL_CALL_PARSER": "--tool-call-parser"
         }
 
         # Boolean flags
```
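
The hunk only shows the dictionary entry; the rest of `start_server()` is not visible here. A hedged sketch of how such an env-to-flag map is typically consumed, so that `TOOL_CALL_PARSER=llama3` ends up as `--tool-call-parser llama3` on the SGLang launch command (the real implementation in `engine.py` may differ):

```python
# Sketch only, not the actual engine.py: build SGLang server arguments
# from environment variables using a map like the one in the diff above.
import os

ENV_TO_FLAG = {
    "ATTENTION_BACKEND": "--attention-backend",
    "SAMPLING_BACKEND": "--sampling-backend",
    "TOOL_CALL_PARSER": "--tool-call-parser",
}

def build_server_args(env=None):
    env = os.environ if env is None else env
    args = ["python3", "-m", "sglang.launch_server"]
    for var, flag in ENV_TO_FLAG.items():
        value = env.get(var)
        if value:  # only forward variables that are actually set
            args += [flag, value]
    return args

print(build_server_args({"TOOL_CALL_PARSER": "llama3"}))
# ['python3', '-m', 'sglang.launch_server', '--tool-call-parser', 'llama3']
```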

test_input.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -2,7 +2,7 @@
   "input": {
     "openai_route": "/v1/chat/completions",
     "openai_input": {
-      "model": "HuggingFaceTB/SmolLM2-1.7B-Instruct",
+      "model": "meta-llama/Llama-3.2-1B-Instruct",
       "messages": [
         { "role": "user", "content": "What is the capital of France?" }
       ]
```
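
With a parser configured, the same `openai_input` structure can carry an OpenAI-style `tools` array. Below is a hypothetical tool-calling variant of `test_input.json`, built in Python; the `get_weather` function is invented for illustration and is not part of the repo:

```python
# Hypothetical tool-calling variant of test_input.json. The tools array
# follows the OpenAI function-calling schema used by /v1/chat/completions.
import json

payload = {
    "input": {
        "openai_route": "/v1/chat/completions",
        "openai_input": {
            "model": "meta-llama/Llama-3.2-1B-Instruct",
            "messages": [
                {"role": "user", "content": "What is the weather in Paris?"}
            ],
            "tools": [
                {
                    "type": "function",
                    "function": {
                        "name": "get_weather",  # invented example tool
                        "description": "Get the current weather for a city",
                        "parameters": {
                            "type": "object",
                            "properties": {"city": {"type": "string"}},
                            "required": ["city"],
                        },
                    },
                }
            ],
        },
    }
}

print(json.dumps(payload, indent=2))
```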

worker-config.json

Lines changed: 17 additions & 1 deletion
```diff
@@ -13,7 +13,8 @@
       "LOAD_FORMAT",
       "DTYPE",
       "CHAT_TEMPLATE",
-      "SERVED_MODEL_NAME"
+      "SERVED_MODEL_NAME",
+      "TOOL_CALL_PARSER"
     ]
   },
   {
@@ -213,6 +214,21 @@
         {"value": "float32", "label": "float32"}
       ]
     },
+    "TOOL_CALL_PARSER": {
+      "env_var_name": "TOOL_CALL_PARSER",
+      "value": "qwen25",
+      "title": "Tool Call Parser",
+      "description": "Defines the parser used to interpret responses",
+      "required": false,
+      "type": "select",
+      "options": [
+        {"value": "llama3", "label": "llama3"},
+        {"value": "llama4", "label": "llama4"},
+        {"value": "mistral", "label": "mistral"},
+        {"value": "qwen25", "label": "qwen25"},
+        {"value": "deepseekv3", "label": "deepseekv3"}
+      ]
+    },
     "CONTEXT_LENGTH": {
       "env_var_name": "CONTEXT_LENGTH",
       "value": "",
```
