Commit 319e2eb

feat(04_dependencies): add mixed_worker and GPU vs CPU packaging docs (#42)

* feat(04_dependencies): add mixed_worker example and GPU vs CPU packaging docs

  Add `mixed_worker.py` demonstrating numpy used by both GPU and CPU endpoints -- the key scenario where the dependency blacklist fix matters. Update the README with a GPU vs CPU packaging section explaining base image differences, build exclusions, and the runtime safety net.

* fix(review): address PR feedback for #42

  - Clarify the GPU worker docstring: numpy computations are CPU-bound despite the GPU instance
  - Add input validation for `size` (clamped to 1-10000) and `values` (must be a list, max 100k elements)
  - Add a note about the `GpuGroup` vs `GpuType` enum inconsistency in the README

* fix(05_load_balancer): use typed param for gpu_lb compute endpoint

  Replace the untyped `request: dict` with `numbers: list[float]` to fix division by zero on empty input and provide proper Swagger examples. Module-level Pydantic models cannot be used in LB endpoints because function bodies are serialized to remote workers.

* fix(05_load_balancer): fix gpu_lb endpoint typing and torch import

  - Use a typed param (`numbers: list[float]`) instead of an untyped dict for proper Swagger examples and input validation
  - Remove the broad `except Exception` that swallowed ImportError as misleading device info; torch should be available on GPU worker images
  - Add `dependencies=["torch"]` for environments where torch needs explicit installation

1 parent f3e6727 commit 319e2eb

File tree

3 files changed: +162 −23 lines changed

01_getting_started/04_dependencies/README.md

Lines changed: 56 additions & 5 deletions
````diff
@@ -6,13 +6,25 @@ Learn how to manage Python packages and system dependencies in Flash workers.
 
 - **Python dependencies** - Installing packages with version constraints
 - **System dependencies** - Installing apt packages (ffmpeg, libgl1, etc.)
+- **GPU vs CPU packaging** - How dependencies are resolved differently per runtime
+- **Shared dependencies** - GPU and CPU endpoints using the same package (numpy)
 - **Version constraints** - Supported syntax for version pinning
 - **Dependency optimization** - Minimizing cold start time
 
 ## Quick Start
 
 **Prerequisites**: Complete the [repository setup](../../README.md#quick-start) first (clone, `make dev`, set API key).
 
+### Files
+
+| File | What it demonstrates |
+|------|---------------------|
+| `gpu_worker.py` | Python deps with version pins, system deps (ffmpeg, libgl1) |
+| `cpu_worker.py` | Data science deps on CPU (numpy, pandas, scipy), zero-dep worker |
+| `mixed_worker.py` | Same dependency (numpy) on both GPU and CPU endpoints |
+
+> **Note:** `gpu_worker.py` uses `GpuGroup` while newer snippets in this README use `GpuType`. Both enums are supported by the SDK; `GpuType` is recommended for new code.
+
 ### Run This Example
 
 ```bash
````
````diff
@@ -38,6 +50,26 @@ uv run flash login
 uv run flash run
 ```
 
+## GPU vs CPU Packaging
+
+GPU and CPU endpoints use different base Docker images, which affects how dependencies are resolved:
+
+| | GPU images (`runpod/pytorch:*`) | CPU images (`python:X.Y-slim`) |
+|---|---|---|
+| **Base image** | PyTorch + CUDA + numpy + triton | Python stdlib only |
+| **Pre-installed** | torch, torchvision, torchaudio, numpy, triton | Nothing |
+| **Build artifact** | Excludes torch ecosystem (too large for 500 MB tarball) | Includes everything declared in `dependencies` |
+
+**What this means for you:**
+
+- **GPU endpoints**: `torch`, `torchvision`, `torchaudio`, and `triton` are excluded from the build artifact because they already exist in the base image and would exceed the 500 MB tarball limit. All other dependencies (including `numpy`) are packaged normally.
+- **CPU endpoints**: Every dependency must be in the build artifact. Nothing is pre-installed.
+- **Mixed projects**: When GPU and CPU endpoints share a dependency like `numpy`, it ships in the tarball. The GPU image ignores the duplicate (its pre-installed copy takes precedence).
+
+See `mixed_worker.py` for a working example of GPU and CPU endpoints sharing `numpy`.
+
+**Safety net**: If a dependency is missing from the build artifact at runtime, the worker attempts to install it on-the-fly and logs a warning. This prevents crashes but adds to cold start time. Always declare your dependencies explicitly to avoid this penalty.
+
 ## Dependency Types
 
 ### 1. Python Dependencies
````
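The runtime safety net described in the GPU vs CPU Packaging section (install a missing package on the fly, log a warning) could be sketched roughly as below. This is an illustration of the documented behavior only, not Flash's actual implementation; the function name, logger name, and log text are hypothetical.

```python
import importlib
import logging
import subprocess
import sys

logger = logging.getLogger("flash.runtime")  # hypothetical logger name


def ensure_package(name: str):
    """Import a package, installing it on the fly if it is missing.

    Mirrors the documented safety net: a missing dependency does not
    crash the worker, but the on-the-fly install adds cold start time.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        logger.warning(
            "Package %r is not in the build artifact. Installing on-the-fly. "
            "This adds to cold start time -- consider adding it to your "
            "dependencies list.",
            name,
        )
        subprocess.check_call([sys.executable, "-m", "pip", "install", name])
        return importlib.import_module(name)
```

Declaring the package in `dependencies` up front skips the `pip install` path entirely, which is why the README recommends explicit declarations.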
````diff
@@ -47,7 +79,7 @@ Specified in the `Endpoint` decorator:
 ```python
 @Endpoint(
     name="my-worker",
-    gpu=GpuGroup.ADA_24,
+    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
     dependencies=[
         "requests==2.32.3",  # Exact version
         "Pillow>=10.0.0",    # Minimum version
````
````diff
@@ -70,7 +102,7 @@ Install apt packages:
 ```python
 @Endpoint(
     name="my-worker",
-    gpu=GpuGroup.AMPERE_16,
+    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
     dependencies=["opencv-python"],
     system_dependencies=["ffmpeg", "libgl1", "graphviz"],
 )
````
````diff
@@ -251,7 +283,7 @@ python cpu_worker.py
 ```python
 @Endpoint(
     name="worker",
-    gpu=GpuGroup.ADA_24,
+    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
     dependencies=[
         "requests==2.32.3",  # API calls
         "Pillow>=10.0.0",    # Image processing
````
````diff
@@ -335,17 +367,36 @@ numpy
 
 **Note:** Worker dependencies in the `Endpoint` decorator are deployed automatically. `requirements.txt` is for local development only.
 
+## Build Exclusions
+
+Flash automatically excludes packages that are too large for the 500 MB build artifact limit. Currently excluded: `torch`, `torchvision`, `torchaudio`, `triton` (all CUDA-specific, pre-installed in GPU images).
+
+You can exclude additional large packages with `--exclude`:
+
+```bash
+# Exclude tensorflow from the build artifact
+flash build --exclude tensorflow
+```
+
+**Important:** Only exclude packages that are pre-installed in your target runtime. If you exclude a package that a CPU endpoint needs, the worker will attempt to install it on-the-fly at startup. This works but adds to cold start time and logs a warning:
+
+```
+WARNING - Package 'scipy' is not in the build artifact. Installing on-the-fly.
+This adds to cold start time -- consider adding it to your dependencies list
+to include it in the build artifact.
+```
+
 ## Advanced: External Docker Images
 
 For complex dependencies, deploy a pre-built image:
 
 ```python
-from runpod_flash import Endpoint, GpuGroup
+from runpod_flash import Endpoint, GpuType
 
 vllm = Endpoint(
     name="vllm-service",
     image="vllm/vllm-openai:latest",
-    gpu=GpuGroup.ADA_24,
+    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
 )
 
 # call it as an API client
````
01_getting_started/04_dependencies/mixed_worker.py

Lines changed: 89 additions & 0 deletions

```python
# GPU and CPU workers sharing a common dependency (numpy).
# Demonstrates that dependencies work correctly across both runtime environments:
# - GPU images (runpod/pytorch:*) have numpy pre-installed
# - CPU images (python-slim) install numpy from the build artifact
#
# run with: flash run
# test directly: python mixed_worker.py
from runpod_flash import CpuInstanceType, Endpoint, GpuType


@Endpoint(
    name="01_04_deps_gpu_numpy",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
    workers=(0, 3),
    dependencies=["numpy"],
)
async def gpu_matrix_multiply(input_data: dict) -> dict:
    """GPU-instance worker running CPU-bound numpy matrix operations.

    This endpoint runs on a GPU instance type, but uses standard numpy,
    so all computations execute on the CPU. On GPU images, numpy is
    pre-installed in the base image; the build artifact also includes
    it, so both paths work, with the image's copy taking precedence.
    """
    import numpy as np

    size = min(max(int(input_data.get("size", 100)), 1), 10_000)
    a = np.random.rand(size, size)
    b = np.random.rand(size, size)
    result = np.dot(a, b)

    return {
        "status": "success",
        "worker_type": "GPU",
        "matrix_size": size,
        "result_shape": list(result.shape),
        "result_trace": float(np.trace(result)),
        "numpy_version": np.__version__,
    }


@Endpoint(
    name="01_04_deps_cpu_numpy",
    cpu=CpuInstanceType.CPU3C_1_2,
    workers=(0, 3),
    dependencies=["numpy"],
)
async def cpu_statistics(input_data: dict) -> dict:
    """CPU worker using numpy for statistical computations.

    On CPU images (python-slim), numpy is NOT pre-installed. The build
    artifact must include it. Flash's build pipeline ships numpy in the
    tarball for CPU endpoints.
    """
    import numpy as np

    raw_values = input_data.get("values", [1.0, 2.0, 3.0, 4.0, 5.0])
    if not isinstance(raw_values, list) or len(raw_values) > 100_000:
        return {
            "status": "error",
            "message": "values must be a list with at most 100000 elements",
        }
    arr = np.array(raw_values)

    return {
        "status": "success",
        "worker_type": "CPU",
        "count": len(raw_values),
        "mean": float(np.mean(arr)),
        "std": float(np.std(arr)),
        "median": float(np.median(arr)),
        "numpy_version": np.__version__,
    }


if __name__ == "__main__":
    import asyncio

    async def test():
        print("\n=== Testing GPU numpy (matrix multiply) ===")
        gpu_result = await gpu_matrix_multiply({"size": 50})
        print(f"Result: {gpu_result}\n")

        print("=== Testing CPU numpy (statistics) ===")
        cpu_result = await cpu_statistics({"values": [10, 20, 30, 40, 50]})
        print(f"Result: {cpu_result}\n")

    asyncio.run(test())
```

03_advanced_workers/05_load_balancer/gpu_lb.py

Lines changed: 17 additions & 18 deletions

````diff
@@ -7,6 +7,7 @@
     name="03_05_load_balancer_gpu",
     gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
     workers=(1, 3),
+    dependencies=["torch"],
 )
 
 
@@ -17,19 +18,23 @@ async def gpu_health() -> dict:
 
 
 @api.post("/compute")
-async def compute_intensive(request: dict) -> dict:
+async def compute_intensive(numbers: list[float]) -> dict:
     """Perform compute-intensive operation on GPU.
 
     Args:
-        request: Request dict with numbers to process
+        numbers: List of numbers to compute statistics on
 
     Returns:
         Computation results
     """
     import time
     from datetime import datetime, timezone
 
-    numbers = request.get("numbers", [])
+    if not numbers:
+        return {
+            "status": "error",
+            "message": "numbers list must not be empty",
+        }
     start_time = time.time()
 
     result = sum(x**2 for x in numbers)
@@ -54,21 +59,16 @@ async def compute_intensive(request: dict) -> dict:
 @api.get("/info")
 async def gpu_info() -> dict:
     """Get GPU availability information."""
-    try:
-        import torch
+    import torch
 
-        if torch.cuda.is_available():
-            info = {
-                "available": True,
-                "device": torch.cuda.get_device_name(0),
-                "count": torch.cuda.device_count(),
-            }
-        else:
-            info = {"available": False, "device": "No GPU", "count": 0}
-    except Exception as e:
-        info = {"available": False, "device": str(e), "count": 0}
+    if torch.cuda.is_available():
+        return {
+            "available": True,
+            "device": torch.cuda.get_device_name(0),
+            "count": torch.cuda.device_count(),
+        }
 
-    return info
+    return {"available": False, "device": "No GPU", "count": 0}
 
 
 if __name__ == "__main__":
@@ -82,8 +82,7 @@ async def test():
     print(f"  {result}\n")
 
     print("2. Compute intensive:")
-    request_data = {"numbers": [1, 2, 3, 4, 5]}
-    result = await compute_intensive(request_data)
+    result = await compute_intensive([1, 2, 3, 4, 5])
     print(f"  Sum of squares: {result['sum_of_squares']}")
     print(f"  Mean: {result['mean']}\n")
 
````
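The empty-input guard introduced for `/compute` is what prevents the division by zero mentioned in the commit message, since the mean divides by `len(numbers)`. A standalone sketch of the same logic; the exact response fields and the mean-of-squares formula are assumptions, since the full function body lies outside this diff:

```python
def compute_stats(numbers: list[float]) -> dict:
    """Sum-of-squares computation with the empty-input guard from the diff."""
    if not numbers:
        # Guard added in the commit: without it, the mean below divides by zero.
        return {"status": "error", "message": "numbers list must not be empty"}
    sum_of_squares = sum(x**2 for x in numbers)
    return {
        "status": "success",
        "sum_of_squares": sum_of_squares,
        "mean": sum_of_squares / len(numbers),  # safe: len(numbers) > 0 here
    }


print(compute_stats([]))
print(compute_stats([1, 2, 3, 4, 5]))  # sum_of_squares = 55, mean = 11.0
```

With the typed `numbers: list[float]` parameter, FastAPI-style validation also rejects non-list payloads before the handler runs, so the guard only needs to cover the empty-list case.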