Commit 319e2eb

feat(04_dependencies): add mixed_worker and GPU vs CPU packaging docs (#42)

* feat(04_dependencies): add mixed_worker example and GPU vs CPU packaging docs

  Add `mixed_worker.py` demonstrating numpy used by both GPU and CPU endpoints -- the key scenario where the dependency blacklist fix matters. Update the README with a GPU vs CPU packaging section explaining base image differences, build exclusions, and the runtime safety net.

* fix(review): address PR feedback for #42

  - Clarify the GPU worker docstring: numpy computations are CPU-bound despite the GPU instance
  - Add input validation for `size` (clamped to 1-10000) and `values` (must be a list, max 100k elements)
  - Add a note about the `GpuGroup` vs `GpuType` enum inconsistency in the README

* fix(05_load_balancer): use typed param for gpu_lb compute endpoint

  Replace the untyped `request: dict` with `numbers: list[float]` to fix division by zero on empty input and provide proper Swagger examples. Module-level Pydantic models cannot be used in LB endpoints because function bodies are serialized to remote workers.

* fix(05_load_balancer): fix gpu_lb endpoint typing and torch import

  - Use a typed param (`numbers: list[float]`) instead of an untyped dict for proper Swagger examples and input validation
  - Remove the broad `except Exception` that swallowed ImportError as misleading device info; torch should be available on GPU worker images
  - Add `dependencies=["torch"]` for environments where torch needs explicit installation

1 parent f3e6727 commit 319e2eb

File tree

3 files changed: +162 −23 lines changed

01_getting_started/04_dependencies/README.md

Lines changed: 56 additions & 5 deletions
````diff
@@ -6,13 +6,25 @@ Learn how to manage Python packages and system dependencies in Flash workers.
 
 - **Python dependencies** - Installing packages with version constraints
 - **System dependencies** - Installing apt packages (ffmpeg, libgl1, etc.)
+- **GPU vs CPU packaging** - How dependencies are resolved differently per runtime
+- **Shared dependencies** - GPU and CPU endpoints using the same package (numpy)
 - **Version constraints** - Supported syntax for version pinning
 - **Dependency optimization** - Minimizing cold start time
 
 ## Quick Start
 
 **Prerequisites**: Complete the [repository setup](../../README.md#quick-start) first (clone, `make dev`, set API key).
 
+### Files
+
+| File | What it demonstrates |
+|------|---------------------|
+| `gpu_worker.py` | Python deps with version pins, system deps (ffmpeg, libgl1) |
+| `cpu_worker.py` | Data science deps on CPU (numpy, pandas, scipy), zero-dep worker |
+| `mixed_worker.py` | Same dependency (numpy) on both GPU and CPU endpoints |
+
+> **Note:** `gpu_worker.py` uses `GpuGroup` while newer snippets in this README use `GpuType`. Both enums are supported by the SDK; `GpuType` is recommended for new code.
+
 ### Run This Example
 
 ```bash
````
````diff
@@ -38,6 +50,26 @@ uv run flash login
 uv run flash run
 ```
 
+## GPU vs CPU Packaging
+
+GPU and CPU endpoints use different base Docker images, which affects how dependencies are resolved:
+
+| | GPU images (`runpod/pytorch:*`) | CPU images (`python:X.Y-slim`) |
+|---|---|---|
+| **Base image** | PyTorch + CUDA + numpy + triton | Python stdlib only |
+| **Pre-installed** | torch, torchvision, torchaudio, numpy, triton | Nothing |
+| **Build artifact** | Excludes torch ecosystem (too large for 500 MB tarball) | Includes everything declared in `dependencies` |
+
+**What this means for you:**
+
+- **GPU endpoints**: `torch`, `torchvision`, `torchaudio`, and `triton` are excluded from the build artifact because they already exist in the base image and would exceed the 500 MB tarball limit. All other dependencies (including `numpy`) are packaged normally.
+- **CPU endpoints**: Every dependency must be in the build artifact. Nothing is pre-installed.
+- **Mixed projects**: When GPU and CPU endpoints share a dependency like `numpy`, it ships in the tarball. The GPU image ignores the duplicate (its pre-installed copy takes precedence).
+
+See `mixed_worker.py` for a working example of GPU and CPU endpoints sharing `numpy`.
+
+**Safety net**: If a dependency is missing from the build artifact at runtime, the worker attempts to install it on-the-fly and logs a warning. This prevents crashes but adds to cold start time. Always declare your dependencies explicitly to avoid this penalty.
+
 ## Dependency Types
 
 ### 1. Python Dependencies
````
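The runtime safety net described in the GPU vs CPU Packaging section (install a missing package on the fly, log a warning) could be sketched roughly as below. This is an illustration of the documented behavior only, not Flash's actual implementation; the function name, logger name, and log text are hypothetical.

```python
import importlib
import logging
import subprocess
import sys

logger = logging.getLogger("flash.runtime")  # hypothetical logger name


def ensure_package(name: str):
    """Import a package, installing it on the fly if it is missing.

    Mirrors the documented safety net: a missing dependency does not
    crash the worker, but the on-the-fly install adds cold start time.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        logger.warning(
            "Package %r is not in the build artifact. Installing on-the-fly. "
            "This adds to cold start time -- consider adding it to your "
            "dependencies list.",
            name,
        )
        subprocess.check_call([sys.executable, "-m", "pip", "install", name])
        return importlib.import_module(name)
```

Declaring the package in `dependencies` up front skips the `pip install` path entirely, which is why the README recommends explicit declarations.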
````diff
@@ -47,7 +79,7 @@ Specified in the `Endpoint` decorator:
 ```python
 @Endpoint(
     name="my-worker",
-    gpu=GpuGroup.ADA_24,
+    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
     dependencies=[
         "requests==2.32.3",  # Exact version
         "Pillow>=10.0.0",    # Minimum version
````
````diff
@@ -70,7 +102,7 @@ Install apt packages:
 ```python
 @Endpoint(
     name="my-worker",
-    gpu=GpuGroup.AMPERE_16,
+    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
     dependencies=["opencv-python"],
     system_dependencies=["ffmpeg", "libgl1", "graphviz"],
 )
````
````diff
@@ -251,7 +283,7 @@ python cpu_worker.py
 ```python
 @Endpoint(
     name="worker",
-    gpu=GpuGroup.ADA_24,
+    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
     dependencies=[
         "requests==2.32.3",  # API calls
         "Pillow>=10.0.0",    # Image processing
````
````diff
@@ -335,17 +367,36 @@ numpy
 
 **Note:** Worker dependencies in the `Endpoint` decorator are deployed automatically. `requirements.txt` is for local development only.
 
+## Build Exclusions
+
+Flash automatically excludes packages that are too large for the 500 MB build artifact limit. Currently excluded: `torch`, `torchvision`, `torchaudio`, `triton` (all CUDA-specific, pre-installed in GPU images).
+
+You can exclude additional large packages with `--exclude`:
+
+```bash
+# Exclude tensorflow from the build artifact
+flash build --exclude tensorflow
+```
+
+**Important:** Only exclude packages that are pre-installed in your target runtime. If you exclude a package that a CPU endpoint needs, the worker will attempt to install it on-the-fly at startup. This works but adds to cold start time and logs a warning:
+
+```
+WARNING - Package 'scipy' is not in the build artifact. Installing on-the-fly.
+This adds to cold start time -- consider adding it to your dependencies list
+to include it in the build artifact.
+```
+
 ## Advanced: External Docker Images
 
 For complex dependencies, deploy a pre-built image:
 
 ```python
-from runpod_flash import Endpoint, GpuGroup
+from runpod_flash import Endpoint, GpuType
 
 vllm = Endpoint(
     name="vllm-service",
     image="vllm/vllm-openai:latest",
-    gpu=GpuGroup.ADA_24,
+    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
 )
 
 # call it as an API client
````
01_getting_started/04_dependencies/mixed_worker.py

Lines changed: 89 additions & 0 deletions

```python
# GPU and CPU workers sharing a common dependency (numpy).
# Demonstrates that dependencies work correctly across both runtime environments:
# - GPU images (runpod/pytorch:*) have numpy pre-installed
# - CPU images (python-slim) install numpy from the build artifact
#
# run with: flash run
# test directly: python mixed_worker.py
from runpod_flash import CpuInstanceType, Endpoint, GpuType


@Endpoint(
    name="01_04_deps_gpu_numpy",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
    workers=(0, 3),
    dependencies=["numpy"],
)
async def gpu_matrix_multiply(input_data: dict) -> dict:
    """GPU-instance worker running CPU-bound numpy matrix operations.

    This endpoint runs on a GPU instance type, but uses standard numpy,
    so all computations execute on the CPU. On GPU images, numpy is
    pre-installed in the base image; the build artifact also includes
    it, so both paths work, with the image's copy taking precedence.
    """
    import numpy as np

    size = min(max(int(input_data.get("size", 100)), 1), 10_000)
    a = np.random.rand(size, size)
    b = np.random.rand(size, size)
    result = np.dot(a, b)

    return {
        "status": "success",
        "worker_type": "GPU",
        "matrix_size": size,
        "result_shape": list(result.shape),
        "result_trace": float(np.trace(result)),
        "numpy_version": np.__version__,
    }


@Endpoint(
    name="01_04_deps_cpu_numpy",
    cpu=CpuInstanceType.CPU3C_1_2,
    workers=(0, 3),
    dependencies=["numpy"],
)
async def cpu_statistics(input_data: dict) -> dict:
    """CPU worker using numpy for statistical computations.

    On CPU images (python-slim), numpy is NOT pre-installed. The build
    artifact must include it. Flash's build pipeline ships numpy in the
    tarball for CPU endpoints.
    """
    import numpy as np

    raw_values = input_data.get("values", [1.0, 2.0, 3.0, 4.0, 5.0])
    if not isinstance(raw_values, list) or len(raw_values) > 100_000:
        return {
            "status": "error",
            "message": "values must be a list with at most 100000 elements",
        }
    arr = np.array(raw_values)

    return {
        "status": "success",
        "worker_type": "CPU",
        "count": len(raw_values),
        "mean": float(np.mean(arr)),
        "std": float(np.std(arr)),
        "median": float(np.median(arr)),
        "numpy_version": np.__version__,
    }


if __name__ == "__main__":
    import asyncio

    async def test():
        print("\n=== Testing GPU numpy (matrix multiply) ===")
        gpu_result = await gpu_matrix_multiply({"size": 50})
        print(f"Result: {gpu_result}\n")

        print("=== Testing CPU numpy (statistics) ===")
        cpu_result = await cpu_statistics({"values": [10, 20, 30, 40, 50]})
        print(f"Result: {cpu_result}\n")

    asyncio.run(test())
```

03_advanced_workers/05_load_balancer/gpu_lb.py

Lines changed: 17 additions & 18 deletions

````diff
@@ -7,6 +7,7 @@
     name="03_05_load_balancer_gpu",
     gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
     workers=(1, 3),
+    dependencies=["torch"],
 )
 
 
@@ -17,19 +18,23 @@ async def gpu_health() -> dict:
 
 
 @api.post("/compute")
-async def compute_intensive(request: dict) -> dict:
+async def compute_intensive(numbers: list[float]) -> dict:
     """Perform compute-intensive operation on GPU.
 
     Args:
-        request: Request dict with numbers to process
+        numbers: List of numbers to compute statistics on
 
     Returns:
         Computation results
     """
     import time
     from datetime import datetime, timezone
 
-    numbers = request.get("numbers", [])
+    if not numbers:
+        return {
+            "status": "error",
+            "message": "numbers list must not be empty",
+        }
     start_time = time.time()
 
     result = sum(x**2 for x in numbers)
@@ -54,21 +59,16 @@ async def compute_intensive(request: dict) -> dict:
 @api.get("/info")
 async def gpu_info() -> dict:
     """Get GPU availability information."""
-    try:
-        import torch
+    import torch
 
-        if torch.cuda.is_available():
-            info = {
-                "available": True,
-                "device": torch.cuda.get_device_name(0),
-                "count": torch.cuda.device_count(),
-            }
-        else:
-            info = {"available": False, "device": "No GPU", "count": 0}
-    except Exception as e:
-        info = {"available": False, "device": str(e), "count": 0}
+    if torch.cuda.is_available():
+        return {
+            "available": True,
+            "device": torch.cuda.get_device_name(0),
+            "count": torch.cuda.device_count(),
+        }
 
-    return info
+    return {"available": False, "device": "No GPU", "count": 0}
 
 
 if __name__ == "__main__":
@@ -82,8 +82,7 @@ async def test():
     print(f"  {result}\n")
 
     print("2. Compute intensive:")
-    request_data = {"numbers": [1, 2, 3, 4, 5]}
-    result = await compute_intensive(request_data)
+    result = await compute_intensive([1, 2, 3, 4, 5])
     print(f"  Sum of squares: {result['sum_of_squares']}")
     print(f"  Mean: {result['mean']}\n")
 
````
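The empty-input guard introduced for `/compute` is what prevents the division by zero mentioned in the commit message, since the mean divides by `len(numbers)`. A standalone sketch of the same logic; the exact response fields and the mean-of-squares formula are assumptions, since the full function body lies outside this diff:

```python
def compute_stats(numbers: list[float]) -> dict:
    """Sum-of-squares computation with the empty-input guard from the diff."""
    if not numbers:
        # Guard added in the commit: without it, the mean below divides by zero.
        return {"status": "error", "message": "numbers list must not be empty"}
    sum_of_squares = sum(x**2 for x in numbers)
    return {
        "status": "success",
        "sum_of_squares": sum_of_squares,
        "mean": sum_of_squares / len(numbers),  # safe: len(numbers) > 0 here
    }


print(compute_stats([]))
print(compute_stats([1, 2, 3, 4, 5]))  # sum_of_squares = 55, mean = 11.0
```

With the typed `numbers: list[float]` parameter, FastAPI-style validation also rejects non-list payloads before the handler runs, so the guard only needs to cover the empty-list case.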