
Commit 3e82bad

vijayvammi and claude authored
feat: Revisit examples (#240)
* feat: add comprehensive machine learning tutorials

  Add three complete tutorial examples demonstrating Runnable's ML capabilities:

  **Data Science 101 Tutorial:**

  - End-to-end ML pipeline with data loading, exploration, preprocessing, training, and evaluation
  - Uses scikit-learn with RandomForestClassifier on the Iris dataset
  - Demonstrates parameter passing, catalog management, and metrics tracking
  - Includes comprehensive visualizations and model evaluation

  **Model Comparison Tutorial:**

  - Parallel model training and comparison using Runnable's Parallel execution
  - Compares 4 ML algorithms: Random Forest, Logistic Regression, SVM, and KNN
  - Features cross-validation, hyperparameter tuning, and performance visualization
  - Demonstrates advanced pipeline orchestration and result aggregation

  **PyTorch Distributed Training Tutorial:**

  - Single-node distributed training using PyTorch DistributedDataParallel (DDP)
  - Multi-process coordination with gradient synchronization across 4 CPU cores
  - Comprehensive checkpoint management with per-epoch, latest, and final saves
  - **Process output capture**: all prints from distributed processes are captured
  - Advanced features: ProcessOutputCapture manager, TeeOutput for logging
  - Complete training visibility with process-specific output files

  **Key Features Added:**

  - Tutorial dependency group with scikit-learn, matplotlib, seaborn, torch
  - Comprehensive README documentation for each tutorial
  - Production-ready code with proper error handling and logging
  - Runnable catalog integration for artifact storage and reproducibility
  - Advanced distributed training patterns with output capture

  These tutorials serve as comprehensive examples for ML practitioners using Runnable for data science workflows, model comparison, and distributed learning.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* feat: add standard PyTorch examples with minimal runnable integration

  Add comprehensive examples showing how runnable can execute standard PyTorch code with only type annotations required. Demonstrates the minimal changes needed to integrate existing PyTorch scripts with runnable orchestration.

  Features:

  - Standard PyTorch training scripts with argparse patterns
  - Distributed training using PyTorch DDP
  - Type-annotated functions for runnable compatibility
  - Clean integration via PythonJob wrappers
  - YAML parameter configuration
  - Comprehensive documentation showing migration path

  Key insight: 99% of existing PyTorch code remains unchanged, requiring only type annotations to enable runnable's orchestration capabilities (see the sketch below).

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
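A minimal sketch of that pattern, assuming only the `PythonJob` API shown in this commit's README changes; `train` and its parameters are illustrative names, not code from the commit:

```python
# Hypothetical sketch: a standard PyTorch-style training function, unchanged
# except for the type annotations that runnable uses to wire up parameters.
from runnable import PythonJob


def train(learning_rate: float = 1e-3, epochs: int = 2) -> float:
    # A stand-in for an existing training loop; the body stays plain Python.
    final_loss = 1.0 / (1.0 + learning_rate * epochs)
    return final_loss


def main():
    # Wrapping the untouched function gives orchestration, tracking and a run log.
    PythonJob(function=train).execute()


if __name__ == "__main__":
    main()
```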
* fix: bug in parameter handling when dealing with objects

* docs: Better examples and docs

* feat: add conditional parallel execution for local executors

  - Add enable_parallel config option to local and local-container executors
  - Implement parallel execution using multiprocessing.Pool for parallel and map nodes
  - Add supports_parallel_writes flag to run log stores (True for chunked-fs, False for file-system)
  - Graceful fallback to sequential execution with a warning when parallel writes are not supported
  - Only enable parallel execution for local executors, with user opt-in via config
  - Add execute_single_branch function for multiprocessing execution
  - Maintain backward compatibility with existing sequential behavior

  A hedged sketch of this opt-in configuration follows this commit list.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* feat: output capture

* docs: improve code examples and fix documentation patterns

  - Update mocking-testing.md with executable examples following main()
  - Fix jobs-vs-pipelines.md by removing the non-existent .as_pipeline()
  - Enhance reproducibility.md with custom run ID examples
  - Improve file-storage.md by removing duplicate sections
  - Update first-job.md with comprehensive custom run ID docs
  - Streamline jobs/index.md by removing premature error handling
  - Sharpen the focus of job-types.md and remove the combining-jobs section
  - Expand parameters.md with argparse migration examples
  - Add pipeline-parameters.md for pipeline-specific parameters

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* docs: add comprehensive parallel execution support documentation

  - Update executor overview with conditional parallel support
  - Enhance local.md with detailed parallel execution examples
  - Add parallel execution examples to local-container.md
  - Improve run-log.md with a parallel execution compatibility table
  - Update parallel-execution.md with local configuration requirements
  - Document the enable_parallel config option and the chunked-fs requirement
  - Add automatic fallback behavior and compatibility information
  - Include practical examples with configuration files

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* docs: enhance job execution documentation for consistency and accuracy

  Update all job execution documentation following established patterns:

  - Fix code examples to use the main() function pattern consistently
  - Update command usage from `python` to `uv run` throughout
  - Correct the execute() parameter from config= to configuration_file=
  - Add comprehensive configuration references based on the actual classes
  - Improve user experience with clear benefits, trade-offs, and upgrade paths
  - Streamline troubleshooting sections with actionable guidance
  - Enhance the overview with better executor comparison and selection guidance

  All job executors now maintain consistency with pipeline executor patterns.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
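To make the enable_parallel opt-in concrete, here is a hedged sketch. The `Parallel` node mirrors the examples/06-parallel layout listed later in this commit, but the branch names, functions, and the YAML keys in the comment are assumptions, not code from the commit:

```python
# Hedged sketch of opting in to parallel branch execution on a local executor.
# Assumed configuration file "parallel-config.yaml" (keys are an assumption):
#   pipeline-executor:
#     type: local
#     config:
#       enable_parallel: true   # the opt-in flag named in the commit message
#   run-log-store:
#     type: chunked-fs          # required; file-system falls back to sequential
from runnable import Parallel, Pipeline, PythonTask


def branch_a_work():
    print("branch A")


def branch_b_work():
    print("branch B")


def main():
    fan_out = Parallel(
        name="fan_out",
        branches={
            "a": Pipeline(steps=[PythonTask(function=branch_a_work, name="a")]),
            "b": Pipeline(steps=[PythonTask(function=branch_b_work, name="b")]),
        },
    )
    # With enable_parallel set, branches run via multiprocessing.Pool;
    # otherwise execution falls back to the existing sequential behaviour.
    Pipeline(steps=[fan_out]).execute(configuration_file="parallel-config.yaml")


if __name__ == "__main__":
    main()
```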
* docs: Extensions captured

* docs: Extensions captured

* docs: Extensions captured

* feat(tutorial): add getting started tutorial navigation structure

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* feat(tutorial): add core ML functions for getting started tutorial

  Add foundational ML functions for the getting started tutorial:

  - Complete ML workflow functions (load, preprocess, train, evaluate)
  - Sample dataset generation
  - Model and results persistence
  - Basic monolithic training function demonstrating common problems

  This establishes the baseline "before Runnable" code that will be progressively enhanced throughout the tutorial chapters.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* feat(tutorial): add Chapter 1 - The Starting Point

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* feat(tutorial): add Chapter 2 - Making It Reproducible

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* feat(tutorial): add Chapter 3 - Adding Flexibility

  Add parameterized ML functions with flexible configuration support. Users can now run different experiments without code changes, using environment variables or YAML config files.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* feat(tutorial): add Chapter 4 - Connecting the Workflow

  Transform the monolithic ML function into a multi-step pipeline with automatic data flow between steps. Shows how functions can be composed into pipelines with step-by-step tracking and intermediate result preservation.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* docs: add tutorial chapter 5 - handling large datasets

  Add a chapter demonstrating efficient file-based data management using Catalog. Shows how to handle datasets larger than memory by storing intermediate results as files instead of passing everything through memory.

  Key concepts:

  - Using Catalog(put=[...]) for storing files
  - Using Catalog(get=[...]) for retrieving files
  - File-based data flow for large datasets
  - Mixing file storage with memory passing

  Example script and documentation both tested successfully.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* docs: add tutorial chapter 6 - sharing results

  Add a chapter demonstrating persistent storage of model artifacts and metrics that can be shared across runs and team members.

  Key concepts:

  - Storing model artifacts in the catalog
  - Using metric() for tracking performance metrics
  - Loading previously saved models
  - Metrics tracked in run logs
  - Performance history and comparison

  Shows how to make results persistent beyond pipeline execution, enabling model reuse and performance tracking over time (see the sketch below).

  Example script and documentation both tested successfully.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
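Chapters 5 and 6 quote the `Catalog(put=[...])`/`Catalog(get=[...])` and `metric()` APIs directly, so here is a minimal sketch combining them; the file name, functions, and metric value are illustrative:

```python
# Hedged sketch: stage a file through the catalog, then record a metric.
from runnable import Catalog, Pipeline, PythonTask, metric


def preprocess():
    # Write an intermediate result to disk instead of holding it in memory.
    with open("features.csv", "w") as f:
        f.write("x,y\n1,2\n")


def train() -> float:
    # The catalog restores features.csv into this step's working directory.
    with open("features.csv") as f:
        _ = f.read()
    accuracy = 0.93  # placeholder; real code would evaluate a model here
    return accuracy


def main():
    Pipeline(steps=[
        PythonTask(function=preprocess, name="preprocess",
                   catalog=Catalog(put=["features.csv"])),
        PythonTask(function=train, name="train",
                   catalog=Catalog(get=["features.csv"]),
                   returns=[metric("accuracy")]),  # tracked in the run log
    ]).execute()


if __name__ == "__main__":
    main()
```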
* docs: add tutorial chapter 7 - running anywhere

  Add a final chapter demonstrating that the same pipeline code runs in different environments without modification. The environment is controlled by configuration, not code changes.

  Key concepts:

  - Same code for local, container, and cloud execution
  - Configuration-driven deployment
  - Develop locally, deploy anywhere workflow
  - Zero code changes between environments
  - Production-ready portability

  Completes the getting-started tutorial, showing the full journey from a simple ML function to a production-ready portable pipeline (see the sketch below).

  Example script and documentation both tested successfully. All chapters 1-7 verified to run without errors. Documentation builds successfully with mkdocs.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* docs: Extensions captured

---------

Co-authored-by: Claude <[email protected]>
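As a closing illustration of chapter 7's claim, a hedged sketch of configuration-driven execution; `configuration_file=` is the parameter named in the job-docs commit above, while the config paths are illustrative:

```python
# Hedged sketch: the pipeline definition never changes; the configuration file
# passed at execution time selects local, container, or cloud execution.
from runnable import Pipeline, PythonTask


def hello():
    print("same code, any environment")


def main():
    pipeline = Pipeline(steps=[PythonTask(function=hello, name="hello")])
    # Swap per environment: e.g. local.yaml, local-container.yaml, k8s.yaml.
    pipeline.execute(configuration_file="configs/local.yaml")


if __name__ == "__main__":
    main()
```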
1 parent 15de039 commit 3e82bad

File tree

151 files changed (+17349 / −8987 lines)



.claude/commands/docster.md

Lines changed: 841 additions & 0 deletions
Large diffs are not rendered by default.

.claude/commands/tutor.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+# Tutorial development
+
+You are helping with tutorials for the Runnable framework.
+
+# Working examples
+
+There are plenty of examples in the examples folder with the following structure.
+They are layered by increasing complexity. Focus only on the .py files, as yaml is being deprecated.
+
+You can run any example as: ```uv run <python_file_name>```.
+The resulting run is named by its ```run_id```.
+
+Any execution results in:
+
+- a run log captured in .run_log_store against that run id
+- a catalog folder against the run_id which has the files moved between tasks. It also captures the output from the
+function or script execution. In the case of a notebook, the output notebook is stored.
+
+├── 01-tasks - tells you how to run python functions, notebooks or scripts as pipelines
+│   ├── notebook.py
+│   ├── notebook.yaml
+│   ├── python_task_as_pipeline.py
+│   ├── python_tasks.py
+│   ├── python_tasks.yaml
+│   ├── scripts.py
+│   ├── scripts.yaml
+│   ├── stub.py
+│   └── stub.yaml
+├── 02-sequential - tells you how to stitch tasks into pipelines.
+│   ├── conditional.py
+│   ├── default_fail.py
+│   ├── default_fail.yaml
+│   ├── on_failure_fail.py
+│   ├── on_failure_fail.yaml
+│   ├── on_failure_succeed.py
+│   ├── on_failure_succeed.yaml
+│   ├── traversal.py
+│   └── traversal.yaml
+├── 03-parameters - shows the parameter flow between tasks and setting initial parameters.
+Focus on how parameters are accessed and returned. They are passed by names, argspace, or kwargs.
+│   ├── passing_parameters_notebook.py
+│   ├── passing_parameters_notebook.yaml
+│   ├── passing_parameters_python.py
+│   ├── passing_parameters_python.yaml
+│   ├── passing_parameters_shell.py
+│   ├── passing_parameters_shell.yaml
+│   ├── static_parameters_fail.py
+│   ├── static_parameters_fail.yaml
+│   ├── static_parameters_non_python.py
+│   ├── static_parameters_non_python.yaml
+│   ├── static_parameters_python.py
+│   └── static_parameters_python.yaml
+├── 04-catalog - shows how to flow files between tasks. Focus on how get/put works and on how the user can choose not
+to store a copy in case the file is too big.
+│   ├── catalog_no_copy.py
+│   ├── catalog_on_fail.py
+│   ├── catalog_on_fail.yaml
+│   ├── catalog_python.py
+│   ├── catalog_python.yaml
+│   └── catalog.py
+├── 06-parallel - shows how to run parallel branches
+│   ├── nesting.py
+│   ├── nesting.yaml
+│   ├── parallel_branch_fail.py
+│   ├── parallel_branch_fail.yaml
+│   ├── parallel.py
+│   └── parallel.yaml
+├── 07-map - shows how to run a branch looped over an iterable.
+│   ├── custom_reducer.py
+│   ├── custom_reducer.yaml
+│   ├── map_fail.py
+│   ├── map_fail.yaml
+│   ├── map.py
+│   └── map.yaml
+├── 08-mocking - useful for mocking/testing parts of the workflow.
+│   ├── default.yaml
+│   ├── mocked_map_parameters.yaml
+│   ├── mocked-config-debug.yaml
+│   ├── mocked-config-simple.yaml
+│   ├── mocked-config-unittest.yaml
+│   ├── mocked-config.yaml
+│   └── patching.yaml
+├── 11-jobs - shows how to run jobs.
+│   ├── catalog_no_copy.py
+│   ├── catalog.py
+│   ├── emulate.yaml
+│   ├── k8s-job.yaml
+│   ├── local-container.yaml
+│   ├── mini-k8s-job.yaml
+│   ├── notebooks.py
+│   ├── passing_parameters_python.py
+│   ├── python_tasks.py
+│   └── scripts.py
+
+# Your role
+
+Your role is to understand the current showcase of capabilities and come up with missing examples.
+
+You also need to help me with writing tutorials based on common ML workflows. There are some examples given in
+examples/tutorials but they can be improved.
+
+The same applies to the examples provided in the torch folder. They should be improved to make them easier to understand.
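To illustrate the parameter flow that the 03-parameters entry above describes, a minimal sketch using the `PythonTask`/`returns` pattern from this commit's README changes; the functions are illustrative, not files from the examples folder:

```python
# Hedged sketch of parameters flowing between tasks by name.
from runnable import Pipeline, PythonTask


def generate() -> int:
    # The return value is published under the name given in `returns` below.
    return 42


def consume(answer: int):
    # The argument name `answer` matches the published parameter, so runnable
    # injects it automatically (name matches = automatic connection).
    print(f"received {answer}")


def main():
    Pipeline(steps=[
        PythonTask(function=generate, returns=["answer"], name="generate"),
        PythonTask(function=consume, name="consume"),
    ]).execute()


if __name__ == "__main__":
    main()
```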

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -163,3 +163,6 @@ minikube/
 *_timeline.html
 *_dashboard.html
 *_diagram.svg
+
+# Test Dockerfile
+Dockerfile.test

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ The docs explain the contextual example first and then show a detailed working e

 When writing docs always use code from examples directory and always use code snippets to avoid duplication

-Remember that when writing lists in md, there should be an empty line between the list - and the preceding line
+Remember that when writing lists in md, there should be an empty line between the list and the preceding line. This applies to all lists, including those following headings, text, or other elements


 I prefer to give prompts in a visual editor and I have my prompts in a file called prompt.md.

Dockerfile

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# Test Dockerfile with all runtime dependencies
+# Apple M1 compatible multi-platform image
+
+FROM python:3.11-slim
+
+# Set working directory
+WORKDIR /app
+
+USER root
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    git \
+    curl \
+    build-essential \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install uv for fast dependency management
+RUN pip install uv
+
+# Copy project files
+COPY pyproject.toml uv.lock README.md ./
+RUN uv sync --all-extras --frozen --all-groups
+
+COPY runnable/ ./runnable/
+COPY extensions/ ./extensions/
+COPY examples/ ./examples/
+
+# Set environment variables
+ENV PYTHONPATH=/app
+ENV PATH="/app/.venv/bin:$PATH"

README.md

Lines changed: 27 additions & 7 deletions
@@ -5,7 +5,7 @@
 **Transform any Python function into a portable, trackable pipeline in seconds.**

 <p align="center">
-    <a href="https://pypi.org/project/runnable/"><img alt="python:" src="https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue.svg"></a>
+    <a href="https://pypi.org/project/runnable/"><img alt="python:" src="https://img.shields.io/badge/python-3.10+-blue.svg"></a>
     <a href="https://pypi.org/project/runnable/"><img alt="Pypi" src="https://badge.fury.io/py/runnable.svg"></a>
     <a href="https://github.com/AstraZeneca/runnable/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/license-Apache%202.0-blue.svg"></a>
     <a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
@@ -26,11 +26,16 @@ def analyze_sales():
     return total_revenue, best_product
 ```

-**Make it runnable everywhere (2 lines):**
+**Make it runnable everywhere:**

 ```python
 from runnable import PythonJob
-PythonJob(function=analyze_sales).execute()
+
+def main():
+    PythonJob(function=analyze_sales).execute()
+
+if __name__ == "__main__":
+    main()
 ```

 **🎉 Success!** Your function now runs the same on laptop, containers, and Kubernetes with automatic tracking and reproducibility.
@@ -46,10 +51,15 @@ def analyze_segments(customer_data):  # Name matches = automatic connection

 # What Runnable needs (same logic, no glue):
 from runnable import Pipeline, PythonTask
-Pipeline(steps=[
-    PythonTask(function=load_customer_data, returns=["customer_data"]),
-    PythonTask(function=analyze_segments, returns=["analysis"])
-]).execute()
+
+def main():
+    Pipeline(steps=[
+        PythonTask(function=load_customer_data, returns=["customer_data"]),
+        PythonTask(function=analyze_segments, returns=["analysis"])
+    ]).execute()
+
+if __name__ == "__main__":
+    main()
 ```

 **Same pipeline runs unchanged on laptop, containers, and Kubernetes.**
@@ -60,6 +70,16 @@ Pipeline(steps=[
 pip install runnable
 ```

+**For development:**
+```bash
+uv sync --all-extras --dev
+```
+
+**Run examples:**
+```bash
+uv run examples/01-tasks/python_tasks.py
+```
+
 ## 📊 Why Choose Runnable?

 - **🎯 Easy to adopt**: Your code remains as-is, no decorators or imposed structure

data_folder/data.txt

Lines changed: 0 additions & 1 deletion
This file was deleted.

df.csv

Lines changed: 0 additions & 4 deletions
This file was deleted.

docs/concepts/advanced-patterns/conditional-workflows.md renamed to docs/advanced-patterns/conditional-workflows.md

Lines changed: 56 additions & 43 deletions
@@ -17,28 +17,33 @@ flowchart TD
 ```python
 from runnable import Conditional, Pipeline, PythonTask, Stub

-# Step 1: Make a decision
-toss_task = PythonTask(
-    function=toss_function,  # Returns "heads" or "tails"
-    returns=["toss"],        # Named return for conditional to use
-    name="toss_task"
-)
-
-# Step 2: Branch based on decision
-conditional = Conditional(
-    parameter="toss",  # Use the "toss" value from above
-    branches={
-        "heads": heads_pipeline,  # Run this if toss="heads"
-        "tails": tails_pipeline   # Run this if toss="tails"
-    },
-    name="conditional"
-)
-
-# Step 3: Continue after branching
-continue_step = Stub(name="continue_processing")
-
-pipeline = Pipeline(steps=[toss_task, conditional, continue_step])
-pipeline.execute()
+def main():
+    # Step 1: Make a decision
+    toss_task = PythonTask(
+        function=toss_function,  # Returns "heads" or "tails"
+        returns=["toss"],        # Named return for conditional to use
+        name="toss_task"
+    )
+
+    # Step 2: Branch based on decision
+    conditional = Conditional(
+        parameter="toss",  # Use the "toss" value from above
+        branches={
+            "heads": create_heads_pipeline(),  # Run this if toss="heads"
+            "tails": create_tails_pipeline()   # Run this if toss="tails"
+        },
+        name="conditional"
+    )
+
+    # Step 3: Continue after branching
+    continue_step = Stub(name="continue_processing")
+
+    pipeline = Pipeline(steps=[toss_task, conditional, continue_step])
+    pipeline.execute()
+    return pipeline
+
+if __name__ == "__main__":
+    main()
 ```

 ??? example "See complete runnable code"
@@ -60,6 +65,7 @@ pipeline.execute()

 ## The decision function

+**Helper function (makes the decision):**
 ```python
 import random

@@ -74,6 +80,7 @@ Returns `"heads"` or `"tails"` - the conditional uses this to pick a branch.

 ## Branch pipelines

+**Helper functions (create the branch pipelines):**
 ```python
 def create_heads_pipeline():
     return PythonTask(
@@ -106,35 +113,41 @@ flowchart TD

 **Data validation:**
 ```python
-# Check data quality, route accordingly
-parameter="data_quality"  # returns "good", "needs_cleaning", "invalid"
-branches={
-    "good": analysis_pipeline,
-    "needs_cleaning": cleanup_then_analysis_pipeline,
-    "invalid": error_handling_pipeline
-}
+# Example conditional configuration (partial code)
+conditional = Conditional(
+    parameter="data_quality",  # returns "good", "needs_cleaning", "invalid"
+    branches={
+        "good": analysis_pipeline,
+        "needs_cleaning": cleanup_then_analysis_pipeline,
+        "invalid": error_handling_pipeline
+    }
+)
 ```

 **Model selection:**
 ```python
-# Choose model based on data size
-parameter="dataset_size"  # returns "small", "medium", "large"
-branches={
-    "small": simple_model_pipeline,
-    "medium": ensemble_pipeline,
-    "large": distributed_training_pipeline
-}
+# Example conditional configuration (partial code)
+conditional = Conditional(
+    parameter="dataset_size",  # returns "small", "medium", "large"
+    branches={
+        "small": simple_model_pipeline,
+        "medium": ensemble_pipeline,
+        "large": distributed_training_pipeline
+    }
+)
 ```

 **Environment routing:**
 ```python
-# Different behavior per environment
-parameter="environment"  # returns "dev", "staging", "prod"
-branches={
-    "dev": fast_testing_pipeline,
-    "staging": full_validation_pipeline,
-    "prod": production_pipeline
-}
+# Example conditional configuration (partial code)
+conditional = Conditional(
+    parameter="environment",  # returns "dev", "staging", "prod"
+    branches={
+        "dev": fast_testing_pipeline,
+        "staging": full_validation_pipeline,
+        "prod": production_pipeline
+    }
+)
 ```

 !!! tip "Conditional tips"
