Summary
Add a first-class way to build a DAG from function objects (in contrast to today's DIY / hacky approach). The user API could be either:
- Builder().with_functions()
- or an intermediary my_module = module_from_functions(*fns), then passed to .with_modules()
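To make the intermediary option concrete, here is a minimal sketch of what module_from_functions could look like using only the standard library. The helper name comes from the proposal above; the signature, the module_name parameter, and the default name are assumptions for illustration, not an existing Hamilton API.

```python
import types
from typing import Callable


def module_from_functions(*fns: Callable, module_name: str = "adhoc_dataflow") -> types.ModuleType:
    """Hypothetical helper: bundle function objects into an in-memory module."""
    module = types.ModuleType(module_name)
    for fn in fns:
        # Expose each function under its own name, as if it had been defined in the module.
        setattr(module, fn.__name__, fn)
    return module


def a() -> int:
    return 1


def b(a: int) -> int:
    return a + 1


# The resulting object is a regular module holding both functions.
my_module = module_from_functions(a, b)
```

Under the proposal, my_module would then be handed to .with_modules() like any imported module. A real implementation would likely need extra care that this sketch skips (for example, name collisions between functions).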
Goals:
- simplify codebase
- foundation for better hierarchical / nested graph structures (i.e., subdags)
Current
At the core of Hamilton, users:
- write functions in a Python module ("dataflow code")
- load that module in the "driver code" to build a DAG
- execute the DAG via the Driver
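For readers less familiar with the lifecycle, the steps above boil down to: the function name is the node, the parameter names are its dependencies, and execution walks the nodes in dependency order. The toy sketch below shows that idea with the standard library only. It is not Hamilton's implementation; build_dag and execute are hypothetical names, and the example functions are made up.

```python
import inspect
from graphlib import TopologicalSorter
from typing import Any, Callable


def build_dag(*fns: Callable) -> dict[str, Callable]:
    """Map node name -> function; the function name is the node name."""
    return {fn.__name__: fn for fn in fns}


def execute(dag: dict[str, Callable], inputs: dict[str, Any]) -> dict[str, Any]:
    """Run every node in dependency order; parameters name upstream nodes or inputs."""
    # For each node, keep only the parameters that are themselves nodes in the DAG.
    deps = {
        name: [p for p in inspect.signature(fn).parameters if p in dag]
        for name, fn in dag.items()
    }
    results = dict(inputs)
    for name in TopologicalSorter(deps).static_order():
        params = inspect.signature(dag[name]).parameters
        results[name] = dag[name](**{p: results[p] for p in params})
    return results


# "Dataflow code": plain functions.
def total(price: float, quantity: int) -> float:
    return price * quantity


def total_with_tax(total: float, tax_rate: float) -> float:
    return total * (1 + tax_rate)


# "Driver code": build the DAG from function objects and execute it.
results = execute(
    build_dag(total, total_with_tax),
    {"price": 10.0, "quantity": 3, "tax_rate": 0.5},
)
```

Note that nothing in this sketch needs a module: the functions are passed in directly, which is the shape of API this issue asks for.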
Problem
- It is currently possible to build Hamilton DAGs from functions, but we have no official "here's how you do it" that we guarantee we'll support. ad_hoc_utils means exactly the opposite.
- In a notebook, there's no good reason to create a module from a notebook function before passing it to Hamilton (except our own constraints)
- Python module machinery is complex and adds indirection to the codebase (tests, notebook extension, LSP)
Benefits
- greatly simplify many unit tests
- facilitate marimo integration
Hamilton 2.0 / Broader perspective
There's no well-defined structure or purpose to Hamilton top-level modules (e.g., nodes, graph_types, graph_utils, graph, ad_hoc_utils, base, hamilton.common, models). I propose a structure that matches the Hamilton lifecycle:
- hamilton.parser: everything that deals with source code: how functions are written, whether type annotations are present (not type matching), collecting functions from modules, converting a notebook cell string to a module, removing comments and docstrings before hashing source code
- hamilton.compiler: converting code to a DAG: structuring the DAG from functions, applying function modifiers, validating types, etc.
- remove ad_hoc_utils
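As an illustration of one parser responsibility listed above (removing comments and docstrings before hashing source code), here is a rough sketch using the standard library's ast module. hash_source is a hypothetical name and SHA-256 is an arbitrary choice; this is not how Hamilton currently does it. It relies on ast.unparse (Python 3.9+), which never emits comments because they are not part of the AST.

```python
import ast
import hashlib


def hash_source(source: str) -> str:
    """Hash source code while ignoring comments, docstrings, and formatting."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            if (
                body
                and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)
            ):
                # Drop the docstring; keep the body non-empty so it can be unparsed.
                node.body = body[1:] or [ast.Pass()]
    normalized = ast.unparse(tree)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


documented = 'def f(x):\n    """Add one."""\n    # a comment\n    return x + 1\n'
bare = "def f(x):\n    return x + 1\n"
changed = "def f(x):\n    return x + 2\n"

same_hash = hash_source(documented) == hash_source(bare)
different_hash = hash_source(bare) != hash_source(changed)
```

With this kind of normalization, editing only a docstring or comment leaves the hash unchanged, while a change to the logic produces a new one.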