Build DAG from functions directly #1268

Open
@zilto

Summary

Add a first-class way to build a DAG directly from function objects (in contrast to today's DIY/hacky approaches). The user API could be either:

  • Builder().with_functions()
  • or an intermediary module, my_module = module_from_functions(*fns), which is then passed to .with_modules()
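The second option could be sketched with the standard library alone. module_from_functions is the hypothetical name proposed above, not an existing Hamilton API. The version below creates a types.ModuleType and attaches each function under its own name. It leaves fn.__module__ untouched, so whether the driver's module-based function collection would accept these functions (e.g. ones defined in a notebook's __main__) is an open assumption this sketch does not settle.

```python
import inspect
import types
from typing import Any, Callable, Set


def module_from_functions(*fns: Callable[..., Any], name: str = "adhoc_dataflow") -> types.ModuleType:
    """Hypothetical helper: bundle function objects into a fresh module object."""
    module = types.ModuleType(name)
    seen: Set[str] = set()
    for fn in fns:
        if not inspect.isfunction(fn):
            raise TypeError(f"expected a function, got {fn!r}")
        # each function becomes a module attribute under its own name,
        # so the name still identifies the node it produces
        if fn.__name__ in seen:
            raise ValueError(f"duplicate function name: {fn.__name__!r}")
        seen.add(fn.__name__)
        setattr(module, fn.__name__, fn)
    return module


def spend() -> float:
    return 10.0


def spend_doubled(spend: float) -> float:
    return spend * 2


my_module = module_from_functions(spend, spend_doubled)
print(sorted(k for k in vars(my_module) if not k.startswith("__")))  # ['spend', 'spend_doubled']
# my_module could then be handed to Builder().with_modules(my_module)
```

Rejecting duplicate names keeps one function per node name, mirroring the one-function-per-name property a real Python module already has.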

Goals:

  • simplify codebase
  • foundation for better hierarchical / nested graph structures (i.e., subdags)

Current

At the core of Hamilton, users:

  1. write functions in a Python module ("dataflow code")
  2. load that module in the "driver code" to build a DAG
  3. execute the DAG via the Driver
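To make the function-to-DAG idea concrete, here is a toy, stdlib-only sketch of building and executing a DAG straight from function objects, with no module step. It mimics Hamilton's convention that a function's name is the node it produces and its parameter names are its dependencies. build_dag and execute are hypothetical helpers, not Hamilton's implementation (which also applies function modifiers, checks types, and more). Requires Python 3.9+ for graphlib.

```python
import inspect
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Iterable, List, Set


def build_dag(*fns: Callable[..., Any]) -> Dict[str, Set[str]]:
    # node name = function name; dependencies = parameter names
    return {fn.__name__: set(inspect.signature(fn).parameters) for fn in fns}


def execute(
    fns: Iterable[Callable[..., Any]],
    final_vars: List[str],
    inputs: Dict[str, Any],
) -> Dict[str, Any]:
    fns = list(fns)
    by_name = {fn.__name__: fn for fn in fns}
    dag = build_dag(*fns)
    results: Dict[str, Any] = dict(inputs)
    # static_order() yields dependencies before dependents, including
    # leaf names (user inputs) that only appear as parameters
    for node in TopologicalSorter(dag).static_order():
        if node in results:
            continue  # already available as a user-provided input
        if node not in by_name:
            raise KeyError(f"no function or input provides {node!r}")
        kwargs = {param: results[param] for param in dag[node]}
        results[node] = by_name[node](**kwargs)
    # toy simplification: every node is computed, not only what final_vars need
    return {var: results[var] for var in final_vars}


def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups


def spend_per_signup_cents(spend_per_signup: float) -> float:
    return spend_per_signup * 100


result = execute(
    [spend_per_signup, spend_per_signup_cents],
    final_vars=["spend_per_signup_cents"],
    inputs={"spend": 50.0, "signups": 10},
)
print(result)  # {'spend_per_signup_cents': 500.0}
```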

Problem

  • It is currently possible to build Hamilton DAGs from functions, but we have no official "here's how you do it" that we guarantee we'll support.
    • the name ad_hoc_utils signals exactly the opposite of a guaranteed, supported path
  • In a notebook, there's no good reason to wrap notebook-defined functions in a module before passing them to Hamilton (other than our own constraints)
  • Python module machinery is complex and adds indirection to the codebase (tests, notebook extension, LSP)

Benefits

  • greatly simplify many unit tests
  • facilitate marimo integration

Hamilton 2.0 / Broader perspective

There's no well-defined structure or purpose to Hamilton's top-level modules (e.g., nodes, graph_types, graph_utils, graph, ad_hoc_utils, base, hamilton.common, models). I propose a structure that matches the Hamilton lifecycle:

  • hamilton.parser: everything that deals with source code: how functions are written, whether type annotations are present (not type matching), collecting functions from modules, converting a notebook cell string to a module, and removing comments and docstrings before hashing source code
  • hamilton.compiler: converting code to DAG: structuring the DAG from functions, applying function modifiers, validating types, etc.
  • remove ad_hoc_utils
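Two of the proposed hamilton.parser responsibilities (converting a notebook cell string to a module, and stripping comments and docstrings before hashing source code) could look roughly like the stdlib-only sketch below. module_from_cell and hash_source are hypothetical names, not existing Hamilton APIs, and the choice of SHA-256 is an assumption. Requires Python 3.9+ for ast.unparse.

```python
import ast
import hashlib
import types

_DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def module_from_cell(source: str, name: str) -> types.ModuleType:
    """Hypothetical: turn a notebook cell's source string into a module object."""
    module = types.ModuleType(name)
    # functions defined here get __module__ == name, because exec uses module.__dict__ as globals
    exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    return module


def _strip_docstrings(tree: ast.AST) -> ast.AST:
    for node in ast.walk(tree):
        if isinstance(node, _DOCSTRING_OWNERS) and node.body:
            first = node.body[0]
            if (
                isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)
            ):
                # keep the body syntactically valid if the docstring was its only statement
                node.body = node.body[1:] or [ast.Pass()]
    return ast.fix_missing_locations(tree)


def hash_source(source: str) -> str:
    """Hypothetical: hash source so that comments and docstrings don't change the digest."""
    # comments never reach the AST, and unparse() normalizes formatting
    tree = _strip_docstrings(ast.parse(source))
    return hashlib.sha256(ast.unparse(tree).encode("utf-8")).hexdigest()


cell = '''
def revenue(price: float, units: int) -> float:
    """Total revenue."""
    return price * units  # simple product
'''
m = module_from_cell(cell, "cell_1")
print(m.revenue(2.0, 3))  # 6.0
print(hash_source(cell) == hash_source("def revenue(price: float, units: int) -> float:\n    return price * units"))  # True
```

Because ast.unparse re-emits code in a canonical form, whitespace-only edits also leave the hash unchanged, while any change to the logic produces a new digest.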

Metadata

Labels: core-work (work that is "core", likely overseen by the core team in most cases), question (further information is requested)
