Opinion on module-based publishing #5128

@Midnighter

In the last podcast episode, you debated whether to allow output publishing to be defined at the module/process level.

My clear opinion is that this is a bad idea:

  • It mixes concerns. Where an output is stored should not be the concern of a process (function); that gives it too much responsibility.
  • Allowing publishing to be set only at the workflow level keeps processes as modular as possible. In my mind, the same applies to sub-workflows: I would honor only the publishing instructions of the highest-level workflow and ignore any publishing set in sub-workflows. That way, modules/sub-workflows can easily be reused in different pipelines without requiring overrides.
  • In my view, processes should be as close as possible to pure functions, such that you get a reproducible output when given a deterministic environment (container hash), the same input (hash of data), and the same operations (hash of code/git commit). With module-level publishing, changing publishing behavior requires code changes, even though the operation itself does not change.
  • Flexibility in storage backends: If a module assumes certain output path capabilities that are not supported by my storage solution, I have to redefine all those publishing options.
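
To make the points above concrete, here is a minimal Python sketch of the separation I have in mind. It is an analogy only, not Nextflow syntax, and every name in it (`count_lines`, `task_key`, `run_workflow`, the directory names) is hypothetical. The process is a pure function that returns named outputs, the task identity is derived only from environment, input, and code, and the top-level workflow alone decides where results are stored.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict


def count_lines(text: str) -> Dict[str, str]:
    """A 'process' as a pure function: named outputs depend only on the input."""
    return {"line_count.txt": str(len(text.splitlines()))}


def task_key(container_hash: str, input_data: str, code_version: str) -> str:
    """Task identity from environment, input, and code; no publish path involved."""
    input_hash = hashlib.sha256(input_data.encode()).hexdigest()
    joined = "|".join([container_hash, input_hash, code_version])
    return hashlib.sha256(joined.encode()).hexdigest()


def run_workflow(
    process: Callable[[str], Dict[str, str]], text: str, publish_dir: Path
) -> Path:
    """Only the top-level workflow chooses the storage location."""
    outputs = process(text)
    publish_dir.mkdir(parents=True, exist_ok=True)
    for name, content in outputs.items():
        (publish_dir / name).write_text(content)
    return publish_dir


if __name__ == "__main__":
    data = "a\nb\nc\n"
    with tempfile.TemporaryDirectory() as tmp:
        # Two pipelines reuse the same process with different publish targets.
        first = run_workflow(count_lines, data, Path(tmp) / "pipeline_a" / "results")
        second = run_workflow(count_lines, data, Path(tmp) / "pipeline_b" / "qc")
        print((first / "line_count.txt").read_text())
        print((second / "line_count.txt").read_text())
    # The key is unaffected by where the outputs were published.
    print(task_key("sha256:abc", data, "commit-123"))
```

In this sketch, moving results to another directory or storage backend means changing only the `run_workflow` call; `count_lines` and its task key stay untouched.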
