ProcessBuilder: add get_schema method#7252

Open
GeigerJ2 wants to merge 15 commits into aiidateam:main from GeigerJ2:feature/builder-get-schema

GeigerJ2 (Collaborator) commented Mar 3, 2026

This PR aims to address many of the usability concerns users have raised about working with builders.

Rendered docs for the new method:
https://aiida--7252.org.readthedocs.build/projects/aiida-core/en/7252/topics/processes/usage.html#inspecting-the-input-schema

It adds a get_schema() method to ProcessBuilderNamespace that returns a nice, YAML-formatted overview of all available inputs, so one can easily explore its structure, required inputs, types of the parameters, what has already been set, etc. Dogfooding on some real-world workchains very welcome @npaulish, @mbercx, @Minotakm =)

Motivation

(LLM-style headings, but I verified the whole text here, pinky promise)

When working with complex workflows (especially those with deeply nested namespaces like PwBandsWorkChain), it's difficult to quickly see:

  • What inputs are available
  • Which inputs are required
  • What types are expected
  • What has already been set on the builder

The new get_schema() method provides a clean, readable output that helps users understand the input structure at a glance.

Parameters

The following arguments are provided to customize what is being shown:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| mode | 'compact' \| 'verbose' | 'compact' | Output mode: 'compact' shows x: Int (required), 'verbose' includes type, help text, and other metadata |
| show | 'all' \| 'required' \| 'set' | 'all' | Filter inputs: 'all' shows everything, 'required' only required inputs, 'set' only what has been set |
| collapse | tuple[str, ...] | ('metadata',) | Namespace names to collapse to {...} |
| max_depth | int \| None | None | Maximum nesting depth to display |

Type names are displayed in a compact format (e.g., Int instead of <class 'aiida.orm.nodes.data.int.Int'>) for better readability. Ports that accept several valid types are shown as a union, e.g. Int | Float. Dynamic namespaces (e.g. monitors, pseudos) with no static children are displayed as Namespace(Type) in compact mode, and as a structured dict with type, entry_type, help, and required fields in verbose mode.
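The compact type naming described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `format_valid_type`, `Int` and `Float` are stand-ins defined here, and the real `_format_valid_type` may differ in signature and details.

```python
from typing import Optional, Tuple, Union


def format_valid_type(valid_type: Union[type, Tuple[type, ...], None]) -> Optional[str]:
    """Return a compact, readable name for a type or a tuple of types."""
    if valid_type is None:
        return None
    types = valid_type if isinstance(valid_type, tuple) else (valid_type,)
    # dict.fromkeys keeps first-seen order while dropping repeated names.
    names = dict.fromkeys(t.__name__ for t in types)
    return ' | '.join(names)


# Stand-in classes; in AiiDA these would be node classes such as Int or Float.
class Int:
    pass


class Float:
    pass


# Two distinct classes that happen to share the same __name__.
UpfDataA = type('UpfData', (), {})
UpfDataB = type('UpfData', (), {})

print(format_valid_type(Int))                    # Int
print(format_valid_type((Int, Float)))           # Int | Float
print(format_valid_type((UpfDataA, UpfDataB)))   # UpfData
```

Using `__name__` drops the module path, which is what keeps the output short, at the cost of not distinguishing same-named classes from different packages.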

Notes on the implementation

get_schema is defined as a regular method on ProcessBuilderNamespace, not as a dynamic property on the per-instance subclass. Without a safeguard, a process that defines a port named get_schema would cause the dynamic subclass to create a property that shadows the inherited method (the property is defined on the subclass, which comes before ProcessBuilderNamespace in the MRO, so attribute lookup finds it first). To prevent this, _RESERVED_METHOD_NAMES is a class-level frozenset checked in __init__. If a port name conflicts, a RuntimeError is raised at builder construction time. Only methods introduced by ProcessBuilderNamespace itself are reserved here; inherited MutableMapping methods (get, keys, values, items, ...) are intentionally excluded because existing processes already use some of these as port names (e.g. values in ExampleWorkChain).
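To make the shadowing problem concrete, here is a toy reproduction. It assumes nothing about the real class beyond what is described above: `BuilderNamespace`, its `check_reserved` switch, and the way the per-instance subclass is built are all made up for the example.

```python
class BuilderNamespace:
    """Toy stand-in for a builder namespace; not the AiiDA class."""

    # Hypothetical contents, mirroring the single reserved name mentioned above.
    _RESERVED_METHOD_NAMES = frozenset({'get_schema'})

    def __init__(self, port_names, check_reserved=True):
        conflicts = self._RESERVED_METHOD_NAMES.intersection(port_names)
        if check_reserved and conflicts:
            raise RuntimeError(f'port names clash with reserved methods: {sorted(conflicts)}')

        # One property per port, attached to a subclass created for this instance.
        properties = {
            name: property(lambda self, name=name: f'<port {name}>')
            for name in port_names
        }
        self.__class__ = type(type(self).__name__, (type(self),), properties)

    def get_schema(self):
        return 'schema'


# Without the guard, the port property hides the method of the same name.
shadowed = BuilderNamespace(['get_schema'], check_reserved=False)
print(shadowed.get_schema)   # <port get_schema>

# With the guard, the conflict is reported when the builder is constructed.
try:
    BuilderNamespace(['get_schema'])
except RuntimeError as exc:
    print(exc)

# Ordinary port names are unaffected.
ok = BuilderNamespace(['x'])
print(ok.x, ok.get_schema())  # <port x> schema
```

Failing at construction time means the conflict surfaces as soon as a builder is created, rather than later as a confusing "not callable" error.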

The recursive schema building logic lives in _build_schema_dict, a private method that walks the PortNamespace tree. _format_valid_type is a @staticmethod that formats type annotations compactly, with deduplication to handle cases where different packages export classes with the same __name__ (e.g. aiida.orm.UpfData and aiida_pseudo.data.pseudo.upf.UpfData).
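For readers who want a feel for the recursive walk, a stripped-down sketch follows. The `Port` and `Namespace` dataclasses and the `build_schema` function are hypothetical stand-ins for the port tree and `_build_schema_dict`; in particular, the exact meaning of `max_depth` here (0 collapses every nested namespace) is an assumption, and the sketch returns a plain dict instead of YAML.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple, Union


@dataclass
class Port:
    """Toy leaf port: a type name and a required flag."""
    type_name: str
    required: bool = False


@dataclass
class Namespace:
    """Toy namespace holding ports and nested namespaces."""
    ports: Dict[str, Union[Port, 'Namespace']] = field(default_factory=dict)


def build_schema(
    namespace: Namespace,
    collapse: Tuple[str, ...] = ('metadata',),
    max_depth: Optional[int] = None,
    depth: int = 0,
) -> dict:
    """Walk the tree and return a nested dict in compact form."""
    schema = {}
    for name, port in namespace.ports.items():
        if isinstance(port, Namespace):
            too_deep = max_depth is not None and depth >= max_depth
            if name in collapse or too_deep:
                schema[name] = '{...}'  # collapsed namespace
            else:
                schema[name] = build_schema(port, collapse, max_depth, depth + 1)
        else:
            suffix = ' (required)' if port.required else ''
            schema[name] = f'{port.type_name}{suffix}'
    return schema


tree = Namespace({
    'x': Port('Int', required=True),
    'pw': Namespace({'code': Port('Code', required=True), 'parameters': Port('Dict')}),
    'metadata': Namespace({'label': Port('str')}),
})

print(build_schema(tree))
print(build_schema(tree, max_depth=0))
```

With the defaults, `pw` is expanded and `metadata` is collapsed; with `max_depth=0`, both nested namespaces are shown as `{...}`.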

Type aliases SchemaMode and SchemaShow are defined at module level using Literal types, providing static type checking for the mode and show parameters.
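As a rough illustration of what the Literal aliases buy, consider the sketch below. The alias values are taken from the parameter table above; `describe` is a hypothetical function used only to carry the annotations.

```python
from typing import Literal, get_args

SchemaMode = Literal['compact', 'verbose']
SchemaShow = Literal['all', 'required', 'set']


def describe(mode: SchemaMode = 'compact', show: SchemaShow = 'all') -> str:
    """Hypothetical function; only the annotations matter here."""
    return f'mode={mode}, show={show}'


print(describe(mode='verbose'))   # accepted by a static type checker
# describe(mode='keys')           # a checker such as mypy would flag this;
#                                 # Python itself would not raise at runtime

# The allowed values can still be read back from the alias if needed.
print(get_args(SchemaMode))       # ('compact', 'verbose')
```

The trade-off is that invalid values are caught by tooling and IDEs rather than at runtime, unless an explicit check is added.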

codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 94.31818% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.89%. Comparing base (c39cbfe) to head (545e3ef).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| src/aiida/engine/processes/builder.py | 94.32% | 5 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7252      +/-   ##
==========================================
+ Coverage   79.85%   79.89%   +0.05%     
==========================================
  Files         566      566              
  Lines       43896    43980      +84     
==========================================
+ Hits        35047    35132      +85     
+ Misses       8849     8848       -1     

GeigerJ2 marked this pull request as ready for review March 5, 2026 07:14
GeigerJ2 changed the title from "Add get_schema method to ProcessBuilder" to "✨ Add get_schema method to ProcessBuilder" Mar 5, 2026
Move `get_schema` from a function injected into the dynamic subclass to
a proper method on `ProcessBuilderNamespace`. This prevents port name
shadowing and enables IDE autocompletion and type checking.

Key changes:
- Add `_RESERVED_METHOD_NAMES` guard that raises `RuntimeError` if a
  port name conflicts with a reserved method (currently only
  `get_schema`)
- Extract `_format_valid_type` as a `@staticmethod` on the class
- Extract `_build_schema_dict` as a private recursive helper method
- Rename `format` parameter to `mode` to avoid shadowing the builtin
- Remove `format='keys'` mode (redundant with compact)
- Remove metadata-to-end reordering (collapsed by default anyway)
- Use `Literal` type aliases (`SchemaMode`, `SchemaShow`) instead of
  `str` with runtime validation
- Deduplicate type names in `_format_valid_type` (e.g. UpfData from
  different packages sharing the same `__name__`)
- Handle empty dynamic namespaces (e.g. monitors, pseudos) with
  `Namespace(Type)` in compact mode and full info dict in verbose mode
- Rewrite tests with precise YAML assertions instead of string checks
GeigerJ2 requested a review from mbercx March 9, 2026 15:40
GeigerJ2 added 8 commits March 9, 2026 16:44
- Show actual default values in verbose mode instead of just
  `has_default: true`, with callable defaults displayed as `<callable>`
- Fix show='set' incorrectly showing collapsed namespaces that contain
  only empty sub-namespaces (e.g. stash, unstash) by recursing to check
  for actual user-set values
- Restore port name conflict check that was lost during merge
- Add clarifying comments for __dir__ override, dict.fromkeys
  deduplication, .value unwrapping, and empty namespace handling
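
The two behaviours in the first items of this commit message can be sketched as below. Both helpers are hypothetical and operate on plain dicts; treating `None` as "not set" is an assumption, and the real code works on builder namespaces instead.

```python
def has_set_values(values) -> bool:
    """Return True if a nested mapping contains at least one actual value."""
    if isinstance(values, dict):
        # Recurse so that namespaces holding only empty sub-namespaces count as unset.
        return any(has_set_values(value) for value in values.values())
    return values is not None


def format_default(default):
    """Show callable defaults as a placeholder, anything else as-is."""
    return '<callable>' if callable(default) else default


print(has_set_values({'stash': {}, 'unstash': {}}))                # False
print(has_set_values({'options': {'resources': {'n': 1}}}))        # True
print(format_default(list))                                        # <callable>
print(format_default(300))                                         # 300
```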
GeigerJ2 changed the title from "✨ Add get_schema method to ProcessBuilder" to "ProcessBuilder: add get_schema method" Mar 13, 2026