Per-dataset graph configuration #822

@JPXKQX

Description

Is your feature request related to a problem? Please describe.

When training with multiple datasets, it is currently not possible to configure different graphs per dataset through the standard configuration workflow. The encoder/decoder subgraph configuration is global, which limits flexibility when datasets require different graph structures.

As a workaround, graphs can be pre-generated externally and loaded during training, but this introduces additional complexity.

Describe the solution you'd like

Add native support for per-dataset encoder and decoder subgraph configuration in multi-dataset training.
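As a rough illustration of what per-dataset configuration could look like once resolved, the sketch below lets each dataset override the global encoder/decoder settings and fall back to them otherwise. Every name here (`GLOBAL_GRAPH`, `PER_DATASET`, `resolve_subgraph_config`, the dataset names, and the builder keys) is hypothetical and not taken from the existing configuration schema.

```python
from copy import deepcopy

# Hypothetical global defaults; keys and builder names are illustrative only.
GLOBAL_GRAPH = {
    "encoder": {"edge_builder": "cutoff", "cutoff_factor": 0.6},
    "decoder": {"edge_builder": "knn", "num_nearest_neighbours": 3},
}

# Hypothetical per-dataset overrides; a dataset lists only what differs.
PER_DATASET = {
    "global_data": {},
    "regional_obs": {
        "encoder": {"edge_builder": "knn", "num_nearest_neighbours": 8},
    },
}


def resolve_subgraph_config(dataset, global_cfg, overrides):
    """Return the encoder/decoder settings for one dataset.

    An override replaces the whole encoder or decoder entry instead of being
    merged key by key, so options of two different builders never mix.
    """
    resolved = deepcopy(global_cfg)
    for part, cfg in overrides.get(dataset, {}).items():
        if part not in resolved:
            raise KeyError(f"unknown subgraph {part!r} for dataset {dataset!r}")
        resolved[part] = deepcopy(cfg)
    return resolved


regional = resolve_subgraph_config("regional_obs", GLOBAL_GRAPH, PER_DATASET)
```

Under these assumptions, `regional["encoder"]` carries the dataset-specific settings while `regional["decoder"]` is still the global default, and a dataset with no overrides resolves to the global configuration unchanged.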

In addition, refactor the current workflow so that the training pipeline constructs and operates on a single graph instance (HeteroData) rather than a dictionary of graphs per dataset. The per-dataset configuration should be resolved during dataset initialisation, producing a unified graph representation compatible with the existing model and training interfaces.
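One possible shape for the unified graph is sketched below. Plain dictionaries stand in for the real graph object, and edges are keyed by `(source, relation, destination)` triples in the style of heterogeneous graph containers. The function name `build_unified_graph`, the shared `hidden` node set, and the config layout are all assumptions made for illustration, not the proposed implementation.

```python
def build_unified_graph(dataset_configs, hidden="hidden"):
    """Collect every dataset's encoder and decoder subgraph in one structure.

    dataset_configs maps a dataset name to its already-resolved
    {"encoder": ..., "decoder": ...} settings (hypothetical layout).
    """
    graph = {"nodes": [hidden], "edges": {}}
    for name, cfg in dataset_configs.items():
        if name == hidden:
            raise ValueError(f"dataset name clashes with node set {hidden!r}")
        graph["nodes"].append(name)
        # Encoder edges run from the dataset nodes to the shared hidden nodes.
        graph["edges"][(name, "to", hidden)] = cfg["encoder"]
        # Decoder edges run from the hidden nodes back to the dataset nodes.
        graph["edges"][(hidden, "to", name)] = cfg["decoder"]
    return graph


unified = build_unified_graph(
    {
        "global_data": {"encoder": {"builder": "cutoff"}, "decoder": {"builder": "knn"}},
        "regional_obs": {"encoder": {"builder": "knn"}, "decoder": {"builder": "knn"}},
    }
)
```

With this layout the model and training code would receive one object and select a dataset's subgraphs by edge type, rather than indexing a dictionary of separate graphs.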

Additional context

This functionality is important when combining datasets with different spatial structures or connectivity requirements (e.g., a stretched-grid or LAM dataset combined with regional observations).

Organisation

ECMWF

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Projects

    Status

    To be triaged

    Relationships

    None yet

    Development

    No branches or pull requests
