
Documentation: include more visible information for running high-res / large state assimilations #999

@mjs2369

What's the issue?

📚 Please describe the problem you’ve noticed in the documentation.

The documentation needs more information on running high-res / large model size assimilations, and that information needs to be more visible in the docs.

The buffer_state_io option is documented in the "IO - reading and writing of the model state" section of the "Data management in DART" page, but that page is deep in the documentation, and I suspect many users never end up reading it. https://docs.dart.ucar.edu/en/latest/guide/data-management-issues.html#data-management-in-dart

This nml option greatly improves filter performance with large states, especially when the state vector does not fit into memory on a single node.
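
To get a feel for when a state stops fitting on one node, a back-of-envelope memory estimate can help. The sketch below assumes 8-byte reals (DART's default `r8` kind) and treats the number of full state copies held in memory as a parameter you choose. The ensemble size of 40 in the example is hypothetical, and the helper names are illustrative, not part of DART.

```python
# Back-of-envelope memory estimate for full copies of a DART state vector.
# Assumptions (not taken from DART itself): 8-byte reals and a user-chosen
# number of copies held in memory. Function names are illustrative only.

BYTES_PER_REAL = 8  # assumed double precision (r8)


def state_copy_bytes(n_elements: int, bytes_per_real: int = BYTES_PER_REAL) -> int:
    """Bytes needed to hold one full copy of the state vector."""
    return n_elements * bytes_per_real


def total_state_bytes(n_elements: int, n_copies: int) -> int:
    """Bytes needed to hold n_copies full copies (e.g. ensemble members)."""
    return state_copy_bytes(n_elements) * n_copies


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    n = 650_000_000  # ~650 million elements, as in the ROMS_Rutgers run
    n_copies = 40    # hypothetical ensemble size, for illustration only
    print(f"one copy: {to_gib(state_copy_bytes(n)):.2f} GiB")
    print(f"{n_copies} copies: {to_gib(total_state_bytes(n, n_copies)):.1f} GiB")
```

Under these assumptions a single copy of a ~650 million element state is already about 4.8 GiB, and the total scales linearly with the number of copies.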

Additionally, nowhere does the documentation mention that running on scratch can dramatically improve IO speed on Derecho. For reference, in a ROMS_Rutgers run with ~650 million elements in the state, running on scratch completed the state space output roughly 15 times faster:

12 nodes and 300 mpiprocs on work (30 mins):

Before state space output TIME: 2025/10/30 09:36:00
After  state space output TIME: 2025/10/30 10:06:10

vs 12 nodes and 300 mpiprocs on scratch (2 mins):

Before state space output TIME: 2025/11/06 23:24:23
After  state space output TIME: 2025/11/06 23:26:20
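
The speedup implied by the two timing pairs can be checked directly from the log stamps. This sketch assumes the log lines look exactly like the excerpts above (`Before`/`After state space output TIME: YYYY/MM/DD HH:MM:SS`); the regular expression and function names are illustrative, not something DART provides.

```python
import re
from datetime import datetime

# Timestamp format of the "TIME:" stamps quoted in this issue (assumed).
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"

# Matches "Before state space output TIME: ..." and the "After" variant.
STAMP = re.compile(r"(Before|After)\s+state space output TIME:\s*(\S+ \S+)")


def elapsed_seconds(before: str, after: str) -> float:
    """Seconds between two timestamps in TIME_FORMAT."""
    t0 = datetime.strptime(before, TIME_FORMAT)
    t1 = datetime.strptime(after, TIME_FORMAT)
    return (t1 - t0).total_seconds()


def output_seconds(log_text: str) -> float:
    """Elapsed state space output time from a log containing both stamps."""
    stamps = dict(STAMP.findall(log_text))
    return elapsed_seconds(stamps["Before"], stamps["After"])


if __name__ == "__main__":
    work_log = """Before state space output TIME: 2025/10/30 09:36:00
After  state space output TIME: 2025/10/30 10:06:10"""
    scratch_log = """Before state space output TIME: 2025/11/06 23:24:23
After  state space output TIME: 2025/11/06 23:26:20"""
    work = output_seconds(work_log)
    scratch = output_seconds(scratch_log)
    print(f"work: {work:.0f} s, scratch: {scratch:.0f} s, speedup: {work / scratch:.1f}x")
```

For the two runs above this gives 1810 s on work versus 117 s on scratch, roughly a 15x speedup.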

This is because scratch uses a Lustre file system, which is different from the file systems behind both work and home (https://ncar-hpc-docs-arc-iframe.readthedocs.io/storage-systems/glade/lustre/).
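
Because the speedup comes from the file system rather than from DART, it may be worth confirming which file system a run directory actually lives on before launching filter. This minimal sketch assumes a Linux node that exposes `/proc/mounts` and picks the longest mount point containing the path. It does not decode escaped characters (such as `\040` for spaces) in mount paths, and the function names are hypothetical.

```python
import os
from typing import List, Optional, Tuple


def parse_mounts(mounts_text: str) -> List[Tuple[str, str]]:
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """File system type of the longest mount point containing `path`."""
    best_mount, best_type = "", None
    for mount, fs_type in parse_mounts(mounts_text):
        prefix = mount.rstrip("/") + "/"
        if path == mount or path.startswith(prefix):
            # >= so that later entries (which shadow earlier mounts) win ties.
            if len(mount) >= len(best_mount):
                best_mount, best_type = mount, fs_type
    return best_type


def fs_type_of_dir(path: str) -> Optional[str]:
    """Look up `path` (symlinks resolved) in this node's /proc/mounts."""
    with open("/proc/mounts") as f:
        return fs_type_for(os.path.realpath(path), f.read())


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    # On a Lustre-backed scratch directory this should print "lustre".
    print(fs_type_of_dir(os.getcwd()))
```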

Finally, the options in &ensemble_manager_nml (namely layout and tasks_per_node) could also be more visible. They are detailed in both the "Data management in DART" and "MODULE ensemble_manager_mod" doc pages, but I think they should also be included in this new section of the documentation.
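
As the ensemble_manager_mod documentation describes it, `layout = 2` distributes logical PEs round-robin across nodes, and `tasks_per_node` tells the ensemble manager how many MPI tasks the batch job placed on each node. The sketch below only illustrates the round-robin idea and is not DART's actual code. It assumes tasks are placed in contiguous blocks per node and that the total task count is a multiple of `tasks_per_node`.

```python
from typing import List


def round_robin_pe_to_task(total_tasks: int, tasks_per_node: int) -> List[int]:
    """Map logical PE index -> MPI task so consecutive PEs land on different nodes.

    Illustration only. Assumes MPI tasks are placed in contiguous blocks
    (tasks 0..tasks_per_node-1 on node 0, and so on) and that total_tasks
    is a multiple of tasks_per_node.
    """
    if tasks_per_node <= 0 or total_tasks % tasks_per_node != 0:
        raise ValueError("total_tasks must be a multiple of tasks_per_node (> 0)")
    num_nodes = total_tasks // tasks_per_node
    pe_to_task = []
    for pe in range(total_tasks):
        node = pe % num_nodes   # cycle through the nodes first
        slot = pe // num_nodes  # then move to the next task slot on each node
        pe_to_task.append(node * tasks_per_node + slot)
    return pe_to_task


if __name__ == "__main__":
    mapping = round_robin_pe_to_task(total_tasks=300, tasks_per_node=25)
    for pe in (0, 1, 2, 12):
        print(f"PE {pe:3d} -> task {mapping[pe]:3d} on node {mapping[pe] // 25}")
```

For the 12-node, 300-task runs above (25 tasks per node), logical PEs 0, 1 and 2 map to MPI tasks 0, 25 and 50, i.e. onto three different nodes, and PE 12 wraps back to node 0 as task 1.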

What needs to be fixed?

Share what’s incorrect, unclear, missing, or outdated.

Information on these performance-enhancing tactics is either missing or very difficult to find in the documentation.

Suggestions for improvement

If you have any ideas to fix or generally improve this documentation, please share them.

This information could be added as a small section to the quickstart guide, which would make it more visible and also showcase our capability to run with large states.
