DusanJovic-NOAA
Collaborator

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR adds a simple tracing module and updates the NUOPC drivers of some subcomponents to produce a trace file that can be used to identify performance issues.
The tracing module is not built or used by default. It can be enabled by setting the build option `-DUFS_TRACING=ON`.

Commit Message:

* UFSWM - Add tracing instrumentation 
  * CICE - Add tracing instrumentation
  * CMEPS - Add tracing instrumentation
  * FV3 - Add tracing instrumentation
  * MOM6 - Add tracing instrumentation
  * WW3 - Add tracing instrumentation

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

UFSWM Blocking Dependencies:

  • None

Documentation:

  • This PR requires a documentation update, and the WM User's Guide has been updated based on the changes in this PR.
  • This PR requires a documentation update, and a WM issue has been opened to track the need for a documentation update; a person responsible for submitting the update has been assigned to the issue (link issue).
  • No documentation update is required for this PR (please explain).

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.
  • PR Updates/Changes Baselines.
  • No Baseline Changes.

Input data Changes:

  • None.
  • New input data.
  • Updated input data.

Library Changes/Upgrades:

  • Required
    • Library names w/versions:
    • Git Stack Issue (JCSDA/spack-stack#)
  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • GaeaC6
    • Derecho
    • Ursa
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@DeniseWorthen
Collaborator

DeniseWorthen commented Sep 15, 2025

@DusanJovic-NOAA This tracing will always be "on"; did you think about making it optional? I think each component would then need to retrieve an attribute (? do_tracing or something) and then only if maintask && do_tracing would the calls into ufs_trace be made.

EDIT: I see now that the PR includes the information that you need to compile w/ UFS_TRACING=ON.

@DusanJovic-NOAA
Collaborator Author

> @DusanJovic-NOAA This tracing will always be "on"; did you think about making it optional? I think each component would then need to retrieve an attribute (? do_tracing or something) and then only if maintask && do_tracing would the calls into ufs_trace be made.
>
> EDIT: I see now that the PR includes the information that you need to compile w/ UFS_TRACING=ON.

Yes. Tracing is not always "on"; by default, it's "off". You turn it on by setting -DUFS_TRACING=ON. It's a build-time option, not a run-time one. It must be build-time because, when components are used outside the UFS, the tracing subroutine is not going to be available, so all tracing calls must be ifdef-ed out.
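
The build-time guard described in this reply can be illustrated with a minimal C sketch. The components themselves are Fortran and the PR's actual subroutine names are not shown in this thread, so `trace_event` and `TRACE` are hypothetical; only the `UFS_TRACING` macro name comes from the build option. When the macro is undefined, the tracing routine does not exist at all and every call site compiles to nothing, which is what lets a component build outside the UFS.

```c
#include <stdio.h>

/* Counts tracing calls so the effect of the guard is observable. */
static int trace_calls = 0;

#ifdef UFS_TRACING
/* Hypothetical stand-in for the tracing routine; only compiled when
   the build defines UFS_TRACING (e.g. via -DUFS_TRACING). */
static void trace_event(const char *name) {
    trace_calls++;
    fprintf(stderr, "trace: %s\n", name);
}
#define TRACE(name) trace_event(name)
#else
/* Tracing disabled: the call site expands to a no-op, so no symbol
   from the tracing module is referenced. */
#define TRACE(name) ((void)0)
#endif

/* Example of an instrumented routine (assumed, for illustration only). */
static int run_step(void) {
    TRACE("run_step_begin");
    int result = 42;
    TRACE("run_step_end");
    return result;
}
```

Built without the macro, `run_step` behaves identically and `trace_calls` stays at zero; built with `-DUFS_TRACING`, each call to `run_step` records two events.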

@gspetro-NOAA
Collaborator

@DusanJovic-NOAA I see this PR is open, but the PR template is not fully filled out. Do you have an estimate of when it will be ready/what remains to be done before we schedule it?

@DusanJovic-NOAA
Collaborator Author

> @DusanJovic-NOAA I see this PR is open, but the PR template is not fully filled out. Do you have an estimate of when it will be ready/what remains to be done before we schedule it?

At this moment, only the CICE and WW3 PRs have been approved. Once all submodule PRs are approved, I will run a full test on Ursa, post the regression test log file, and add the label 'Ready for Commit Queue'.

@gspetro-NOAA gspetro-NOAA moved this from Not Ready to Pre-testing required in PRs to Process Sep 18, 2025
@NickSzapiro-NOAA
Collaborator

NickSzapiro-NOAA commented Sep 22, 2025

@DusanJovic-NOAA While your tracing module can report by component and MPI rank, it looks like only one "maintask" is tracing in each component and ufs_trace writes pid=1.

I remember some ESMF profiles where min/max timings varied quite a lot between processes in the same component. Does tracing 1 task/component add some uncertainty to component times?

@DusanJovic-NOAA
Collaborator Author

> @DusanJovic-NOAA While your tracing module can report by component and MPI rank, it looks like only one "maintask" is tracing in each component and ufs_trace writes pid=1.
>
> I remember some ESMF profiles where min/max timings varied quite a lot between processes in the same component. Does tracing 1 task/component add some uncertainty to component times?

It's difficult to know in advance which task to choose for tracing, so I just assumed the first task (main task or root task) is a good representative. In many cases, it's the first task that does most of the I/O, and some components already have a local flag to identify it. We cannot trace all tasks in any reasonably realistic model run with many thousands of tasks. Which one would you choose instead of the first one? Are you suggesting that we add an argument to pass the task rank to a tracing call? Or maybe trace more than one task per component? If so, how many, and where and how would that be specified?

@NickSzapiro-NOAA
Collaborator

> We can not trace all tasks in any reasonably realistic model run with many thousands of tasks

The trace file seemed lightweight if every task writes to its own file, but maybe that's not true
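
To make the options raised in this exchange concrete (passing a rank to the tracing call, tracing more than one task per component, one file per task), here is a small C sketch. It is not part of the PR: the `should_trace` selection policy, its `stride` parameter, and the file naming in `trace_filename` are all assumptions made purely for illustration.

```c
#include <stdio.h>

/* Hypothetical policy: always trace rank 0, plus every `stride`-th rank.
   A stride <= 0 means only rank 0 is traced (the current behaviour
   described in the thread, where only the main task traces). */
static int should_trace(int rank, int stride) {
    if (rank == 0) return 1;
    if (stride <= 0) return 0;
    return rank % stride == 0;
}

/* Build a per-task file name so traced tasks never share a file.
   The "trace_<component>_<rank>.out" pattern is an assumed convention.
   Returns the snprintf result (length the full name requires). */
static int trace_filename(char *buf, size_t size,
                          const char *component, int rank) {
    return snprintf(buf, size, "trace_%s_%06d.out", component, rank);
}
```

With a stride of 128 on a 4096-task component, this policy would trace 32 tasks rather than one or all, which is one possible middle ground between the two positions above; where such a stride would be configured is left open, as in the discussion.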

Labels

None yet

Projects

Status: Pre-testing required

Development

Successfully merging this pull request may close these issues.

Add simple tracing instrumentation

4 participants