Add tracing instrumentation #2884
base: develop
@DusanJovic-NOAA This tracing will always be "on"; did you think about making it optional? I think each component would then need to retrieve an attribute (?). EDIT: I see now that the PR states that you need to compile with UFS_TRACING=ON.
Yes. Tracing will not always be "on". By default, it's "off". You turn it on by setting -DUFS_TRACING=ON. It's a build-time option, not a run-time option. It must be a build-time option because, when components are used outside the UFS, the tracing subroutine is not available, so all tracing calls must be ifdef-ed out.
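As a rough illustration of the "ifdef-ed out" pattern described above, here is a minimal sketch in C. Only the `UFS_TRACING` macro name comes from this thread; `trace_event`, `TRACE_EVENT`, and `run_phase` are hypothetical names chosen for the example, and the real components are Fortran code that would use the preprocessor in a similar way.

```c
#include <assert.h>
#include <stdio.h>

/* UFS_TRACING mirrors the build option named in this thread.
   trace_event, TRACE_EVENT and run_phase are hypothetical names
   used only for this illustration. */
#ifdef UFS_TRACING
/* Tracing build: write one line per event, return the characters written. */
static int trace_event(FILE *out, const char *name)
{
    assert(out != NULL && name != NULL);
    return fprintf(out, "trace: %s\n", name);
}
#define TRACE_EVENT(out, name) trace_event((out), (name))
#else
/* Default build: the call site expands to a no-op, so no tracing
   routine has to exist at link time. */
#define TRACE_EVENT(out, name) ((void)(out), (void)(name), 0)
#endif

/* A stand-in for a component phase that contains a tracing call. */
static int run_phase(FILE *out)
{
    int written = TRACE_EVENT(out, "run_phase");
    /* ... component work would go here ... */
    return written;
}
```

Compiled without `-DUFS_TRACING`, `run_phase` writes nothing and references no tracing symbol, which is the property a component needs when it is built outside the UFS.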
@DusanJovic-NOAA I see this PR is open, but the PR template is not fully filled out. Do you have an estimate of when it will be ready and what remains to be done before we schedule it?
At this moment, only the CICE and WW3 PRs have been approved. Once all submodule PRs are approved, I will run a full test on Ursa, post the regression test log file, and add the label 'Ready for Commit Queue'.
@DusanJovic-NOAA While your tracing module can report by component and MPI rank, it looks like only one "maintask" is tracing in each component, and I remember some ESMF profiles where min/max timings varied quite a lot between processes in the same component. Does tracing one task per component add some uncertainty to component times?
It's difficult to know in advance which task to choose for tracing, so I just assumed the first task (main task or root task) is a good representative. In many cases, it's the first task that does most of the I/O, and some components already have a local flag to identify it. We cannot trace all tasks in any reasonably realistic model run with many thousands of tasks. Which one would you choose instead of the first one? Are you suggesting that we add an argument to pass the task rank to a tracing call? Or maybe trace more than one task per component? If so, how many, and where and how would that be specified?
The trace file seemed lightweight if every task writes to its own file, but maybe that's not true.
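To make the options raised in this exchange concrete (passing a rank to the tracing call, tracing more than one task per component, one file per task), here is a small sketch in C. It is not how the PR implements tracing; `should_trace`, `trace_file_name`, and the file-name format are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if this task's rank is in the list of
   ranks selected for tracing, 0 otherwise. */
static int should_trace(int rank, const int *traced, int ntraced)
{
    assert(rank >= 0);
    for (int i = 0; i < ntraced; ++i) {
        if (traced[i] == rank) {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: build a per-task trace file name such as
   "trace.ATM.000012.log". The naming scheme is an assumption.
   Returns the length snprintf reports for the formatted name. */
static int trace_file_name(char *buf, size_t len,
                           const char *component, int rank)
{
    assert(buf != NULL && component != NULL && rank >= 0);
    return snprintf(buf, len, "trace.%s.%06d.log", component, rank);
}
```

With a list such as `{0, 7}`, only ranks 0 and 7 would open a trace file. Keeping the list short bounds the number of files even in runs with many thousands of tasks, which is the concern raised above.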
Commit Queue Requirements:
Description:
This PR adds a simple tracing module and updates some subcomponents' NUOPC drivers to produce a trace file that can be used to identify performance issues.
The tracing module is not built or used by default. It can be enabled by setting the build option `-DUFS_TRACING=ON`.
Commit Message:
Priority:
Git Tracking
UFSWM:
Sub component Pull Requests:
UFSWM Blocking Dependencies:
Documentation:
Changes
Regression Test Changes (Please commit test_changes.list):
Input data Changes:
Library Changes/Upgrades:
Testing Log: