bug-fix: perfect_model_obs prints #916


Open · wants to merge 4 commits into main

Conversation

mjs2369 (Contributor) commented Jul 10, 2025

Description:

Adds a conditional to the "Processing observation X of Y" print statements in pmo so that the error_handler is called with E_ALLMSG when pmo is run on more than one MPI proc.

This fixes the issue where only the obs on task 0 were being printed, which made it appear that only those obs were being processed.

Fixes issue #905

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Documentation changes needed?

  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.

Tests

Please describe any tests you ran to verify your changes.

Ran pmo for the lorenz models and MOM6 with various numbers of MPI procs and values of print_every_nth_obs, and ensured the prints were correct.

Checklist for merging

  • Updated changelog entry
  • Documentation updated
  • Update conf.py

Checklist for release

  • Merge into main
  • Create release from the main branch with appropriate tag
  • Delete feature-branch

Testing Datasets

  • Dataset needed for testing available upon request
  • Dataset download instructions included
  • No dataset needed

Commit: …s called with E_ALLMSG if pmo is being run on more than 1 mpi proc
mjs2369 (Contributor, Author) commented Jul 10, 2025

A few extra notes:

  1. The observations are out of order while being printed (especially when using lots of MPI procs) - currently the prints look like this with MOM6 on 256 procs:
     [screenshot: interleaved "Processing observation" prints from MOM6 on 256 procs]

I could force the pmo prints to go in order, but I would have to sync across tasks, so I don't think we want to go for that...
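
For reference, forcing ordered output would mean serializing the prints, something like the hypothetical sketch below (a rank loop plus a barrier per turn, using the task_count()/my_task_id()/task_sync() routines from mpi_utilities_mod). Every pass costs task_count() synchronizations, and it still only orders output by task, not by observation number:

   ! hypothetical sketch, not proposed for merge: each task waits its turn,
   ! so every round of printing pays task_count() barriers
   do rank = 0, task_count() - 1
      if (my_task_id() == rank) then
         write(msgstring, '(A,1x,I8,1x,A,I8)') 'Processing observation ', &
            global_obs_num, ' of ', num_obs_in_set
         call error_handler(E_ALLMSG, 'perfect_model_obs:', trim(msgstring))
      endif
      call task_sync()   ! barrier so task rank+1 prints only after task rank
   enddo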

  2. The print statements are slightly mismatched in how many spaces there are on task 0 vs. all the other tasks. Here's an example with lorenz_63:
     [screenshot: misaligned "PE 0:" / "PE    n:" prefixes with lorenz_63]

We could fix this by adding the same number of spaces to line 780 as there are on line 782 in the error_handler subroutine:

   if (task_number == 0) then
      write(taskstr, '(a)' ) "PE 0: "
   else
      write(taskstr, '(a,i5,a)' ) "PE ", task_number, ": "

i.e.

   if (task_number == 0) then
      write(taskstr, '(a)' ) "PE     0: "
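
With that padding, the prefixes would line up, roughly like this (spacing illustrative):

   PE     0: Processing observation       10 of      40
   PE     2: Processing observation       20 of      40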

mjs2369 requested a review from hkershaw-brown, July 10, 2025 20:36
hkershaw-brown (Member) commented:

> A few extra notes:
>
>   1. The observations are out of order while being printed (especially when using lots of MPI procs) - currently the prints look like this with MOM6 on 256 procs
>      I could force the pmo prints to go in order, but I would have to sync across tasks, so I don't think we want to go for that...

Good call, no need to waste core-hours syncing.

>   2. The print statements are slightly mismatched in how many spaces there are on task 0 vs. all the other tasks. Here's an example with lorenz_63
>
> We could fix this by adding the same number of spaces to line 780 as there are on line 782 in the error_handler subroutine:
>
>    if (task_number == 0) then
>       write(taskstr, '(a)' ) "PE 0: "
>    else
>       write(taskstr, '(a,i5,a)' ) "PE ", task_number, ": "
>
> i.e.
>
>    if (task_number == 0) then
>       write(taskstr, '(a)' ) "PE     0: "

Gut feeling is to leave this alone so the PE 0: prints stand out.

I think most of the time people will set either -1 (to not have printouts) or some n comparable to the total obs, e.g. for 50,000 obs, print every 5000 obs. It's a sanity check to see that PMO is doing something, or to see where it got to when a job ran out of wall clock.
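
For instance, a typical sanity-check setting in the pmo namelist might be (value illustrative):

   &perfect_model_obs_nml
      print_every_nth_obs = 5000   ! one progress line per 5000 obs; -1 for none
      /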

Comment on lines 497 to 505
   if(task_count() == 1) then
      write(msgstring, '(A,1x,I8,1x,A,I8)') 'Processing observation ', j, &
         ' of ', num_obs_in_set
      call trace_message(msgstring, 'perfect_model_obs:', -1)
   else
      write(msgstring, '(A,1x,I8,1x,A,I8)') 'Processing observation ', global_obs_num, &
         ' of ', num_obs_in_set
      call error_handler(E_ALLMSG, 'perfect_model_obs:', trim(msgstring))
   endif
hkershaw-brown (Member) commented:

This is neat.

I was wondering if you could drop the if statement and use E_ALLMSG for both cases, since j == global_obs_num for task_count() == 1.

But I have not looked through why trace_message is used rather than error_handler(E_MSG, ...).
Is the -1 there to stop silence=.true. from suppressing the message?

Side note: there is a bunch of replicated code (trace_message, set_trace) between pmo, filter, and other modules.

mjs2369 (Contributor, Author) replied:


> I was wondering if you could drop the if statement and use E_ALLMSG for both cases, since j == global_obs_num for task_count() == 1.
>
> But I have not looked through why trace_message is used rather than error_handler(E_MSG, ...). Is the -1 there to stop silence=.true. from suppressing the message?

That's a good point. I think it would be better to use E_ALLMSG in both cases. I'll make this change.
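
Something like this, I think, once pmo uses global_obs_num for the single-task case as well (sketch of the merged version):

   write(msgstring, '(A,1x,I8,1x,A,I8)') 'Processing observation ', global_obs_num, &
      ' of ', num_obs_in_set
   call error_handler(E_ALLMSG, 'perfect_model_obs:', trim(msgstring))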

I was also wondering about the general need for trace_message given that we have the error_handler. trace_message and error_handler are basically doing the same thing in slightly different ways, so we could consider switching the places that use trace_message over to the error_handler altogether.

But like you said, trace_message is replicated in pmo and filter_mod.f90 (and models/coamps_nest/coamps_util_mod.f90, though I don't think that one sees a lot of use). So at the very least I think we should move trace_message to utilities_mod. Should I go ahead and make an issue about this and its overlap with the error_handler?

Finally, the -1 in call trace_message(msgstring, 'perfect_model_obs:', -1) is simply there to force the message to print even if silence is set to .true. in the namelist.

hkershaw-brown (Member) commented Jul 15, 2025:


> Should I go ahead and make an issue about this and its overlap with the error_handler?

Yup, make an issue. I wouldn't go ahead and make any code changes to the various trace message calls yet; let's collect all the various trace message stuff first. E.g. there is a totally unused routine, progress:

   subroutine progress(msg, when, dotime, label, threshold, sync) ! trace plus timestamp

which seems to be yet another trace/timestamp message routine.

hkershaw-brown (Member) commented:

The window printing is broken too:

#905 (comment)

mjs2369 added 2 commits, July 16, 2025 12:22
Commit: …rints: using global_obs_num regardless of number of tasks since j==global_obs_num for task_count()==1. NOTE: now that we are not using trace_message, these prints will NOT execute anymore if the namelist option silence is set to true
hkershaw-brown (Member) commented:

nice double bug fix Marlee!
