Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NCAR to main (2023-09-14) #1610

Merged

Conversation

gustavo-marques
Copy link
Collaborator

This PR includes the following:

Features


New parameters

  • USE_LEITHY
    • LEITH_CK
  • NDIFF_TAPERING
  • KHTR_USE_EBT_STRUCT
  • N_SMOOTH_RI
  • DO_BL_RESIDENCE
  • USE_REAL_BL_DEPTH
  • FPMIX

Bug fixes & documentation improvements


Code improvements and cleanup


Obsolete MCT cap

Contributors

gustavo-marques and others added 30 commits April 27, 2022 14:25
Replaces LU_pred to L_diag, since now this logical only controls
if diagnostics should be posted.
This commit adds the latest updates to the vertFPmix subroutine
after Bill Large did some cleaning. We have highlight places in
the code where work must be done.
omega_w2x is the counter-clockwise angle of the wind stress with
respect to the horizontal abscissa (x-coordinate) at tracer
points [rad]. This variable is needed in the vertPFmix subroutine.
This line of code was lost during the last merge.
* Rename MOM_lateral_boundary_diffusion.F90 to
  MOM_hor_bnd_diffusion.F90.
* Following the suggestion from a reviewer, MOM_lateral_boundary_diffusion
  has been renamed to MOM_hor_bnd_diffusion. Many submodules related to the
  'old; lateral diffusion have been renamed throughout the code.
  LBD has been replaced to HBD.
* Tested that answers for GMOM do not change.
Build and store the HBD grid outside the tracer loop since the
same grid is used in all tracers. This makes this module
more computationaly efficient. A GMOM case run for 10 days and with
3 tracer is ~ 7.5 % faster.
add desc argument to log_param calls in MOM_CFC_cap
CFC_BC_FILE must be specified if USE_CFC_CAP=.true.

use hemispheric averages poleward of 10 degrees latitude
linearly interpolate between 10S and 10N

correct bug that atm cfc12 was used in cfc11 flux computation
* Add subroutine hbd_grid_test, which mimics subroutine hbd_grid
  but it is only used in the unit tests;

* Add unit tests for hbd_grid_test and fix existing tests for
  fluxes_layer_method;

* Delete unused code and fix the format of doxygen throughout
  the module.
…OM_CFC_cap

refs moved out of nuopc cap code, MOM_forcing_type, MOM_variables

call CFC_cap_set_forcing in call_tracer_set_forcing

add call to call_tracer_set_forcing in nuopc cap

add arguments to call_tracer_set_forcing

increase width in MOM_CFC_cap unit test output

correct typo in oil_tracer
…of_cap

migrate nearly all refs to CFC_cap into MOM_tracer_flow_control and MOM_CFC_cap
Improvements in the MOM_lateral_boundary_diffusion module
* changes in nuopc cap, infra, and MOM.F90 to receive ensembe id from the coupler (alternative to FMS ensemble mngr)

* multi-instance logfile name correction in nuopc cap

* append ensemble suffix to _doc files

* changes in rpointer and restart file name handling to accommodate multi-instance CESM runs

* remove fms2_io_mod usage in FMS1/MOM_ensemble_manager_infra.F90

* rm whitespace in mom_cap
This commit adds the option to apply a linear decay in the
neutral diffusion fluxes within a transition zone defined
by the boundary layer depths of adjacent columns. This option
is controlled by a new parameter NDIFF_TAPERING, which is only
available when NDIFF_INTERIOR_ONLY=True. By default
NDIFF_TAPERING=False and answers are bitwise identical.
…0406

Merge changes from GFDL to main (2023-04-06)
Writes useful fields when the diffusivity of viscosity is less than zero.
The should help understanding the root cause of such cases and facilitate
the necessary adjustments.
…ngmuir_kpp

Output relevant fields when diff or visc < 0
Simplifies and reduces the code by adding hbd to the neutral
diffusion contril structure. This avoid the need to "extract"
hbl multiple times. Answers are bitwise indenticals.
@kshedstrom
Copy link
Collaborator

Since it runs in debug mode, I'm willing to chalk it up as a compiler issue. All the ESMG tests run.

I don't have other compilers to try on "new chinook" with my current software versions. I'll approve assuming that line will get taken out.

@jiandewang
Copy link
Collaborator

It worked fine on R&D machine (b4b results) but I am seeing different results on EMC operational machine. I am doing debug work and will provide some clue soon.

@jiandewang
Copy link
Collaborator

@alperaltuntas and @gustavo-marques
I have a pair of run log, restart file, et. al at
https://www.emc.ncep.noaa.gov/gc_wmb/wd20xw/JD/NCAR-PR.tar-txt
you need to remove the "-txt" from the file ename after downloading and then untar it. Our system doesn't allow file name ended as "tar". "*err" are run log. "BL" means current baseline, and you know what "PR" means in the file name
what I see is difference starts from
380c381
< h-point: c= 3602810 Before steps forces%ustar

h-point: c= 3602765 Before steps forces%ustar

@kshedstrom
Copy link
Collaborator

Switching to ifort 2023.1 fixed this for me (and broke something else).

@jiandewang
Copy link
Collaborator

@gustavo-marques and @alperaltuntas some updating from UFS:
I tried bisect and made some testing, what I can confirm is the last commit that works for UFS is
commit d4aa108
Author: Ian Grooms [email protected]
Date: Tue Aug 29 09:39:55 2023 -0600

Add Leith+E (#251)

* Add Leith+E

and problem starts from:
d741517
Merge: 92756fe d4aa108
Author: Gustavo Marques [email protected]
Date: Wed Aug 30 14:45:49 2023 -0600

Merge branch 'dev/ncar' into fpmix_draft_15August2023

our regression system only allows each user to run one regression test at the same time, thus it is a slow process for me to provide this clue to you.

@marshallward
Copy link
Collaborator

This PR passed our regression, but I suspect we will need a bit of time to manually review the changes.

I am also a bit concerned about the issues raised by @kshedstrom, although I'm not yet sure which are related to this PR.

@gustavo-marques
Copy link
Collaborator Author

@jiandewang, thanks for doing these tests and narrowing down where the change is coming from. I will look into this tomorrow and get back to you.

@jiandewang
Copy link
Collaborator

@jiandewang, thanks for doing these tests and narrowing down where the change is coming from. I will look into this tomorrow and get back to you.

@gustavo-marques one more clue for you: UFS quarter degree runs are fine on wcoss2. The half and 1x1 degree runs all failed to retain baseline results. We follow closely to the MOM_input in MOM_examples for all these 3 resolutions, so the issue must be some parameter related but I can't figure out which one it is.

@gustavo-marques
Copy link
Collaborator Author

@marshallward; regression tests are failing because KhTr_u, KhTr_v, and KhTr_h were changed from 2D (lat/lon) to 3D (lat/lon/depth).

@gustavo-marques
Copy link
Collaborator Author

@jiandewang; we are still unable to find where the issue is coming from.
Can you please send us the MOM_parameters_doc.short for the passing (1/4 deg) and failing (1/2 deg)?

@jiandewang
Copy link
Collaborator

@jiandewang; we are still unable to find where the issue is coming from. Can you please send us the MOM_parameters_doc.short for the passing (1/4 deg) and failing (1/2 deg)?

see https://www.emc.ncep.noaa.gov/gc_wmb/wd20xw/JD/MOM_parameter_doc.short.tar-txt, remove "txt" before untar it

@marshallward
Copy link
Collaborator

marshallward commented Oct 7, 2023

I have tried to look at this, and the only thing that seems to make sense is a regression in forces%ustar. Problems don't appear to become serious until the ePBL solver, but the differing term, Kd_ePBL, depends on ustar.

The only idea I have at the moment is that the inclusion of forces%omega_w2x is causing some kind of pointer corruption, or is shuffling things around in memory in a way that has discovered a hidden bug. But that feels like a "blame the compiler" excuse, not a true explanation.

My best suggestion for now is to completely remove all of the content related to omega_w2x and see if it helps to fix the problem. If so, then I recommend we remove all of this work and address it in a separate PR.

@gustavo-marques
Copy link
Collaborator Author

@marshallward: thanks for the feedback.
@jiandewang: I commented out all omega_w2x entries in my last commit. Can you please run your tests again and let us know if the issue is fixed?

@jiandewang
Copy link
Collaborator

@gustavo-marques: with your latest commit, all UFS jobs retain current baseline now.

@marshallward
Copy link
Collaborator

marshallward commented Oct 20, 2023

@gustavo-marques Did you want to clean up these changes? Or did you want to leave them in so that we can revisit this problem?

This is now the second mysterious computational issue regression detected by WCOSS, we may need to start doing some work to figure out what could be happening here.

Copy link
Collaborator

@sanAkel sanAkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answers preserved for GEOS.

@gustavo-marques
Copy link
Collaborator Author

@marshallward, our computers are down this week. I suggest we take this PR as is and we (NCAR) will revisit this issue in a future PR.

Copy link
Collaborator

@marshallward marshallward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This passes our regression testing.

@marshallward marshallward merged commit f6b6b0b into mom-ocean:main Oct 25, 2023
8 of 10 checks passed
@marshallward
Copy link
Collaborator

@gustavo-marques Hope you don't mind, I went ahead and merged this for you.

Thanks to everyone, this one was particularly tricky.

@gustavo-marques
Copy link
Collaborator Author

@marshallward Thanks for merging it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants