JiliDong-NOAA
Contributor

@JiliDong-NOAA JiliDong-NOAA commented Oct 9, 2025

Commit Queue Requirements:

  • This PR addresses a relevant WM issue (if not, create an issue).
  • All subcomponent pull requests (if any) have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines), preferably on Ursa (Derecho or Hercules are acceptable alternatives). Exceptions: documentation-only PRs, CI-only PRs, etc.
    • Commit log file w/full results from RT suite run (if applicable).
    • Verify that test_changes.list indicates which tests, if any, are changed by this PR. Commit test_changes.list, even if it is empty.
  • Fill out all sections of this template.

Description:

This PR is from @DusanJovic-NOAA and @JiliDong-NOAA and it fixes RRFS/REFS restart bitwise reproducibility issues caused by:

  1. RRFS smoke/dust components
  2. HAILCAST variables (updraft duration and mask) not being written to the restart file or read back in restart runs
  3. snow equivalent water accumulation not being written to the restart file
  4. incorrect saSAS convection initialization logic (i.e. qadv)
  5. incorrect Grell-Freitas convection initialization logic (i.e. the cold-start T/q tendency is applied only in the first timestep)
  6. REFS ensemble restart reproducibility issues when running with 32-bit physics (SPP-related variable name mismatches and inconsistent data type precision)
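
To illustrate why a state variable missing from the restart file breaks bitwise reproducibility, here is a minimal toy sketch. It is not UFS code: the model, the field names `t` and `acc`, and the helper functions are all hypothetical. It compares a continuous run against a run that is split in the middle and resumed from a restart.

```python
# Toy model only: every name here is hypothetical, not a UFS variable.
COLD_START = {"t": 280.0, "acc": 0.0}

def step(state):
    """Advance one timestep; the accumulator feeds back into the temperature."""
    state = dict(state)
    state["acc"] += 0.1 * state["t"]
    state["t"] = 0.9 * state["t"] + 0.01 * state["acc"]
    return state

def run(state, nsteps):
    """Advance the model nsteps timesteps and return the new state."""
    for _ in range(nsteps):
        state = step(state)
    return state

def write_restart(state, fields):
    """Keep only the listed fields, as a restart file would."""
    return {name: state[name] for name in fields}

def read_restart(restart):
    """Fields absent from the restart fall back to their cold-start value."""
    state = dict(COLD_START)
    state.update(restart)
    return state

def restart_run(fields):
    """Run 5 steps, write and read a restart with the given fields, run 5 more."""
    halfway = run(COLD_START, 5)
    return run(read_restart(write_restart(halfway, fields)), 5)

continuous = run(COLD_START, 10)
complete = restart_run(["t", "acc"])   # accumulator saved
incomplete = restart_run(["t"])        # accumulator omitted from the restart

print(complete == continuous)    # True
print(incomplete == continuous)  # False
```

In this sketch the restart run matches the continuous run only when every field that feeds back into the forecast is written out and read in.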

It also fixes a crash when running REFS in DEBUG mode.
The issues are related to LSM-SPP. It appears that LSM-SPP perturbations were added to the whole domain without masking out the water/ice points. This caused:

  1. an index-0 error under DEBUG mode for smcmin/smcmax(stype) when stype=0 over water
  2. a floating-point overflow error under DEBUG mode when applying LSM-SPP to zorll, where zorll would have missing values over water/ice (9x10e30)

The forecast will only change when Grell-Freitas is turned on during warm start runs with gf_coldstart explicitly set to T in the namelist.
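
The land-masking idea behind the DEBUG fix can be sketched as follows. This is an illustration under stated assumptions, not the actual stochastic_physics or ccpp-physics code: the fill value, the table contents, the multiplicative form of the perturbation, and all function names are hypothetical.

```python
# Hypothetical values for illustration; not taken from the UFS code.
MISSING = 1.0e30              # stand-in for a missing-value fill over water/ice
smcmax = [0.40, 0.45, 0.50]   # table assumed valid for soil types 1..3 only

def table_lookup(table, stype):
    """1-based lookup that rejects stype=0, like a bounds-checked DEBUG build."""
    if not 1 <= stype <= len(table):
        raise IndexError(f"soil type {stype} outside 1..{len(table)}")
    return table[stype - 1]

def perturb(stype, zorll, pert, land_mask):
    """Apply an (assumed multiplicative) perturbation at land points only."""
    out = []
    for s, z, p, is_land in zip(stype, zorll, pert, land_mask):
        if is_land:
            table_lookup(smcmax, s)    # safe: land points have stype >= 1
            out.append(z * (1.0 + p))
        else:
            out.append(z)              # water/ice: leave the fill value untouched
    return out

stype = [2, 0, 1]                  # the middle point is water (stype=0)
zorll = [5.0, MISSING, 12.0]
pert = [0.1, 0.1, -0.2]
land_mask = [True, False, True]

result = perturb(stype, zorll, pert, land_mask)
```

With the mask in place, the water point is never used as a table index and its fill value is never touched by the perturbation. Passing an all-`True` mask to `perturb` reproduces the index error in this toy setup.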

This PR also includes a hook to output surface specific humidity, which may be needed for RRFS post-processing.

This PR addresses issue #2926.

Commit Message:

* UFSWM - [production/RRFS.v1] fix RRFS/REFS restart reproducibility and DEBUG crash issues
  * AQM - 
  * CDEPS - 
  * CICE - 
  * CMEPS - 
  * CMakeModules - 
  * UFSATM -  [production/RRFS.v1] fix RRFS/REFS restart reproducibility
    * ccpp-physics -  [production/RRFS.v1] fix RRFS/REFS restart reproducibility
    * atmos_cubed_sphere -  [production/RRFS.v1] fix HAILCAST restart reproducibility
  * GOCART - 
  * HYCOM - 
  * MOM6 - 
  * NOAHMP - 
  * WW3 - 
  * fire_behavior - 
  * stochastic_physics -  [production/RRFS.v1] fix RRFS/REFS restart reproducibility and DEBUG crash

Priority:

  • Critical Bugfix: Reason - This PR is for the RRFS v1 implementation. The code delivery date is set for Oct. 31.
  • High: Reason
  • Normal

Git Tracking

UFSWM:

  • Closes #

Sub component Pull Requests:

UFSWM Blocking Dependencies:

  • Blocked by #
  • None

Documentation:

  • Documentation update required.
    • Relevant updates are included with this PR.
    • A WM issue has been opened to track the need for a documentation update; a person responsible for submitting the update has been assigned to the issue (link issue).
  • Documentation update NOT required.
    • Explanation:

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.
  • PR Updates/Changes Baselines.
  • No Baseline Changes.

Input data Changes:

  • None.
  • New input data.
  • Updated input data.

Library Changes/Upgrades:

  • Required
    • Library names w/versions:
    • Git Stack Issue (JCSDA/spack-stack#)
  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • GaeaC6
    • Derecho
    • Ursa
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@github-project-automation github-project-automation bot moved this to Evaluating in PRs to Process Oct 9, 2025
@JiliDong-NOAA JiliDong-NOAA changed the title from "[production/RRFS.v1] fix RRFS/REFS restart reproducibility and DEBUG crash issues" to "[production/RRFS.v1] fix RRFS/REFS restart reproducibility and DEBUG crash issues for RRFSv1 operational implementation" Oct 9, 2025
@gspetro-NOAA
Collaborator

@jkbk2004 @BrianCurtis-NOAA Do either of you need more information before sanity testing this production branch PR?

@gspetro-NOAA gspetro-NOAA added the No Baseline Change label Oct 14, 2025
@gspetro-NOAA gspetro-NOAA moved this from Evaluating to Review in PRs to Process Oct 15, 2025
@BrianCurtis-NOAA
Collaborator

@MatthewPyle-NOAA Is there any specific testing this branch uses? I can't recall if you run the full suite on WCOSS2 and/or any other system, or rely on another testing system.

@MatthewPyle-NOAA
Collaborator

@BrianCurtis-NOAA We typically have run the rt.conf_rrfs tests for this branch.

Collaborator

Looks like the filename is a message to remove this?

@BrianCurtis-NOAA
Collaborator

> @BrianCurtis-NOAA We typically have run the rt.conf_rrfs tests for this branch.

I don't see any evidence that these were run. @JiliDong-NOAA were tests run on any machine yet? Is there a machine other than WCOSS2 on which you would prefer rt.conf_rrfs be run?

@jkbk2004
Collaborator

jkbk2004 commented Oct 15, 2025

@MatthewPyle-NOAA @BrianCurtis-NOAA This production branch has diverged substantially from the develop branch (spack-stack, new machines, etc.). Note that this branch is based on spack-stack 1.5, and Hera is planned for decommissioning around February or spring. We are already using Ursa. Syncing develop and this branch would take considerable work. If possible, the best option might be to recreate the production branch.

@BrianCurtis-NOAA
Collaborator

> @MatthewPyle-NOAA @BrianCurtis-NOAA This production branch has diverged substantially from the develop branch (spack-stack, new machines, etc.). Note that this branch is based on spack-stack 1.5, and Hera is planned for decommissioning around February or spring. We are already using Ursa. Syncing develop and this branch would take considerable work. If possible, the best option might be to recreate the production branch.

IMO, since this is a production-only branch, only WCOSS2 needs to keep supporting it. I'll leave it to the RRFS/REFS CMs to let us know if they need support on other machines. As of right now, the libraries on WCOSS2 are staying put and should be good for this testing.

@JiliDong-NOAA
Contributor Author

JiliDong-NOAA commented Oct 15, 2025

I will run the RRFS regression tests on WCOSS2. @jkbk2004 @BrianCurtis-NOAA Are there instructions for running RTs with ecflow on WCOSS2?

@jkbk2004
Collaborator

Build failure on Hercules for compile_atm_debug_dyn32_intel:

```
/work/noaa/epic/jongkim/UFS-RT/hercules/rt-2925/FV3/ccpp/physics/physics/CONV/C3/cu_c3_deep.F90(2016): error #6633: The type of the actual argument differs from the type of the dummy argument. [DEL]
flag_mid,del,tmf,qmicro,dbyo1,zdqca,omega_u,zeta,xlv,dtime, &
-----------------------^
/work/noaa/epic/jongkim/UFS-RT/hercules/rt-2925/FV3/ccpp/physics/physics/CONV/C3/cu_c3_deep.F90(2017): error #6633: The type of the actual argument differs from the type of the dummy argument. [KBCON]
forceqv_spechum,kbcon,ktop,cnvflg,betascu,betamcu,betadcu, &
------------------------------^
/work/noaa/epic/jongkim/UFS-RT/hercules/rt-2925/FV3/ccpp/physics/physics/CONV/C3/cu_c3_deep.F90(2017): error #6633: The type of the actual argument differs from the type of the dummy argument. [CNVFLG]
forceqv_spechum,kbcon,ktop,cnvflg,betascu,betamcu,betadcu, &
-----------------------------------------^
/work/noaa/epic/jongkim/UFS-RT/hercules/rt-2925/FV3/ccpp/physics/physics/CONV/C3/cu_c3_deep.F90(2017): error #6633: The type of the actual argument differs from the type of the dummy argument. [BETASCU]
forceqv_spechum,kbcon,ktop,cnvflg,betascu,betamcu,betadcu, &
------------------------------------------------^
/work/noaa/epic/jongkim/UFS-RT/hercules/rt-2925/FV3/ccpp/physics/physics/CONV/C3/cu_c3_deep.F90(2015): error #6631: A non-optional actual argument must be present when invoking a procedure with an explicit interface. [SIGMAB]
call progsigma_calc(itf,ktf,flag_init,flag_restart,flag_shallow, &
--------------^
/work/noaa/epic/jongkim/UFS-RT/hercules/rt-2925/FV3/ccpp/physics/physics/CONV/C3/cu_c3_deep.F90(2016): error #6634: The shape matching rules of actual arguments and dummy arguments have been violated. [DEL]
flag_mid,del,tmf,qmicro,dbyo1,zdqca,omega_u,zeta,xlv,dtime, &
-----------------------^
/work/noaa/epic/jongkim/UFS-RT/hercules/rt-2925/FV3/ccpp/physics/physics/CONV/C3/cu_c3_deep.F90(2015): error #8284: If the actual argument is scalar, the dummy argument shall be scalar unless the actual argument is of type character or is an element of an array that is not assumed shape, pointer, or polymorphic. [ZETA]
call progsigma_calc(itf,ktf,flag_init,flag_restart,flag_shallow, &
--------------^
/work/noaa/epic/jongkim/UFS-RT/hercules/rt-2925/FV3/ccpp/physics/physics/CONV/C3/cu_c3_deep.F90(2017): error #6634: The shape matching rules of actual arguments and dummy arguments have been violated. [FORCEQV_SPECHUM]
forceqv_spechum,kbcon,ktop,cnvflg,betascu,betamcu,betadcu, &
--------------^
/work/noaa/epic/jongkim/UFS-RT/hercules/rt-2925/FV3/ccpp/physics/physics/CONV/C3/cu_c3_deep.F90(2015): error #8284: If the actual argument is scalar, the dummy argument shall be scalar unless the actual argument is of type character or is an element of an array that is not assumed shape, pointer, or polymorphic. [CNVFLG]
call progsigma_calc(itf,ktf,flag_init,flag_restart,flag_shallow, &
--------------^
/work/noaa/epic/jongkim/UFS-RT/hercules/rt-2925/FV3/ccpp/physics/physics/CONV/C3/cu_c3_deep.F90(2018): error #6634: The shape matching rules of actual arguments and dummy arguments have been violated. [SIGMAIN]
sigmind,sigminm,sigmins,sigmain,sigmaout,sigmab)
--------------------------------------^
compilation aborted for /work/noaa/epic/jongkim/UFS-RT/hercules/rt-2925/FV3/ccpp/physics/physics/CONV/C3/cu_c3_deep.F90 (code 1)
make[2]: *** [FV3/ccpp/physics/CMakeFiles/ccpp_physics.dir/build.make:218: FV3/ccpp/physics/CMakeFiles/ccpp_physics.dir/physics/CONV/C3/cu_c3_deep.F90.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:464: FV3/ccpp/physics/CMakeFiles/ccpp_physics.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:136: all] Error 2
```

@grantfirl
Collaborator

@jkbk2004 I've asked @JiliDong-NOAA to fix the compilation issue. It should be an easy fix. The interface of the progsigma_calc subroutine was changed, and another scheme (C3 convection) also calls it, so the call there needs to be updated too.

@BrianCurtis-NOAA
Collaborator

> I will run the RRFS regression tests on WCOSS2. @jkbk2004 @BrianCurtis-NOAA Are there instructions for running RTs with ecflow on WCOSS2?

`./rt.sh -e -a <account> -l rt.conf_rrfs > rt.out 2>&1 &` (for me, the account would be GFS-DEV, for example)

@JiliDong-NOAA
Contributor Author

> @jkbk2004 I've asked @JiliDong-NOAA to fix the compilation issue. Should be an easy fix. There is another scheme (C3 convection) that uses the progsigma_calc subroutine where the interface was changed, so they need to fix the call to that subroutine there too.

Thanks, @grantfirl! Good catch. I will fix it now.

Labels

No Baseline Change

Projects

Status: Review

7 participants