Add a low resolution test to mimic GFSv17 cycling as much as possible #3617

Open
wants to merge 5 commits into
base: develop
from

Conversation

JessicaMeixner-NOAA
Contributor

@JessicaMeixner-NOAA JessicaMeixner-NOAA commented Apr 25, 2025

Description

This PR has one minor bug fix for the stage IC job and adds a low-resolution test that has all the cycling components anticipated for use in GFSv17.

Resolves #3441

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.): New CI test

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

This test was run on Hera:
RUNTESTS=/scratch1/NCEPDEV/climate/Jessica.Meixner/addlowrestest/testlowres03
(As of posting this PR, a few final jobs remained, but 1.5 cycles had completed successfully.)
Update: 2.5 cycles succeeded.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

Contributor

@guillaumevernieres guillaumevernieres left a comment

That should work.

Contributor

@CatherineThomas-NOAA CatherineThomas-NOAA left a comment

I think we need to add an account line in the main yaml, like in PR #3411. Otherwise, my test succeeded on WCOSS2 and all the tasks look correct.

@JessicaMeixner-NOAA
Contributor Author

> I think we need to add an account line in the main yaml, like in PR #3411. Otherwise, my test succeeded on WCOSS2 and all the tasks look correct.

@CatherineThomas-NOAA - Thanks for catching this. I've added the account.

Contributor

@CatherineThomas-NOAA CatherineThomas-NOAA left a comment

Thanks @JessicaMeixner-NOAA! The account update works in my test. Approve.

@AndrewEichmann-NOAA
Contributor

@CatherineThomas-NOAA Would this experiment, with appropriate changes to the ensemble, be a good basis for working on the reduced ensemble members?

@CatherineThomas-NOAA
Contributor

@AndrewEichmann-NOAA - Yes, with changes to nens, NMEM_ENS_GFS, NMEM_ENS_GFS_OFFSET, and then pointing to other ICs, I think it could work well for your case.

Contributor

@aerorahul aerorahul left a comment

lgtm

@KateFriedman-NOAA KateFriedman-NOAA self-requested a review April 28, 2025 18:21
Member

@KateFriedman-NOAA KateFriedman-NOAA left a comment

Looks good, thanks @JessicaMeixner-NOAA !

@KateFriedman-NOAA
Member

Took a look at the test output provided by @JessicaMeixner-NOAA on Hera and noticed that the final cycle metp jobs (18z offset from end) aren't running because their dependencies are set for a non-existent cycle:

[Kate.Friedman@hfe07 testlowres03]$ rocotostat -d testlowres03.db -w testlowres03.xml -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201200        Done    Apr 25 2025 13:38:19    Apr 25 2025 13:55:16
202112201800        Done    Apr 25 2025 13:38:19    Apr 25 2025 16:25:22
202112210000        Done    Apr 25 2025 13:38:19    Apr 25 2025 17:55:15
202112211800      Active    Apr 25 2025 14:00:36             -          
[Kate.Friedman@hfe07 testlowres03]$ rocotostat -d testlowres03.db -w testlowres03.xml -c 202112211800                                                     
       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
202112211800            gfs_metpg2g1                           -                   -                   -         -             -
202112211800            gfs_metpg2o1                           -                   -                   -         -             -
202112211800            gfs_metppcp1                           -                   -                   -         -             -
[Kate.Friedman@hfe07 testlowres03]$ rocotocheck -d testlowres03.db -w testlowres03.xml -c 202112211800 -t gfs_metpg2g1                                                    

Task: gfs_metpg2g1
  account: marine-cpu
  command: /scratch1/NCEPDEV/climate/Jessica.Meixner/addlowrestest/global-workflow/dev/jobs/metp.sh
  cores: 1
  cycledefs: metp,last_gfs
  final: false
  jobname: testlowres03_gfs_metpg2g1_18
  join: /scratch1/NCEPDEV/climate/Jessica.Meixner/addlowrestest/testlowres03/COMROOT/testlowres03/logs/2021122118/gfs_metpg2g1.log
  maxtries: 2
  memory: 80G
  metatasks: gfs_metp
  name: gfs_metpg2g1
  nodes: 1:ppn=1:tpp=1
  partition: hera
  queue: batch
  seqnum: 1
  throttle: 9999999
  walltime: 06:00:00
  environment
    CDATE ==> 2021122118
    COMROOT ==> /scratch1/NCEPDEV/climate/Jessica.Meixner/addlowrestest/testlowres03/COMROOT
    DATAROOT ==> /scratch1/NCEPDEV/stmp2/Jessica.Meixner/RUNDIRS/testlowres03/gfs.2021122118
    EDATE_GFS ==> 2021122100
    EXPDIR ==> /scratch1/NCEPDEV/climate/Jessica.Meixner/addlowrestest/testlowres03/EXPDIR/testlowres03
    HOMEgfs ==> /scratch1/NCEPDEV/climate/Jessica.Meixner/addlowrestest/global-workflow
    METPCASE ==> g2g1
    NET ==> gfs
    PDY ==> 20211221
    RUN ==> gfs
    RUN_ENVIR ==> emc
    SDATE_GFS ==> 2021122018
    cyc ==> 18
  dependencies
    OR is not satisfied
      gfs_arch_vrfy of cycle 202112211800 is not SUCCEEDED
      AND is not satisfied
        NOT is satisfied
          gfs_arch_vrfy is not valid
        gfs_arch_vrfy of cycle 202112211200 is not SUCCEEDED

Cycle: 202112211800
  Valid for this task: YES
  State: active
  Activated: 2025-04-25 14:00:36 UTC
  Completed: -
  Expired: -

Job: This task has not been submitted for this cycle

Task can not be submitted because:
  Dependencies are not satisfied
[Kate.Friedman@hfe07 testlowres03]$ rocotostat -d testlowres03.db -w testlowres03.xml -s | grep 202112211200
[Kate.Friedman@hfe07 testlowres03]$ 

Will fire off CI on WCOSS2, look for the same behavior, and see what fix is needed.
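
A minimal sketch of why the dependency above can never be satisfied, using only the Python standard library. The cycle list is copied from the `rocotostat -s` output; the -6 hour offset is inferred from the `rocotocheck` output (a dependency on cycle 202112211200 for a task in cycle 202112211800), and `dependency_cycle` is a hypothetical helper, not a Rocoto or global-workflow function.

```python
from datetime import datetime, timedelta

# Cycles that exist in the workflow, per `rocotostat -s` above.
existing_cycles = {
    datetime(2021, 12, 20, 12),
    datetime(2021, 12, 20, 18),
    datetime(2021, 12, 21, 0),
    datetime(2021, 12, 21, 18),  # metp-only cycle, still active
}


def dependency_cycle(cycle: datetime, offset: timedelta) -> datetime:
    """Return the cycle that a task dependency with this offset points to."""
    return cycle + offset


metp_cycle = datetime(2021, 12, 21, 18)

# Assumed offset: -6 hours, inferred from the rocotocheck output.
wanted = dependency_cycle(metp_cycle, timedelta(hours=-6))

# The referenced cycle is not defined, so gfs_arch_vrfy can never
# reach SUCCEEDED there and the metp tasks wait indefinitely.
print(wanted.strftime("%Y%m%d%H%M"), wanted in existing_cycles)
# prints: 202112211200 False
```

The empty result of the final `grep 202112211200` above shows the same thing from the Rocoto side.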

@KateFriedman-NOAA KateFriedman-NOAA added the CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS label Apr 28, 2025
@JessicaMeixner-NOAA
Contributor Author

@KateFriedman-NOAA - The C96C48_hybatmaerosnowDA.yaml test does not have the extra met+ jobs, even though it also has DO_METP=YES. I'm not sure why this test would have extra met+ jobs that C96C48_hybatmaerosnowDA does not.

@emcbot emcbot added CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS labels Apr 28, 2025
@emcbot

emcbot commented Apr 28, 2025

CI Tests set up to run in /lfs/h2/emc/ptmp/emc.global/PR/PR_3617/RUNTESTS on WCOSS

@aerorahul aerorahul added CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera CI-Gaeac6-Ready **CM use only** PR is ready for CI testing on Gaea C6 labels Apr 28, 2025
@emcbot emcbot added CI-Gaeac6-Building **Bot use only** CI testing is cloning/building on Gaea C6 CI-Gaeac6-Running CI-Gaeac6-Failed **Bot use only** CI testing on Gaea C6 for this PR has failed and removed CI-Gaeac6-Ready **CM use only** PR is ready for CI testing on Gaea C6 CI-Gaeac6-Building **Bot use only** CI testing is cloning/building on Gaea C6 CI-Gaeac6-Running labels Apr 28, 2025
@emcbot

emcbot commented Apr 29, 2025

Experiment C96C48mx500_S2SW_cyc_gfs FAILED on Gaeac6 in Build# 1 in
/gpfs/f6/drsa-precip3/world-shared/global/CI/3617/RUNTESTS/EXPDIR/C96C48mx500_S2SW_cyc_gfs_f75b4cc7

@emcbot emcbot added CI-Gaeac6-Failed **Bot use only** CI testing on Gaea C6 for this PR has failed and removed CI-Gaeac6-Failed **Bot use only** CI testing on Gaea C6 for this PR has failed labels Apr 29, 2025
@emcbot

emcbot commented Apr 29, 2025

CI Failed on Gaeac6 in Build# 1
Built and ran in directory /gpfs/f6/drsa-precip3/world-shared/global/CI/3617


Experiment C48_ATM_f75b4cc7 Completed 1 Cycles: *SUCCESS* at Mon 28 Apr 2025 06:17:11 PM EDT
Experiment C48_S2SW_f75b4cc7 Completed 1 Cycles: *SUCCESS* at Mon 28 Apr 2025 06:17:11 PM EDT
Experiment C48mx500_hybAOWCDA_f75b4cc7 Completed 2 Cycles: *SUCCESS* at Mon 28 Apr 2025 06:41:36 PM EDT
Experiment C48_S2SWA_gefs_f75b4cc7 Completed 1 Cycles: *SUCCESS* at Mon 28 Apr 2025 07:00:01 PM EDT
Experiment C48mx500_3DVarAOWCDA_f75b4cc7 Completed 2 Cycles: *SUCCESS* at Mon 28 Apr 2025 07:06:03 PM EDT
Experiment C96_atm3DVar_f75b4cc7 Completed 3 Cycles: *SUCCESS* at Mon 28 Apr 2025 07:30:17 PM EDT
Experiment C96C48_hybatmDA_f75b4cc7 Completed 3 Cycles: *SUCCESS* at Mon 28 Apr 2025 07:30:19 PM EDT
Experiment C96C48_hybatmaerosnowDA_f75b4cc7 Completed 3 Cycles: *SUCCESS* at Mon 28 Apr 2025 07:36:31 PM EDT
Experiment C96C48mx500_S2SW_cyc_gfs_f75b4cc7 Terminated with 0 tasks failed and 0 dead at Mon 28 Apr 2025 09:02:08 PM EDT
Experiment C96C48mx500_S2SW_cyc_gfs_f75b4cc7 Terminated: *STALLED*

@KateFriedman-NOAA
Member

Some gempak jobs hit their walltimes in the extended test on WCOSS2. I am rerunning them to confirm they ran long due to a machine issue. All other jobs and test cases completed without issue.

@emcbot emcbot added CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera labels Apr 29, 2025
@KateFriedman-NOAA
Member

KateFriedman-NOAA commented Apr 29, 2025

Found a fix that resolves the gfs_metp dependency issue by making the offset for part of the dependency adjust based on EDATE:

WCOSS2 (BACKUPSYS-C) global-workflow> git diff dev/workflow/rocoto/gfs_tasks.py
diff --git a/dev/workflow/rocoto/gfs_tasks.py b/dev/workflow/rocoto/gfs_tasks.py
index 89ac5e6e8..3b0a99dfd 100644
--- a/dev/workflow/rocoto/gfs_tasks.py
+++ b/dev/workflow/rocoto/gfs_tasks.py
@@ -1938,7 +1938,10 @@ class GFSTasks(Tasks):
                     dep_dict = {'type': 'cycleexist', 'condition': 'not', 'offset': offset}
                     deps2.append(rocoto.add_dependency(dep_dict))
 
-                offset = timedelta_to_HMS(-to_timedelta(f'{6*lookback}H'))
+                edate_gfs = self._base['EDATE']
+                edate_metp = edate_gfs.replace(hour=18)
+                edate_metp_diff = edate_metp - edate_gfs
+                offset = timedelta_to_HMS(-to_timedelta(f'{edate_metp_diff}H'))
                 dep_dict = {'type': 'task', 'name': f'{self.run}_arch_vrfy', 'offset': offset}
                 deps2.append(rocoto.add_dependency(dep_dict))
                 deps.append(rocoto.create_dependency(dep_condition='and', dep=deps2))

I updated the xml for the C96C48mx500_S2SW_cyc_gfs CI case on WCOSS2 and that case is now finishing up correctly.

The extended CI case on WCOSS2 is still running after I booted the gempak jobs that hit their walltimes (reran fine within time) and it should finish in the next couple hours.
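
The idea behind the fix can be sketched with the standard library alone: measure the gap between the last GFS cycle and the 18z metp cycle on the same day, then use its negative as the dependency offset. This is an illustration under stated assumptions, not the workflow's code. `format_offset` is a hypothetical stand-in for `timedelta_to_HMS`, `metp_dependency_offset` is a made-up name, and the example date is the `EDATE_GFS` value from the `rocotocheck` output earlier in this thread.

```python
from datetime import datetime, timedelta


def format_offset(delta: timedelta) -> str:
    """Format a timedelta as a signed HH:MM:SS string (hypothetical helper)."""
    total = int(delta.total_seconds())
    sign = "-" if total < 0 else ""
    hours, rem = divmod(abs(total), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{sign}{hours:02d}:{minutes:02d}:{seconds:02d}"


def metp_dependency_offset(edate_gfs: datetime) -> timedelta:
    """Offset from the 18z metp cycle back to the last GFS cycle that day."""
    metp_cycle = edate_gfs.replace(hour=18)
    return -(metp_cycle - edate_gfs)


# Assumption: EDATE_GFS ==> 2021122100, as shown by rocotocheck.
edate_gfs = datetime(2021, 12, 21, 0)
offset = metp_dependency_offset(edate_gfs)

# Applying the offset to the metp cycle lands on the final GFS cycle,
# which exists in the workflow, so the dependency can be satisfied.
target_cycle = edate_gfs.replace(hour=18) + offset

print(format_offset(offset), target_cycle.strftime("%Y%m%d%H%M"))
# prints: -18:00:00 202112210000
```

With a fixed offset of -6 hours the dependency pointed at 202112211200, which does not exist; deriving the offset from the end date makes it point at 202112210000 instead.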

@JessicaMeixner-NOAA
Contributor Author

@KateFriedman-NOAA thanks! please let me know if there's any code you want me to add to this branch, etc. And thanks for finding and fixing the bug! I just thought it was a feature ... oops!?

@emcbot emcbot added CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed and removed CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress labels Apr 29, 2025
@emcbot

emcbot commented Apr 29, 2025

Experiment C96C48mx500_S2SW_cyc_gfs FAILED on Hera in Build# 2 in
/scratch1/NCEPDEV/global/glopara/CI/3617/RUNTESTS/EXPDIR/C96C48mx500_S2SW_cyc_gfs_f75b4cc7

@emcbot emcbot added CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed and removed CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed labels Apr 29, 2025
@emcbot

emcbot commented Apr 29, 2025

CI Failed on Hera in Build# 2
Built and ran in directory /scratch1/NCEPDEV/global/glopara/CI/3617


Experiment C48_S2SW_f75b4cc7 Completed 1 Cycles: *SUCCESS* at Tue Apr 29 17:06:07 UTC 2025
Experiment C48_ATM_f75b4cc7 Completed 1 Cycles: *SUCCESS* at Tue Apr 29 17:06:30 UTC 2025
Experiment C48mx500_hybAOWCDA_f75b4cc7 Completed 2 Cycles: *SUCCESS* at Tue Apr 29 17:30:38 UTC 2025
Experiment C96mx100_S2S_f75b4cc7 Completed 1 Cycles: *SUCCESS* at Tue Apr 29 17:43:04 UTC 2025
Experiment C48_S2SWA_gefs_f75b4cc7 Completed 1 Cycles: *SUCCESS* at Tue Apr 29 17:55:26 UTC 2025
Experiment C48mx500_3DVarAOWCDA_f75b4cc7 Completed 2 Cycles: *SUCCESS* at Tue Apr 29 18:20:07 UTC 2025
Experiment C96C48_hybatmDA_f75b4cc7 Completed 3 Cycles: *SUCCESS* at Tue Apr 29 18:56:17 UTC 2025
Experiment C96C48_hybatmaerosnowDA_f75b4cc7 Completed 3 Cycles: *SUCCESS* at Tue Apr 29 18:56:24 UTC 2025
Experiment C96_atm3DVar_f75b4cc7 Completed 3 Cycles: *SUCCESS* at Tue Apr 29 19:08:37 UTC 2025
Experiment C96C48_ufs_hybatmDA_f75b4cc7 Completed 3 Cycles: *SUCCESS* at Tue Apr 29 19:21:13 UTC 2025
Experiment C96C48mx500_S2SW_cyc_gfs_f75b4cc7 Terminated with 0 tasks failed and 0 dead at Tue Apr 29 21:17:40 UTC 2025
Experiment C96C48mx500_S2SW_cyc_gfs_f75b4cc7 Terminated: *STALLED*

Labels
CI-Gaeac6-Failed **Bot use only** CI testing on Gaea C6 for this PR has failed
CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed
CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add low resolution ci test with all the GFSv17 planned configurations
7 participants