dpsarmie
Collaborator

@dpsarmie dpsarmie commented May 22, 2025

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on one of Hera, Derecho, or Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR updates the control_c768 test. The test was not working even with the GFSv16 configurations (export_fv3_v16 option). The updates will allow the test to be run with the GFSv17_p8 configurations and build. There are also new initial condition files that will need to be added by EPIC to the RDHPCS machines.

Commit Message:

* UFSWM - Update control_c768 test for GFSv17_p8

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

  • None

UFSWM Blocking Dependencies:

  • None

Documentation:

  • No documentation update is required for this PR. The PR only fixes a regression test that was already present.

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.
    The control_c768 test is not part of the regular testing suite. It should not be added to rt.conf and should not appear in test_changes.list. This test is only for rt_weekly.sh.

    Baselines for my tests were generated here:
    /scratch1/NCEPDEV/stmp4/Daniel.Sarmiento/FV3_RT/REGRESSION_TEST/control_c768_intel/

Input data Changes:

  • New input data.
    New data should be added to the RDHPCS machines. The new ICs are located on Hera in the following directory:
    /scratch1/NCEPDEV/stmp2/Daniel.Sarmiento/NEMSfv3gfs/input-data-20250507/FV3_fix_tiled/C768mx025

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • GaeaC6
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@dpsarmie dpsarmie self-assigned this May 22, 2025
@dpsarmie dpsarmie added the Input Data Changes and New Baselines labels May 22, 2025
@ulmononian
Collaborator

@dpsarmie this is great -- thanks for adding this. does this use v2 surface data, by chance?

@DeniseWorthen
Collaborator

@dpsarmie Did your input data get scrubbed? I don't see it in the listed location.

@dpsarmie
Collaborator Author

@DeniseWorthen Looks like it was scrubbed. I'll restore it and put it in a non-stmp directory. Thanks for the heads-up.

@ulmononian Sorry, I missed your message. No, it does not use the v2 surface data, but I can modify it if the v2 data would be more useful.

@dpsarmie
Collaborator Author

Data are back up on Hera at the same location. I'll keep an eye on it to keep it from getting scrubbed.

@dpsarmie dpsarmie marked this pull request as ready for review June 23, 2025 15:18
@jkbk2004
Collaborator

jkbk2004 commented Jul 2, 2025

@dpsarmie I rsynced the new input files to Hera and Derecho: /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20250507/FV3_fix_tiled/C768mx025 and /glade/derecho/scratch/epicufsrt/ufs-weather-model/RT/NEMSfv3gfs/input-data-20250507/FV3_fix_tiled/C768mx025

@dpsarmie
Collaborator Author

dpsarmie commented Jul 2, 2025

Thanks @jkbk2004, I'll go ahead and test it on Hera and see if @edougherty32 can get it going on Derecho.

@dpsarmie
Collaborator Author

dpsarmie commented Jul 2, 2025

I was able to generate new baselines on Hera.

export WW3_DOMAIN=global_270k
export MESH_WAV=mesh.${WW3_DOMAIN}.nc
export MESH_WAV="mesh.uglo_15km.nc"
export WW3_MODDEF=mod_def.exp.${WW3_DOMAIN}
Collaborator

@DeniseWorthen DeniseWorthen Sep 3, 2025

If the standard RT scripts are used to set up this test, they will copy a mod_def file for the wrong domain from the WW3 input data directory: mod_def.exp.global_270k. The scripts will run, but the model will read the wrong input file.
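To make the mismatch concrete, the sketch below simply evaluates the four exports from the reviewed snippet in order. The second MESH_WAV assignment overrides the first, but WW3_MODDEF is still built from WW3_DOMAIN, so the mesh and the mod_def end up pointing at different grids.

```shell
# The four exports from the reviewed test configuration, evaluated in order.
export WW3_DOMAIN=global_270k
export MESH_WAV=mesh.${WW3_DOMAIN}.nc
export MESH_WAV="mesh.uglo_15km.nc"          # overrides the previous line
export WW3_MODDEF=mod_def.exp.${WW3_DOMAIN}  # still derived from global_270k

echo "MESH_WAV=${MESH_WAV}"
echo "WW3_MODDEF=${WW3_MODDEF}"
```

This prints MESH_WAV=mesh.uglo_15km.nc and WW3_MODDEF=mod_def.exp.global_270k.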

The mod_def for the uglo_15km grid doesn't exist in the RT input data; it will need to be added, and then either the mod_def name hard-coded, or WW3_DOMAIN set to uglo_15km with export WW3_MODDEF=mod_def.${WW3_DOMAIN}.
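A minimal sketch of the two options, assuming the generated file is named mod_def.uglo_15km (the name that appears later in this thread) and has already been staged into the RT input data. Only one of the two options would go into the actual test configuration.

```shell
# Option 1: hard-code both wave file names, as MESH_WAV already is.
export MESH_WAV="mesh.uglo_15km.nc"
export WW3_MODDEF="mod_def.uglo_15km"

# Option 2: switch the domain and derive both names from it.
export WW3_DOMAIN=uglo_15km
export MESH_WAV=mesh.${WW3_DOMAIN}.nc
export WW3_MODDEF=mod_def.${WW3_DOMAIN}
```

Option 1 leaves WW3_DOMAIN at global_270k, so if the RT scripts derive any other file names from that variable, those would still refer to the 270k grid. Option 2 keeps the names consistent, but only if every file derived from WW3_DOMAIN exists for uglo_15km.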

Collaborator Author

OK, yes. I've been trying to replicate what is needed within the RT directory structure using symbolic links (/scratch4/NCEPDEV/nems/Daniel.Sarmiento/NEMSfv3gfs/input-data-20250507). That should make it easier to copy files to where they are needed once this is ready to merge.

But, yes, this is where I'm at right now.

If there's an easier way, I'm happy to do that, but feel free to take a look. I think I have most of the fix files in place; I just need to get the ICs linked correctly.
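Because the staging tree is built from symbolic links, and the original copy of the data was already scrubbed once from stmp, a link whose target disappears will dangle without any warning. Here is a minimal sketch for listing such links. The function name find_broken_links is made up for illustration.

```shell
# List symbolic links under a directory whose targets no longer exist.
# `find -type l` matches every link without following it; `[ -e ]` then
# follows the link, so it fails only when the target is missing.
find_broken_links() {
  find "$1" -type l | while IFS= read -r link; do
    [ -e "$link" ] || printf '%s\n' "$link"
  done
}

# Example (directory from the comment above):
# find_broken_links /scratch4/NCEPDEV/nems/Daniel.Sarmiento/NEMSfv3gfs/input-data-20250507
```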

Collaborator

We can link the mod_def input from the g-w, but there's a chance it will diverge in the future. I can definitely help with the wave part of this as needed. Just let me know, @dpsarmie.

Collaborator

Once you copy the right mod_def (I'm not sure where it is; I don't see any mod_def files in glopara/fix/wave), I think it's safest to just hard-code WW3_MODDEF as you did with MESH_WAV.

Collaborator

It's not a fix file in glopara/fix/wave; it's generated.

Collaborator

Ah, OK, thanks. So for this RT case, I guess we'll just need a copy.

Collaborator Author

@JessicaMeixner-NOAA thanks! I think linking from the g-w will work for now, and we can address potential divergence (bringing in EPIC to discuss data upkeep) when this PR is ready to merge.
All I need is a link, and I'll add it to my data directory.

Collaborator

@dpsarmie I created a new WW3 input directory that includes the uglo_15km grid. It's on Ursa here:

/scratch3/NCEPDEV/climate/Jessica.Meixner/PR_WW3/WW3_input_data_20250904

Updates include:

  • Updating createmoddefs/creategridfiles.sh for uglo_15km
  • Linking three input files from the workflow fix directory (these should perhaps be copied for staging purposes, but I have linked them for now):
    mesh.uglo_15km.nc -> /scratch3/NCEPDEV/global/role.glopara/fix/wave/20250508/mesh.uglo_15km.nc
    createmoddefs/uglo_15km.msh -> /scratch3/NCEPDEV/global/role.glopara/fix/wave/20250508/uglo_15km.msh
    createmoddefs/ww3_grid.inp.uglo_15km -> /scratch3/NCEPDEV/global/role.glopara/fix/wave/20250508/ww3_grid.inp.uglo_15km

A new file, mod_def.uglo_15km, is then created; this is the file to use in your high-resolution GFSv17 look-alike tests.
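The three links listed above could be reproduced roughly as follows, run from the top of a copy of the WW3 input directory. This is a sketch only: WAVE_FIX is shorthand introduced here, and generating mod_def.uglo_15km itself is left to the updated createmoddefs scripts.

```shell
# Shorthand for the workflow wave fix directory named in the comment above.
WAVE_FIX=/scratch3/NCEPDEV/global/role.glopara/fix/wave/20250508

mkdir -p createmoddefs
ln -sf "${WAVE_FIX}/mesh.uglo_15km.nc"      mesh.uglo_15km.nc
ln -sf "${WAVE_FIX}/uglo_15km.msh"          createmoddefs/uglo_15km.msh
ln -sf "${WAVE_FIX}/ww3_grid.inp.uglo_15km" createmoddefs/ww3_grid.inp.uglo_15km
# For staging, replacing `ln -sf` with `cp` would copy the files themselves.
```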

@dpsarmie dpsarmie changed the title from "Update control_c768 configs" to "Update control_c768 configs and add cpld_control_c1152" Sep 3, 2025
gspetro-NOAA added a commit to gspetro-NOAA/ufs-weather-model that referenced this pull request Sep 8, 2025
@dpsarmie dpsarmie marked this pull request as draft September 15, 2025 18:27
@gspetro-NOAA gspetro-NOAA moved this to Review/Schedule in PRs to Process Sep 18, 2025
@gspetro-NOAA gspetro-NOAA removed the status in PRs to Process Sep 18, 2025
@gspetro-NOAA gspetro-NOAA moved this to Evaluating in PRs to Process Sep 18, 2025
export MESH_ICE="mesh.mx025.nc"
export eps_imesh=1.0e-1
export MOM6_CHLCLIM=seawifs-clim-1997-2010.${NX_GLB}x${NY_GLB}.v20180328.nc
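For reference, here is how the MOM6_CHLCLIM name expands. The sketch assumes NX_GLB=1440 and NY_GLB=1080 for the 0.25-degree (mx025) ocean grid. Those two values are set elsewhere in the RT configuration and are assumed here, not taken from this snippet.

```shell
# Assumed global ocean grid dimensions for mx025 (set elsewhere in the RT config).
NX_GLB=1440
NY_GLB=1080
export MOM6_CHLCLIM=seawifs-clim-1997-2010.${NX_GLB}x${NY_GLB}.v20180328.nc
echo "$MOM6_CHLCLIM"
```

Under those assumptions this prints seawifs-clim-1997-2010.1440x1080.v20180328.nc.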

Collaborator

@DeniseWorthen DeniseWorthen Sep 22, 2025

@dpsarmie I looked at a run directory on Ursa here (/scratch4/NCEPDEV/nems/Daniel.Sarmiento/WORKING_c1152)

The run directory for this PR has CICE running on only 10 tasks:

ICE_petlist_bounds:             9716 9725

OCN is also running on only 20 tasks:

OCN_petlist_bounds:             9696 9715
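The counts follow from treating the bounds as inclusive, which is the convention assumed here for *_petlist_bounds entries in ufs.configure: tasks = upper - lower + 1. Under that convention, 9716..9725 is 10 tasks and 9696..9715 is 20. Here is a small sketch for checking this in a run directory. petlist_ntasks is a made-up helper name.

```shell
# Print the number of tasks for one component from a ufs.configure-style file,
# assuming "<COMP>_petlist_bounds: lower upper" with inclusive bounds.
petlist_ntasks() {
  awk -v key="$1_petlist_bounds:" '$1 == key { print $3 - $2 + 1 }' "$2"
}

# Example: petlist_ntasks ICE ufs.configure
```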

@gspetro-NOAA
Collaborator

@dpsarmie When this PR is ready, could you also take care of Issue #2909? It seems like a good opportunity to remove unneeded conf files while updating the weekly one.
I also believe this PR should resolve #1933. Should we just close #1933 as essentially a duplicate of #2748?

@DeniseWorthen
Collaborator

DeniseWorthen commented Oct 6, 2025

@dpsarmie I will be making a PR to your UWM fork with a working version of the C1152 test case. I've compared the various config files against those in the WCOSS2 G-W directory you pointed me to for the tracing work, and they're all consistent.

Can you update your UWM feature branch?

Do you know which platforms this weekly test might run on? I haven't fully scaled up the task counts (e.g., wav is on 1200 tasks) or given the write-grid component as many tasks as in the G-W. I'm assuming speed isn't a concern here.

I did all my testing while compiling in debug mode, so we do need a fix for the issue Dusan found for the MOM6 incupd (#2906).

@dpsarmie
Collaborator Author

dpsarmie commented Oct 6, 2025

Thanks, Denise, for the work on this. I don't know which platform this will run on. I was assuming Ursa, but @jkbk2004 or @gspetro-NOAA might have an answer, since EPIC will be running the weekly tests once they're ready to do so.

@DeniseWorthen
Collaborator

Looking at the other (now closed) issue, this appears to be planned for Derecho and Ursa. Are both C1152 and C768 planned for both platforms? Few, if any, UWM developers have access to Derecho, but having the C1152 test case available there might be useful for collaboration with the ESMF team.

@dpsarmie
Collaborator Author

dpsarmie commented Oct 6, 2025

That would make sense. The initial motivation for this (the C768 case) was a request from a group at NCAR so Derecho support should be a priority.

@gspetro-NOAA
Collaborator

Yes, the thought (on the EPIC side) was Ursa and Derecho. We can help test on Derecho once testing resumes there.

@DeniseWorthen
Collaborator

@JessicaMeixner-NOAA Can you remind me how the pre-generated points file works? Do I want a file in the run directory called out.pnt_wght.ww3.nc, or pnt_wght.ww3.nc?

@JessicaMeixner-NOAA
Collaborator

You want pnt_wght.ww3.nc. If that file is not present, the model creates out.pnt_wght.ww3.nc, which you can then rename to pnt_wght.ww3.nc.

For the point list we use for GFS, the saved fix file is here:
/scratch3/NCEPDEV/global/role.glopara/fix/wave/20250508/pnt_wght.uglo_15km.nc

This is the full buoy list: https://github.com/NOAA-EMC/global-workflow/blob/develop/parm/wave/wave_gfs.buoys.full
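The staging logic just described can be sketched as follows. The helper name is made up. The sketch assumes the saved fix file is copied (not linked) into the run directory, and it falls back to renaming an out.pnt_wght.ww3.nc left behind by an earlier run.

```shell
# Hypothetical helper: make sure a run directory has pnt_wght.ww3.nc.
# WW3 reads pnt_wght.ww3.nc if present; otherwise it writes out.pnt_wght.ww3.nc.
stage_pnt_wght() {
  local rundir=$1
  local fixfile=$2   # e.g. .../fix/wave/20250508/pnt_wght.uglo_15km.nc
  if [ -f "$rundir/pnt_wght.ww3.nc" ]; then
    return 0                                              # already staged
  elif [ -f "$fixfile" ]; then
    cp "$fixfile" "$rundir/pnt_wght.ww3.nc"               # saved fix file
  elif [ -f "$rundir/out.pnt_wght.ww3.nc" ]; then
    mv "$rundir/out.pnt_wght.ww3.nc" "$rundir/pnt_wght.ww3.nc"  # reuse earlier output
  else
    echo "no point-weight file found; WW3 will write out.pnt_wght.ww3.nc" >&2
    return 1
  fi
}

# Example (RUNDIR is a placeholder):
# stage_pnt_wght "$RUNDIR" /scratch3/NCEPDEV/global/role.glopara/fix/wave/20250508/pnt_wght.uglo_15km.nc
```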

@DeniseWorthen
Collaborator

DeniseWorthen commented Oct 6, 2025

OK, thanks. I had the full point list from Dan's WCOSS2 run directory; that is also my source for the mod_def and wave mesh.

The current input directory that the test will use is /scratch3/NCEPDEV/stmp/Denise.Worthen/GFSv17opn. I'm planning on having this sit outside of input-data in the RTs, like the old BM_IC-20220207 directory did.

@JessicaMeixner-NOAA
Collaborator

Thanks @DeniseWorthen. We might consider moving mod_def.uglo_15km to the WW3 input directory. I have a WW3 input directory here: /scratch3/NCEPDEV/climate/Jessica.Meixner/PR_WW3/WW3_input_data_20250904

This way the mod_def gets updated if the WW3 code gets updated.

@DeniseWorthen
Collaborator

OK, that makes sense, but I can't make the required changes in the cpld_control_run.IN until that directory is present in the standard input-data locations.

* pull points list and nml from gfsv17 input directory
* remove ww3 output directories because they're not avail in RTs
* make run directory a little more sandbox-able w.r.t. ice and cmeps startup filenames and pointers
@DeniseWorthen
Collaborator

@dpsarmie I've opened a PR to your branch. It runs a 6-hour forecast on Ursa in a reasonable time, even when compiled in debug mode. However, I haven't turned on post or adjusted the write-grid resources.

@DeniseWorthen
Collaborator

Just FYI, I also copied the GFSv17opn IC/fix directory to Gaea C6: /gpfs/f6/infra-cpu/world-shared/Denise.Worthen/GFSv17opn

@gspetro-NOAA gspetro-NOAA moved this from Evaluating to Draft in PRs to Process Oct 7, 2025
@DeniseWorthen
Collaborator

@DusanJovic-NOAA I'm stuck trying to figure out how to create a job card for the RT test that matches the G-W as closely as possible. For example, these settings are not in our WCOSS2 job card template:

export FI_OFI_RXM_RX_SIZE=40000
export FI_OFI_RXM_TX_SIZE=40000
export FI_OFI_RXM_SAR_LIMIT=3145728
export OMP_PLACES=cores
export OMP_STACKSIZE=2048M
# export MPICH_MPIIO_HINTS="*:romio_cb_write=disable"
export MPICH_MPIIO_HINTS="*:romio_cb_write=enable"

Do you think I need to create a special 'gfsv17opn' job card template for wcoss2?
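One alternative to a separate template is to keep the G-W runtime settings in one place and append them only to this test's job card. The sketch below uses a made-up helper name and makes no assumption about how the RT job-card templates are actually filled in.

```shell
# Append the G-W runtime settings quoted above to an existing job card file.
# The quoted heredoc delimiter ('EOF') keeps the lines literal, so the
# quotes and asterisk in MPICH_MPIIO_HINTS are written out unchanged.
append_gfsv17opn_env() {
  cat >> "$1" <<'EOF'
export FI_OFI_RXM_RX_SIZE=40000
export FI_OFI_RXM_TX_SIZE=40000
export FI_OFI_RXM_SAR_LIMIT=3145728
export OMP_PLACES=cores
export OMP_STACKSIZE=2048M
export MPICH_MPIIO_HINTS="*:romio_cb_write=enable"
EOF
}

# Example: append_gfsv17opn_env job_card
```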

@DeniseWorthen
Collaborator

@dpsarmie Comparing your "final" gfsv17 configuration files, I noticed that you have two instances of hour=120 in your output_fh list. Probably doesn't matter, but I assume somewhere in G-W there is an error when generating or filling the list.
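Duplicates like this can be spotted quickly. The sketch assumes output_fh is a whitespace-separated list of hours, as it appears in model_configure, and the helper name is made up.

```shell
# Print each forecast hour that appears more than once in an output_fh list.
# $1 is deliberately unquoted so the shell splits the list into one hour per line.
dup_output_fh() {
  printf '%s\n' $1 | sort -n | uniq -d
}

# Example: dup_output_fh "0 3 6 114 117 120 120"   # prints 120
```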


Labels

Input Data Changes: This PR requires changes to input data and to be sync'd across platforms.
New Baselines: New baselines will be added to project.

Projects

Status: Draft

Development

Successfully merging this pull request may close these issues.

Loosen the restriction in RTs that Mediator runs on <300 Tasks
Update C768 case for weekly RTs

6 participants