-
Following up on Cenlin's question: I am new to ufs-weather-model, but I spent some time this weekend trying to help him, looking around to start learning the configuration and workflow. The only place I was able to find node-related machine configuration was in the CICE module. Is there another place in the workflow where there is machine-specific config? Alternatively, are there any hints for where the layout is specified? I know that for the SRW app, the layout is part of the grid specification. Would we need to change something like that (create a custom grid for Derecho) in order to get closer to 128 tasks per node when running on Derecho? Just throwing out a couple of ideas here to get the discussion going. Thank you!
-
Perhaps @jkbk2004 or @natalie-perlin can weigh in here. I know that at the time Derecho was added to the supported platforms, issue #2033 was created, but I don't know its status.
-
@cenlinhe #2033 comes at this from a slightly different angle; it's about the runtime -depth THRD option on Derecho. BTW, the layout is set in input.nml for the FV3 domain decomposition, and component PE resources are specified in ufs.configure as well.
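For anyone looking for those two files, here is a minimal sketch of the relevant pieces (the values are illustrative, not the ones used by any particular test):

```
! input.nml -- FV3 domain decomposition (illustrative values)
&fv_core_nml
  layout    = 3, 8     ! per-tile decomposition: 3 x 8 over 6 tiles = 144 compute tasks
  io_layout = 1, 1
/
```

```
# ufs.configure -- per-component PE assignments (illustrative values)
ATM_petlist_bounds: 0 149
OCN_petlist_bounds: 150 169
ICE_petlist_bounds: 170 199
```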
-
@cenlinhe FYI: https://ufs-weather-model.readthedocs.io/en/latest/FAQ.html#fv3atm
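The arithmetic behind that FAQ entry, with illustrative numbers (not necessarily the cpld_control_p8 defaults):

```
# FV3atm task count = layout_x * layout_y * 6 tiles (+ write-component tasks, if quilting is on)
#   layout = 3, 8              ->  3 * 8 * 6 = 144 compute tasks
#   write_groups = 1
#   write_tasks_per_group = 6  ->  144 + 1 * 6 = 150 atmosphere tasks total
```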
-
If I understand @cenlinhe's initial question, this has nothing to do w/ FV3 layouts or how we specify FV3 atm resources; it has to do with efficient use of core-hours on Derecho. Take the same case (cpld_control_p8) on Gaea. Gaea also has 128 tasks/node available. We use the same ufs.configure (requiring a total of 200 tasks), and we also specify that ESMF is managing the threading. Checking the PET logs for PET201 and above shows that nothing is happening on those tasks (you'll see the config reading and then finalizing, and that's all). We request 2 "full" nodes (256 tasks) and use 200 of them (to allow for the ESMF-managed threading). I'm not sure, but I believe in this case we're "charged" for the use of both nodes. In the case of Derecho, if I understand correctly, we're requesting 3 nodes but only using 96 on each. The concern is that this results in core-hours being wasted.
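Putting rough numbers on that concern, using the figures quoted in this thread (and assuming node-exclusive charging on both machines):

```
# Gaea:    2 full nodes requested  ->  2 * 128 = 256 cores charged, 200 MPI ranks active
# Derecho: select=3:ncpus=96       ->  3 * 96  = 288 cores requested, 200 MPI ranks active,
#          but charging counts whole nodes: 3 * 128 = 384 core-hours per wallclock hour
```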
-
@zach1221 @FernandoAndrade-NOAA we need an experiment to reset TPN on Derecho: https://github.com/ufs-community/ufs-weather-model/blob/develop/tests/tests/cpld_control_p8#L86
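For context, a hypothetical sketch of the kind of change being suggested; the actual variable names in the test file and the place where the per-machine default lives may differ:

```
# tests/tests/cpld_control_p8 (sketch only)
export TPN=$TPN_cpl_dflt     # tasks per node, resolved from a per-machine default

# the per-machine default for Derecho would then move from something like
#   TPN_cpl_dflt=96
# to
#   TPN_cpl_dflt=128
```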
-
I am testing the default UFS global coupled run (cpld_control_p8) on the NCAR Derecho HPC. By default, in the run directory, the UFS Derecho run script sets the following:
"#PBS -l select=3:ncpus=96:mpiprocs=96:ompthreads=1"
and
"mpiexec -n 200 -ppn 67 --hostfile $PBS_NODEFILE ./fv3.exe"
Note that on Derecho, each node has 128 CPUs, and even if users do not request all 128 CPUs on a node, the core-hours charged will still count 128 CPUs for each node. In this case, it will be 3 * 128 instead of 3 * 96.
It seems that
So my questions are:
Any help and advice would be really appreciated!
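One direction that mirrors the Gaea setup described earlier in this thread, shown purely as an untested illustration (the exact mpiprocs/ppn split needed for ESMF-managed threading may differ):

```
# Hypothetical alternative: 2 full Derecho nodes (256 cores, all charged anyway),
# 200 MPI ranks, spare cores left for ESMF-managed threading
#PBS -l select=2:ncpus=128:mpiprocs=100:ompthreads=1
mpiexec -n 200 -ppn 100 --hostfile $PBS_NODEFILE ./fv3.exe
```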