Model hangs after COMPLETED MOM INITIALIZATION? #2473
-
Has anyone had the case where the model just hangs after COMPLETED MOM INITIALIZATION? I've seen this happen a few times now, on my local cluster and now on Frontera. I'm not seeing any error messages, it just hangs. The closest I have to an error message is the last entry in PET000, which is:
Does this situation and log message ring a bell for anyone?
Replies: 8 comments 33 replies
-
We had two hanging cases in HR4 (C1152), but they finished several days of forecast before hanging, and we saw lots of unreasonable-temperature warnings in the log file. So your case is different from HR4. I assume you are running C192. I recall we had a hanging issue like this when we tried HR2 on HERA (the system worked fine on wcoss2 but hung on HERA). I will dig out our HR2 doc and get back to you.
-
Fiddling with the settings some more, I finally got something a little more concrete. Any ideas as to where this error would be coming from? It seems to be sensitive to the choice of sfc_tile* files, but this field is not in those files.
-
The slow march through various instabilities continues! Updating fd_ufs.yaml got me through the previous one, only to crash with:
@DeniseWorthen @jiandewang does this one look familiar?
-
@benjamin-cash one suggestion:
-
I just checked
-
This is what I usually do to run a sample case: ./rt.sh -a nems -k -l rt.conf-test
-
@benjamin-cash Are you all set here or do you need further assistance?
-
The answer appears to be that these settings are missing from the job script:
ulimit -s unlimited
export OMP_STACKSIZE=512M
They are present in HERCULES.env and FRONTERA.env, but for some reason were never set in the job script. ulimit -s unlimited eliminated the seg fault in the ice transport code, and export OMP_STACKSIZE=512M eliminated the crash in the radiation, or at least that is what I am seeing in my test run. Somewhat concerningly, in my test run I am also seeing warnings that do not crash the model, but that is a concern for another day.
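For anyone landing here later, here is a minimal sketch of how the two settings above might sit near the top of a job script, before the model is launched. The scheduler directives and launch command are omitted because they are site-specific. The warning fallback and the two echo lines are my own additions for illustration and are not taken from HERCULES.env or FRONTERA.env.

```shell
#!/bin/bash
# Sketch only: stack-related settings for a job script.
# Scheduler directives and the model launch command are intentionally left out.

# Lift the per-process stack limit. If the system's hard limit does not
# allow it, print a warning instead of aborting (assumed behaviour, adjust as needed).
ulimit -s unlimited || echo "WARNING: could not raise the stack limit" >&2

# Stack size for threads created by the OpenMP runtime.
export OMP_STACKSIZE=512M

# Echo the effective values so they are recorded in the job log.
echo "stack limit: $(ulimit -s)"
echo "OMP_STACKSIZE: ${OMP_STACKSIZE}"
```

Printing the effective values is cheap and makes it easy to confirm from the job log that the limits actually took effect on the compute node, since a hang or seg fault otherwise gives little indication of which limit was hit.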