Description
On an internal NERSC machine (alvarez), newer module versions are available that I can test before they might be
installed on perlmutter. I'm testing with GNU v14 (and v14.2) and have encountered only one issue so far.
There was a runtime error in controlMod.F90 that I debugged for a bit. I don't see anything obviously wrong on our side.
So it's possible that GNU Fortran is getting confused with symbols. However, after several hours-long sessions with Gemini, I was unable to determine the cause or create a reproducer: every reduced case I tried builds and runs fine.
In components/elm/src/main/controlMod.F90, most of the use statements have the form use module, only:, but one
seemed odd. At the top of the use section there is a blanket use elm_varctl. When I went through and found all the variables needed to replace this with a use elm_varctl, only:, I no longer saw the error. I'm calling this a work-around because I'm not sure whether the problem is in the compiler, but I think the change is an improvement in terms of software engineering, since the compiler now checks that only the listed variables are used. With this change, I verified that the e3sm_developer tests pass with GNU 14 (and some other module versions upgraded).
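As a rough illustration of the change described above, here is a minimal, self-contained sketch. The module demo_varctl is a hypothetical stand-in for elm_varctl, and demo_control for controlMod; only the name nu_con comes from this report, and its type here is an assumption.

```fortran
! Hypothetical stand-in for elm_varctl. Only the name nu_con is taken from
! the report; its type and default value are assumptions for illustration.
module demo_varctl
  implicit none
  public
  character(len=32) :: nu_con = 'example'
  integer :: unused_setting = 0   ! hypothetical variable the consumer never uses
end module demo_varctl

! Hypothetical stand-in for controlMod.
module demo_control
  ! Before: a blanket "use demo_varctl" brought every public symbol into scope.
  ! After: only the listed names are imported, so a reference to any symbol
  ! missing from the list is rejected at compile time.
  use demo_varctl, only: nu_con
  implicit none
contains
  subroutine show_settings()
    write(*,*) 'nu_con = ', trim(nu_con)
  end subroutine show_settings
end module demo_control

program demo
  use demo_control, only: show_settings
  implicit none
  call show_settings()
end program demo
```

With the only: list in place, referencing unused_setting inside demo_control would fail to compile until it is added to the list, which is the checking benefit mentioned above.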
branch: ndk/elm/use-only-variable-cleanup
The original error:
0: ERROR: Unknown error submitted to shr_abort_abort.
0: #0 0x15507ba6f862 in ???
0: #1 0xdb979a in __shr_abort_mod_MOD_shr_abort_backtrace
0: at /global/cfs/cdirs/e3sm/ndk/repos/ndkmf-alv-update2026/share/util/shr_abort_mod.F90:104
0: #2 0xdb9969 in __shr_abort_mod_MOD_shr_abort_abort
0: at /global/cfs/cdirs/e3sm/ndk/repos/ndkmf-alv-update2026/share/util/shr_abort_mod.F90:61
0: #3 0x53ec9d in __controlmod_MOD_control_init
0: at /global/cfs/cdirs/e3sm/ndk/repos/ndkmf-alv-update2026/components/elm/src/main/controlMod.F90:401
0: #4 0x54ade2 in __elm_initializemod_MOD_initialize1
0: at /global/cfs/cdirs/e3sm/ndk/repos/ndkmf-alv-update2026/components/elm/src/main/elm_initializeMod.F90:144
0: #5 0x52f525 in __lnd_comp_mct_MOD_lnd_init_mct
0: at /global/cfs/cdirs/e3sm/ndk/repos/ndkmf-alv-update2026/components/elm/src/cpl/lnd_comp_mct.F90:300
0: #6 0x4857ed in __component_mod_MOD_component_init_cc
0: at /global/cfs/cdirs/e3sm/ndk/repos/ndkmf-alv-update2026/driver-mct/main/component_mod.F90:258
0: #7 0x474702 in __cime_comp_mod_MOD_cime_init
0: at /global/cfs/cdirs/e3sm/ndk/repos/ndkmf-alv-update2026/driver-mct/main/cime_comp_mod.F90:1528
0: #8 0x45f1df in cime_driver
0: at /global/cfs/cdirs/e3sm/ndk/repos/ndkmf-alv-update2026/driver-mct/main/cime_driver.F90:122
0: #9 0x45f1df in main
0: at /global/cfs/cdirs/e3sm/ndk/repos/ndkmf-alv-update2026/driver-mct/main/cime_driver.F90:23
    if (masterproc) then
       ! ----------------------------------------------------------------------
       ! Read namelist from standard input.
       ! ----------------------------------------------------------------------
       if ( len_trim(NLFilename) == 0 )then
          call endrun(msg=' error: nlfilename not set'//errMsg(__FILE__, __LINE__))
       end if
       unitn = getavu()
       write(iulog,*) 'Read in elm_inparm namelist from: ', trim(NLFilename)
       open( unitn, file=trim(NLFilename), status='old' )
       call shr_nl_find_group_name(unitn, 'elm_inparm', status=ierr)
       if (ierr == 0) then
          read(unitn, elm_inparm, iostat=ierr)
          if (ierr /= 0) then
             call endrun(msg='ERROR reading elm_inparm namelist'//errMsg(__FILE__, __LINE__)) !<-- line 401
          end if
       end if
The failure occurs while processing the &elm_inparm section of lnd_in. One example of a variable it has trouble with is nu_con.
When I read into a local nu_con_local and then copied the value back to nu_con, that worked, but another variable then caused a similar
issue.
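One way to get more detail than the generic abort at line 401 might be to capture the runtime's own message with the standard iomsg= specifier on the namelist read. The sketch below is a standalone program, not the ELM code: the file name demo_lnd_in, the value written to it, and the type of nu_con are all assumptions made for illustration.

```fortran
program nml_iomsg_demo
  implicit none
  character(len=32)  :: nu_con    ! type assumed; only the name comes from the report
  character(len=256) :: msg       ! receives the runtime's error text, if any
  integer :: unitn, ierr
  namelist /elm_inparm/ nu_con

  nu_con = ''
  msg = ''

  ! Write a small namelist file so the example is self-contained.
  ! 'demo_lnd_in' is a hypothetical file name, not the real lnd_in.
  open(newunit=unitn, file='demo_lnd_in', status='replace', action='write')
  write(unitn,'(a)') '&elm_inparm'
  write(unitn,'(a)') " nu_con = 'example'"
  write(unitn,'(a)') '/'
  close(unitn)

  ! Read it back, capturing both the status code and the message.
  open(newunit=unitn, file='demo_lnd_in', status='old', action='read')
  read(unitn, nml=elm_inparm, iostat=ierr, iomsg=msg)
  close(unitn, status='delete')

  if (ierr /= 0) then
     ! msg usually names the offending variable or token.
     write(*,*) 'namelist read failed, iostat = ', ierr, ' iomsg = ', trim(msg)
  else
     write(*,*) 'nu_con = ', trim(nu_con)
  end if
end program nml_iomsg_demo
```

In controlMod.F90 the same idea would mean adding iomsg= to the existing read and appending the message to the endrun text. Whether that reveals anything useful for this particular failure is not known.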