Description
Brief summary of bug
I have reproduced @adrifoster's TRENDY2025 restart failure with my ctsm5.3.062 test (the original 20-yr simulation is described in NCAR/LMWG_dev#109). I first restarted my case with
STOP_N = 2 days
REST_N = 1 day
DOUT_S_SAVE_INTERIM_RESTART_FILES = FALSE
RESUBMIT = 1
and it completed all 4 days (the initial 2-day segment plus one 2-day resubmission) correctly. Then I restarted again with
STOP_N = 2 days
REST_N = 1 day
DOUT_S_SAVE_INTERIM_RESTART_FILES = TRUE
RESUBMIT = 1
This time it completed the first 2 days but failed when it resubmitted.
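To make the difference between the two tests explicit, here is a minimal Python sketch that builds the corresponding `./xmlchange` command lines. It only formats strings and does not run anything. `STOP_OPTION=ndays` and `REST_OPTION=ndays` are my assumptions, inferred from the "days" units above, and `xmlchange_command` is a hypothetical helper name.

```python
import shlex

# Settings shared by both restart tests, as listed in this issue.
# STOP_OPTION and REST_OPTION are assumptions inferred from the "days" units.
BASE_SETTINGS = {
    "STOP_OPTION": "ndays",
    "STOP_N": 2,
    "REST_OPTION": "ndays",
    "REST_N": 1,
    "RESUBMIT": 1,
}


def xmlchange_command(save_interim):
    """Return the xmlchange command line for one of the two restart tests."""
    settings = dict(BASE_SETTINGS)
    settings["DOUT_S_SAVE_INTERIM_RESTART_FILES"] = "TRUE" if save_interim else "FALSE"
    args = ["./xmlchange"] + [f"{name}={value}" for name, value in settings.items()]
    return shlex.join(args)


if __name__ == "__main__":
    print(xmlchange_command(save_interim=False))  # test that completed all 4 days
    print(xmlchange_command(save_interim=True))   # test that failed on resubmission
```

The two command lines differ only in `DOUT_S_SAVE_INTERIM_RESTART_FILES`, which is the one setting that separates the passing test from the failing one.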
General bug information
CTSM version you are using: ctsm5.3.062 (first version splitting hX files into hXa/hXi)
Does this bug cause significantly incorrect results in the model's science? No
Configurations affected: DOUT_S_SAVE_INTERIM_RESTART_FILES = TRUE
Details of bug
Important output or errors that show the problem
My case ran here
/glade/derecho/scratch/slevis/ctsm53062_f09_1850/run
and archived here
/glade/derecho/scratch/slevis/archive/ctsm53062_f09_1850
So far I have noticed the following behavior:
- In .../archive/.../rest I type
ls 0021-01-0[5-7]*
and get (recall the model restarted successfully from 01-05, and then failed to restart from 01-07):
0021-01-05-00000:
ctsm53062_f09_1850.clm2.h0a.0020-12.nc ctsm53062_f09_1850.mosart.h0i.0020-12.nc rpointer.atm.0021-01-02-00000 rpointer.lnd.0006-01-01-00000 rpointer.lnd.0021-01-06-00000 rpointer.rof.0021-01-04-00000
ctsm53062_f09_1850.clm2.h0i.0020-12.nc ctsm53062_f09_1850.mosart.r.0021-01-05-00000.nc rpointer.atm.0021-01-03-00000 rpointer.lnd.0011-01-01-00000 rpointer.lnd.0021-01-07-00000 rpointer.rof.0021-01-05-00000
ctsm53062_f09_1850.clm2.r.0021-01-05-00000.nc ctsm53062_f09_1850.mosart.rh0a.0021-01-05-00000.nc rpointer.atm.0021-01-04-00000 rpointer.lnd.0016-01-01-00000 rpointer.rof.0006-01-01-00000 rpointer.rof.0021-01-06-00000
ctsm53062_f09_1850.clm2.rh0a.0021-01-05-00000.nc ctsm53062_f09_1850.mosart.rh0i.0021-01-05-00000.nc rpointer.atm.0021-01-05-00000 rpointer.lnd.0021-01-01-00000 rpointer.rof.0011-01-01-00000 rpointer.rof.0021-01-07-00000
ctsm53062_f09_1850.clm2.rh0i.0021-01-05-00000.nc rpointer.atm.0006-01-01-00000 rpointer.atm.0021-01-06-00000 rpointer.lnd.0021-01-02-00000 rpointer.rof.0016-01-01-00000
ctsm53062_f09_1850.cpl.r.0021-01-05-00000.nc rpointer.atm.0011-01-01-00000 rpointer.atm.0021-01-07-00000 rpointer.lnd.0021-01-03-00000 rpointer.rof.0021-01-01-00000
ctsm53062_f09_1850.datm.r.0021-01-05-00000.nc rpointer.atm.0016-01-01-00000 rpointer.cpl.0021-01-05-00000 rpointer.lnd.0021-01-04-00000 rpointer.rof.0021-01-02-00000
ctsm53062_f09_1850.mosart.h0a.0020-12.nc rpointer.atm.0021-01-01-00000 rpointer.lnd.0001-01-01-01800 rpointer.lnd.0021-01-05-00000 rpointer.rof.0021-01-03-00000
0021-01-06-00000:
ctsm53062_f09_1850.clm2.h0a.0020-12.nc ctsm53062_f09_1850.clm2.rh0i.0021-01-06-00000.nc ctsm53062_f09_1850.mosart.h0i.0020-12.nc rpointer.atm 'rpointer.lnd$NINST_STRING'
ctsm53062_f09_1850.clm2.h0i.0020-12.nc ctsm53062_f09_1850.cpl.r.0021-01-06-00000.nc ctsm53062_f09_1850.mosart.r.0021-01-06-00000.nc 'rpointer.atm$NINST_STRING' rpointer.rof
ctsm53062_f09_1850.clm2.r.0021-01-06-00000.nc ctsm53062_f09_1850.datm.r.0021-01-06-00000.nc ctsm53062_f09_1850.mosart.rh0a.0021-01-06-00000.nc rpointer.cpl.0021-01-06-00000 'rpointer.rof$NINST_STRING'
ctsm53062_f09_1850.clm2.rh0a.0021-01-06-00000.nc ctsm53062_f09_1850.mosart.h0a.0020-12.nc ctsm53062_f09_1850.mosart.rh0i.0021-01-06-00000.nc rpointer.lnd
0021-01-07-00000:
ctsm53062_f09_1850.clm2.h0a.0020-12.nc ctsm53062_f09_1850.clm2.rh0a.0021-01-07-00000.nc ctsm53062_f09_1850.datm.r.0021-01-07-00000.nc ctsm53062_f09_1850.mosart.r.0021-01-07-00000.nc rpointer.cpl.0021-01-07-00000
ctsm53062_f09_1850.clm2.h0i.0020-12.nc ctsm53062_f09_1850.clm2.rh0i.0021-01-07-00000.nc ctsm53062_f09_1850.mosart.h0a.0020-12.nc ctsm53062_f09_1850.mosart.rh0a.0021-01-07-00000.nc
ctsm53062_f09_1850.clm2.r.0021-01-07-00000.nc ctsm53062_f09_1850.cpl.r.0021-01-07-00000.nc ctsm53062_f09_1850.mosart.h0i.0020-12.nc ctsm53062_f09_1850.mosart.rh0i.0021-01-07-00000.nc
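01-07 contains only rpointer.cpl.0021-01-07-00000; the rpointer.atm, rpointer.lnd, and rpointer.rof files are missing.
01-06 also lacks dated rpointer.atm, rpointer.lnd, and rpointer.rof files. Instead it has undated copies (rpointer.atm, rpointer.lnd, rpointer.rof) plus files whose names end in the literal string $NINST_STRING, which looks like an unexpanded variable.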
- In .../run I see an empty rpointer.lnd and the following error in cesm.log:
end-of-file during read, unit 99, file /glade/derecho/scratch/slevis/ctsm53062_f09_1850/run/./rpointer.lnd
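For anyone who wants to check their own case for the same symptoms, the two observations above can be sketched as a small Python script. This is only a diagnostic sketch, not part of CTSM or CIME: the function names are hypothetical, and the component list is an assumption taken from the 0021-01-05-00000 listing, where dated rpointer files exist for atm, cpl, lnd, and rof.

```python
from pathlib import Path

# Assumption: components whose dated rpointer files should be present,
# taken from the 0021-01-05-00000 listing in this issue.
COMPONENTS = ("atm", "cpl", "lnd", "rof")


def audit_rest_dir(rest_dir):
    """Check one dated restart directory, e.g. .../rest/0021-01-07-00000.

    Returns (missing, unexpanded):
      missing    - components with no rpointer.<comp>.<date> file
      unexpanded - rpointer file names that contain a literal '$'
    """
    rest_dir = Path(rest_dir)
    stamp = rest_dir.name  # the directory name doubles as the date stamp
    names = {p.name for p in rest_dir.iterdir()}
    missing = [c for c in COMPONENTS if f"rpointer.{c}.{stamp}" not in names]
    unexpanded = sorted(n for n in names if n.startswith("rpointer.") and "$" in n)
    return missing, unexpanded


def empty_rpointers(run_dir):
    """Return names of rpointer.* files in run_dir that are empty or blank."""
    return [
        p.name
        for p in sorted(Path(run_dir).glob("rpointer.*"))
        if p.is_file() and not p.read_text().strip()
    ]
```

Calling `audit_rest_dir` on each `0021-01-0[5-7]*` directory under the archive's rest directory, and `empty_rpointers` on the run directory, should report the missing dated files, the `$NINST_STRING` names, and the empty rpointer.lnd described above.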
@adrifoster and others, please add any information that you consider helpful.