Description
I get the following error on Derecho with PIO2 when the total number of MPI tasks is slightly less than the number of iotasks times the stride, even though every iotask rank exists. My test case has 8 iotasks and a stride of 4. If the root iotask is 0, the iotasks are MPI tasks 0, 4, 8, 12, 16, 20, 24, 28. If the root iotask is 1, they are MPI tasks 1, 5, 9, 13, 17, 21, 25, 29. The former should therefore run with 29 total tasks and the latter with 30. Both run fine with 32 tasks, but both fail with the error below when run with 30 or 31 tasks. The error occurs for all format types (cdf1, cdf2, cdf5, hdf5), with both NetCDF and PnetCDF, and with all compilers (Intel, GNU, Cray). PIO1 works fine.
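To make the task-count arithmetic concrete, here is a small sketch. It assumes the iotasks are placed at `root + i * stride`, as described above, and compares two conditions. The first is the minimum communicator size that actually contains every iotask rank. The second is a hypothetical stricter check, `num_iotasks * stride <= ntasks`, that ignores the root. The trace shows `pio_err` being called from `PIOc_Init_Intracomm`, which suggests an argument check is rejecting the configuration. The stricter check is only one guess that matches the observed failures at 30 and 31 tasks; it is not taken from the PIO source. All function names here are illustrative.

```python
def iotask_ranks(root, stride, num_iotasks):
    """MPI ranks used for I/O, assuming a simple root + i * stride layout."""
    return [root + i * stride for i in range(num_iotasks)]


def min_tasks_needed(root, stride, num_iotasks):
    """Smallest number of MPI tasks that still contains every iotask rank."""
    return iotask_ranks(root, stride, num_iotasks)[-1] + 1


def passes_stricter_check(ntasks, stride, num_iotasks):
    """HYPOTHETICAL check (assumption, not PIO's code): ignores the root."""
    return num_iotasks * stride <= ntasks


if __name__ == "__main__":
    stride, num_iotasks = 4, 8
    for root in (0, 1):
        need = min_tasks_needed(root, stride, num_iotasks)
        print(f"root={root}: iotasks={iotask_ranks(root, stride, num_iotasks)}, need >= {need} tasks")
        for ntasks in range(29, 33):
            fits = ntasks >= need
            strict = passes_stricter_check(ntasks, stride, num_iotasks)
            print(f"  ntasks={ntasks}: ranks fit={fits}, stricter check passes={strict}")
```

Under this model, both layouts fit in 30 tasks, but the stricter check only passes at 32 or more tasks. That would be consistent with the behavior reported above.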
Testing was done on Derecho in February 2024 with CICE, using
module load parallelio/1.10.1
or
module load parallelio/2.6.1
The error looks like this:
Obtained 10 stack frames.
/glade/u/apps/derecho/23.06/spack/opt/spack/parallelio/2.6.1/cray-mpich/8.1.25/oneapi/2023.0.0/jxom/lib/libpioc.so(pio_err+0x80) [0x149094d7c180]
/glade/u/apps/derecho/23.06/spack/opt/spack/parallelio/2.6.1/cray-mpich/8.1.25/oneapi/2023.0.0/jxom/lib/libpioc.so(PIOc_Init_Intracomm+0xc9) [0x149094d84cf9]
/glade/u/apps/derecho/23.06/spack/opt/spack/parallelio/2.6.1/cray-mpich/8.1.25/oneapi/2023.0.0/jxom/lib/libpioc.so(PIOc_Init_Intracomm_from_F90+0x14) [0x149094d851d4]
/glade/u/apps/derecho/23.06/spack/opt/spack/parallelio/2.6.1/cray-mpich/8.1.25/oneapi/2023.0.0/jxom/lib/libpiof.so(piolib_mod_mp_init_intracom_+0xd6) [0x149094fcb746]
/var/run/palsd/3d07ccb2-c26d-421d-8a14-427ee69dfcbc/files/cice() [0x13e82ef]
MPICH ERROR [Rank 0] [job id 3d07ccb2-c26d-421d-8a14-427ee69dfcbc] [Tue Feb 20 11:53:34 2024] [dec2097] - Abort(-1) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0