Description
Example:
mpiexec --np 768 --ppn 96 --cpu-bind verbose,list:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:87:88:89:90:91:92:93:94:95:96:97:98:99:100:101:102 --gpu-bind verbose,list:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.1:0.1:0.1:0.1:0.1:0.1:0.1:0.1:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.1:1.1:1.1:1.1:1.1:1.1:1.1:1.1:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.1:2.1:2.1:2.1:2.1:2.1:2.1:2.1:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.1:3.1:3.1:3.1:3.1:3.1:3.1:3.1:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.1:4.1:4.1:4.1:4.1:4.1:4.1:4.1:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.1:5.1:5.1:5.1:5.1:5.1:5.1:5.1 /lus/flare/projects/Aurora_testing/mpi/osu_rfm/run_collective/8/gather-gather_persistent-gatherv-gatherv_persistent/stage/2025-09-13_12-50-22/aurora/compute/PrgEnv-intel/BuildMPIcollective_93bceebc/binaries/osu_gather -m 4096:4096 -i 1000 -x 100 -f -z -d sycl
x4101c2s0b0n0.hsn.cm.aurora.alcf.anl.gov: rank 256 died from signal 11
x4117c4s0b0n0.hsn.cm.aurora.alcf.anl.gov: rank 32 died from signal 15
module load mpich-config/collective-tuning/1024 ; MPIR_CVAR_CH4_PROGRESS_THROTTLE=1 ; mpiexec --np 768 --ppn 96 --cpu-bind verbose,list:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:87:88:89:90:91:92:93:94:95:96:97:98:99:100:101:102 --gpu-bind verbose,list:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.1:0.1:0.1:0.1:0.1:0.1:0.1:0.1:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.1:1.1:1.1:1.1:1.1:1.1:1.1:1.1:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.1:2.1:2.1:2.1:2.1:2.1:2.1:2.1:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.1:3.1:3.1:3.1:3.1:3.1:3.1:3.1:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.1:4.1:4.1:4.1:4.1:4.1:4.1:4.1:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.1:5.1:5.1:5.1:5.1:5.1:5.1:5.1 /lus/flare/projects/Aurora_testing/mpi/osu_rfm/run_collective/8/gather-gather_persistent-gatherv-gatherv_persistent/stage/2025-09-13_12-50-22/aurora/compute/PrgEnv-intel/BuildMPIcollective_93bceebc/binaries/osu_gather -m 4096:4096 -i 1000 -x 100 -f -z -d sycl ; unset MPIR_CVAR_CH4_PROGRESS_THROTTLE ; module unload mpich-config/collective-tuning/1024
x4101c2s0b0n0.hsn.cm.aurora.alcf.anl.gov: rank 256 died from signal 11
x4103c7s2b0n0.hsn.cm.aurora.alcf.anl.gov: rank 434 died from signal 15
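
For anyone trying to reproduce this, the long `--cpu-bind` and `--gpu-bind` lists in the commands above follow a regular pattern, so they can be generated rather than pasted. The following is only a sketch inferred from the lists as posted: it assumes the GPU entries are `gpu.tile` pairs with 8 ranks per tile, and that the CPU list is six contiguous ranges of 16 cores. The function names and default parameters are hypothetical, not part of mpiexec or any site tooling.

```python
# Sketch: rebuild the bind lists used in the report (96 ranks per node).
# The core ranges and the gpu.tile layout are read off the posted command line;
# they are assumptions about this run, not a general rule for the machine.

CORE_RANGES = ((1, 16), (18, 33), (35, 50), (53, 68), (70, 85), (87, 102))


def cpu_bind_list(ranges=CORE_RANGES):
    """Return the 'list:...' string with one core per rank, inclusive ranges."""
    cores = [c for lo, hi in ranges for c in range(lo, hi + 1)]
    return "list:" + ":".join(str(c) for c in cores)


def gpu_bind_list(gpus=6, tiles=2, ranks_per_tile=8):
    """Return the 'list:...' string with ranks_per_tile consecutive ranks per gpu.tile."""
    entries = [
        f"{g}.{t}"
        for g in range(gpus)
        for t in range(tiles)
        for _ in range(ranks_per_tile)
    ]
    return "list:" + ":".join(entries)


if __name__ == "__main__":
    print("--cpu-bind verbose," + cpu_bind_list())
    print("--gpu-bind verbose," + gpu_bind_list())
```

Both lists come out with 96 entries, matching `--ppn 96` in the failing commands.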