Hello,
I am currently trying to use rr (record and replay) in a larger project across several MPI nodes to debug some challenging problems. I have used rr in this project before on a single node, but the parallel replay is giving me some headaches.
I already opened a bug report at the rr project, but after some investigation I am not sure whether what I am seeing is actually an MPICH problem. Originally I tried to use rr with Intel MPI, but MPICH shows the same issue in my tests:
I have this simple MPI test program:
#include <mpi.h>
#include <iostream>
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    std::cout << "Hello from rank " << rank << " of " << size << std::endl;
    MPI_Finalize();
    return 0;
}
I compiled it with:
mpicxx -O0 -g -std=c++17 hello.cpp -o mpi_hello
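For reference, a plain run without rr is expected to simply print one line per rank (the ordering of the lines may of course vary):
$ mpiexec -n 2 ./mpi_hello
Hello from rank 0 of 2
Hello from rank 1 of 2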
I compiled MPICH 4.3.2 with --enable-g=all and tried to record each rank of a parallel run. Since rr has problems with shared memory (shm), I tried to disable it:
$ export UCX_TLS=tcp,self
$ export UCX_NET_DEVICES=eth0
$ export UCX_LOG_LEVEL=DEBUG
$ mpiexec -n 2 rr record -n ./mpi_hello > mpich-debug-out.txt 2>&1
Here is the MPICH output:
mpich-debug-out.txt
It confirms that TCP is being used:
created interface[...] using tcp/eth0 on worker
tcp_iface ... listening for connections (fd=22) on 192.168.178.60:33531 netif eth0
We see UCX probing for ib devices, but none are found:
ib_md.c:847 UCX DEBUG no devices are found
failed to open rdmacm ... No such device
UCX initially loads shared-memory capable memory domains sysv, posix, and cma, but then discards them:
ucp_context.c:1661 closing md sysv because it has no selected transport resources
closing md posix because it has no selected transport resources
closing md cma because it has no selected transport resources
If I try to replay one of the recorded ranks, it hangs indefinitely. Here is the full output of the replay:
$ RR_LOG=all rr replay -a mpi_hello-0
rr-mpich-replay.txt
As well as an rr dump:
$ rr dump -p -m -b mpi_hello-0
rr-mpich-dump.txt
The backtrace at the hang is:
>~"#0 MPIDU_Init_shm_init () at ../src/mpid/common/shm/mpidu_init_shm.c:203\n"
>~"#1 0x00007fa2b8ff5639 in MPIDI_world_pre_init () at ../src/mpid/ch4/src/ch4_init.c:661\n"
>~"#2 0x00007fa2b8ff995a in MPID_Comm_commit_pre_hook (comm=comm@entry=0x7fa2b97fc420 <MPIR_Comm_builtin>) at ../src/mpid/ch4/src/ch4_comm.c:150\n"
>~"#3 0x00007fa2b8f3030d in MPIR_Comm_commit_internal (comm=comm@entry=0x7fa2b97fc420 <MPIR_Comm_builtin>) at ../src/mpi/comm/commutil.c:578\n"
>~"#4 0x00007fa2b8f313b1 in MPIR_Comm_commit (comm=0x7fa2b97fc420 <MPIR_Comm_builtin>) at ../src/mpi/comm/commutil.c:793\n"
>~"#5 0x00007fa2b8f2dfaa in MPIR_init_comm_world () at ../src/mpi/comm/builtin_comms.c:33\n"
>~"#6 0x00007fa2b8f68b05 in MPII_Init_thread (argc=argc@entry=0x7ffcbf78731c, argv=argv@entry=0x7ffcbf787310, user_required=<optimized out>, provided=provided@entry=0x7ffcbf7872ac, p_session_ptr=p_session_ptr@entry=0x0) at ../src/mpi/init/mpir_init.c:281\n"
>~"#7 0x00007fa2b8f68f93 in MPIR_Init_impl (argc=argc@entry=0x7ffcbf78731c, argv=argv@entry=0x7ffcbf787310) at ../src/mpi/init/mpir_init.c:146\n"
>~"#8 0x00007fa2b8d326ed in internal_Init (argc=0x7ffcbf78731c, argv=0x7ffcbf787310) at ../src/binding/c/init/init.c:57\n"
>~"#9 PMPI_Init (argc=0x7ffcbf78731c, argv=0x7ffcbf787310) at ../src/binding/c/init/init.c:108\n"
>~"#10 0x000055f1a007f20c in main (argc=1, argv=0x7ffcbf787458) at hello.cpp:13\n"
In the debugger I could see that I am stuck in this while loop and never leave it.
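To illustrate the pattern I believe I am stuck in, here is a small, self-contained sketch. This is paraphrased from my understanding of the code, not the actual MPICH source; the segment name and NUM_LOCAL_RANKS are made up for illustration. Each local rank checks in by incrementing a counter in a shared segment and then spins until all local ranks have arrived. When I replay only one rank's trace, the other rank's increment is not part of that recording, so the loop can never terminate:

/* Hypothetical sketch, NOT the real MPICH code: an init-time shared-memory
 * barrier of the kind I believe MPIDU_Init_shm_init is waiting in.
 * "/init_shm_demo" and NUM_LOCAL_RANKS are made up for illustration. */
#include <fcntl.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define NUM_LOCAL_RANKS 2

int main(void) {
    /* newly created POSIX shm segments are zero-filled, so the counter starts at 0 */
    int fd = shm_open("/init_shm_demo", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(atomic_int));
    atomic_int *arrived = mmap(NULL, sizeof(atomic_int),
                               PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    atomic_fetch_add(arrived, 1);              /* this rank checks in */
    while (atomic_load(arrived) < NUM_LOCAL_RANKS) {
        /* spin until every local rank has checked in; when a single rank's
         * rr trace is replayed in isolation, the other rank's store never
         * appears here, so this loop spins forever */
    }

    printf("all local ranks arrived\n");
    munmap(arrived, sizeof(atomic_int));
    close(fd);
    shm_unlink("/init_shm_demo");
    return 0;
}

Running two instances of this at the same time completes; a single instance spins forever, which is essentially what the replay of mpi_hello-0 looks like to me.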
So it does indeed seem to be shm related, even though I disabled shm to the best of my knowledge via environment variables. If I read the backtrace correctly, the hang is in MPIDU_Init_shm_init(), i.e. MPICH's own init-time shared-memory setup, which the UCX transport variables apparently do not affect.
This is the same issue that I see with Intel MPI; there I also disabled shm via:
$ mpirun \
-genv I_MPI_FABRICS ofi \
-genv I_MPI_OFI_PROVIDER tcp \
-genv I_MPI_SHM off \
-genv FI_TCP_IFACE eth0 \
-genv I_MPI_DEBUG 4 \
-np 2 \
rr ./mpi_hello
Out of curiosity I tried the deprecated ch3 setup that is described e.g. here and here. I compiled MPICH with --with-device=ch3:nemesis, compiled my program as above, and started it with:
$ export MPICH_NO_LOCAL=1
$ mpiexec -n 2 rr record -n ./mpi_hello
And this time replaying works just fine!
I debugged the MPI init a little bit; for the ch3 version we initialize in:
>~"#0 MPIDI_CH3I_Comm_commit_pre_hook (comm=comm@entry=0x7f6d2c5bd040 <MPIR_Comm_builtin>) at ../src/mpid/ch3/src/ch3u_comm.c:185\n"
>~"#1 0x00007f6d2c2a23cd in MPIR_Comm_commit_internal (comm=comm@entry=0x7f6d2c5bd040 <MPIR_Comm_builtin>) at ../src/mpi/comm/commutil.c:578\n"
>~"#2 0x00007f6d2c2a52e1 in MPIR_Comm_commit (comm=0x7f6d2c5bd040 <MPIR_Comm_builtin>) at ../src/mpi/comm/commutil.c:793\n"
>~"#3 0x00007f6d2c2a031a in MPIR_init_comm_world () at ../src/mpi/comm/builtin_comms.c:33\n"
>~"#4 0x00007f6d2c2e0668 in MPII_Init_thread (argc=argc@entry=0x7ffec2bc5f9c, argv=argv@entry=0x7ffec2bc5f90, user_required=<optimized out>, provided=provided@entry=0x7ffec2bc5f2c, p_session_ptr=p_session_ptr@entry=0x0) at ../src/mpi/init/mpir_init.c:281\n"
>~"#5 0x00007f6d2c2e0d83 in MPIR_Init_impl (argc=argc@entry=0x7ffec2bc5f9c, argv=argv@entry=0x7ffec2bc5f90) at ../src/mpi/init/mpir_init.c:146\n"
>~"#6 0x00007f6d2c14eb2d in internal_Init (argc=0x7ffec2bc5f9c, argv=0x7ffec2bc5f90) at ../src/binding/c/init/init.c:57\n"
>~"#7 PMPI_Init (argc=0x7ffec2bc5f9c, argv=0x7ffec2bc5f90) at ../src/binding/c/init/init.c:108\n"
>~"#8 0x0000560dc461920c in main (argc=1, argv=0x7ffec2bc60d8) at hello.cpp:13\n"
and leave this stack without issues.
Maybe somebody here can help me? It seems that with my ch4 UCX setup shared memory is still being used during initialization, even though I disabled it via environment variables, and this is a problem for rr. Is there a way to avoid this (or did I overlook an environment variable that achieves it)?
I also know this is the MPICH project, but I ultimately have to use Intel MPI in my own project, so any advice or tips for that case would be great as well.
Thank you very much and best regards
Mathias