Description
Hi,
I ran miniFE's ref version with Intel MPI under the message checker from ITAC (Intel Trace Analyzer and Collector). The message checker reported LOCAL:MEMORY:OVERLAP and, in larger runs, LOCAL:MEMORY:ILLEGAL_MODIFICATION issues in ref/src/make_local_matrix.hpp, where the same buffers are used for sending and receiving at the same time. From what I saw, all other miniFE versions should also be affected if they execute the corresponding code.
The affected code from ref/src/make_local_matrix.hpp is in lines 257ff:
std::vector<MPI_Request> request(num_send_neighbors);
for(int i=0; i<num_send_neighbors; ++i) {
MPI_Irecv(&tmp_buffer[i], 1, mpi_dtype, MPI_ANY_SOURCE, MPI_MY_TAG,
MPI_COMM_WORLD, &request[i]);
}
// send messages
for(int i=0; i<num_recv_neighbors; ++i) {
MPI_Send(&tmp_buffer[i], 1, mpi_dtype, recv_list[i], MPI_MY_TAG,
MPI_COMM_WORLD);
}
If both loops have a trip count > 0, then some elements of tmp_buffer are used for sending and receiving at the same time.
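To make the overlap concrete, here is a minimal sketch in plain C++ (no MPI calls) that lists which tmp_buffer indices are handed to both MPI_Irecv and MPI_Send by the two loops above. The helper names shared_slots and offset_send_slot are hypothetical and not part of miniFE. The second helper only illustrates one possible way to keep the slots disjoint (placing send slots behind the receive slots, or equivalently using a separate send buffer). That is my assumption, not a fix confirmed by the maintainers.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not part of miniFE): returns every index i for which
// &tmp_buffer[i] is passed to MPI_Irecv by the first loop (i < num_send_neighbors)
// and to MPI_Send by the second loop (i < num_recv_neighbors).
std::vector<int> shared_slots(int num_send_neighbors, int num_recv_neighbors) {
  std::vector<int> slots;
  const int n = std::max(0, std::min(num_send_neighbors, num_recv_neighbors));
  for (int i = 0; i < n; ++i) {
    slots.push_back(i);  // slot i is an active receive buffer and a send buffer
  }
  return slots;
}

// Hypothetical alternative layout (an assumption, not the upstream fix):
// receive slots stay at [0, num_send_neighbors), and the i-th send uses a slot
// placed after them, so no index is shared. The buffer would then need
// num_send_neighbors + num_recv_neighbors elements.
int offset_send_slot(int i, int num_send_neighbors) {
  return num_send_neighbors + i;
}
```

With one send neighbor and one receive neighbor per rank, as in the 2-process run below, shared_slots(1, 1) contains only index 0, which is consistent with MPI_Irecv and MPI_Send reporting the same buffer address on each rank.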
The complete output and commands for reproducing:
$ git clone https://github.com/Mantevo/miniFE.git
$ cd miniFE/ref/src
$ # loaded module for intelmpi and itac
$ make
$ mpiexec -check-mpi -n 2 ./miniFE.x
...
creating/filling mesh...0.000828028s, total time: 0.000828981
generating matrix structure...0.00868297s, total time: 0.00951195
assembling FE data...0.00850797s, total time: 0.0180199
imposing Dirichlet BC...0.00221992s, total time: 0.0202398
imposing Dirichlet BC...0.00244904s, total time: 0.0226889
making matrix indices local...
[0] WARNING: LOCAL:MEMORY:OVERLAP: warning
[0] WARNING: New send buffer overlaps with currently active receive buffer at address 0x17f0730.
[0] WARNING: Control over active buffer was transferred to MPI at:
[0] WARNING: MPI_Irecv(*buf=0x17f0730, count=1, datatype=MPI_INT, source=MPI_ANY_SOURCE, tag=99, comm=MPI_COMM_WORLD, *request=0x1c04470)
[0] WARNING: _ZN6miniFE17make_local_matrixINS_9CSRMatrixIdiiEEEEvRT_ (/home/xyz/projects/miniFE/ref/src/./make_local_matrix.hpp:259)
[0] WARNING: _ZN6miniFE6driverIdiiEEiRK3BoxRS1_RNS_10ParametersER8YAML_Doc (/home/xyz/projects/miniFE/ref/src/./driver.hpp:228)
[0] WARNING: main (/home/xyz/projects/miniFE/ref/src/main.cpp:154)
[0] WARNING: __libc_start_main (/usr/lib64/libc-2.28.so)
[0] WARNING: _start (/home/xyz/projects/miniFE/ref/src/miniFE.x)
[0] WARNING: Control over new buffer is about to be transferred to MPI at:
[0] WARNING: MPI_Send(*buf=0x17f0730, count=1, datatype=MPI_INT, dest=1, tag=99, comm=MPI_COMM_WORLD)
[0] WARNING: _ZN6miniFE17make_local_matrixINS_9CSRMatrixIdiiEEEEvRT_ (/home/xyz/projects/miniFE/ref/src/./make_local_matrix.hpp:266)
[0] WARNING: _ZN6miniFE6driverIdiiEEiRK3BoxRS1_RNS_10ParametersER8YAML_Doc (/home/xyz/projects/miniFE/ref/src/./driver.hpp:228)
[0] WARNING: main (/home/xyz/projects/miniFE/ref/src/main.cpp:154)
[0] WARNING: __libc_start_main (/usr/lib64/libc-2.28.so)
[0] WARNING: _start (/home/xyz/projects/miniFE/ref/src/miniFE.x)
[1] WARNING: LOCAL:MEMORY:OVERLAP: warning
[1] WARNING: New send buffer overlaps with currently active receive buffer at address 0x11d48a0.
[1] WARNING: Control over active buffer was transferred to MPI at:
[1] WARNING: MPI_Irecv(*buf=0x11d48a0, count=1, datatype=MPI_INT, source=MPI_ANY_SOURCE, tag=99, comm=MPI_COMM_WORLD, *request=0x1219dc0)
[1] WARNING: _ZN6miniFE17make_local_matrixINS_9CSRMatrixIdiiEEEEvRT_ (/home/xyz/projects/miniFE/ref/src/./make_local_matrix.hpp:259)
[1] WARNING: _ZN6miniFE6driverIdiiEEiRK3BoxRS1_RNS_10ParametersER8YAML_Doc (/home/xyz/projects/miniFE/ref/src/./driver.hpp:228)
[1] WARNING: main (/home/xyz/projects/miniFE/ref/src/main.cpp:154)
[1] WARNING: __libc_start_main (/usr/lib64/libc-2.28.so)
[1] WARNING: _start (/home/xyz/projects/miniFE/ref/src/miniFE.x)
[1] WARNING: Control over new buffer is about to be transferred to MPI at:
[1] WARNING: MPI_Send(*buf=0x11d48a0, count=1, datatype=MPI_INT, dest=0, tag=99, comm=MPI_COMM_WORLD)
[1] WARNING: _ZN6miniFE17make_local_matrixINS_9CSRMatrixIdiiEEEEvRT_ (/home/xyz/projects/miniFE/ref/src/./make_local_matrix.hpp:266)
[1] WARNING: _ZN6miniFE6driverIdiiEEiRK3BoxRS1_RNS_10ParametersER8YAML_Doc (/home/xyz/projects/miniFE/ref/src/./driver.hpp:228)
[1] WARNING: main (/home/xyz/projects/miniFE/ref/src/main.cpp:154)
[1] WARNING: __libc_start_main (/usr/lib64/libc-2.28.so)
[1] WARNING: _start (/home/xyz/projects/miniFE/ref/src/miniFE.x)
1.09176s, total time: 1.11445
Starting CG solver ...
Initial Residual = 11.0289
Iteration = 20 Residual = 1.23424e-08
Final Resid Norm: 2.06977e-16
[0] INFO: LOCAL:MEMORY:OVERLAP: found 2 times (0 errors + 2 warnings), 0 reports were suppressed
[0] INFO: Found 2 problems (0 errors + 2 warnings), 0 reports were suppressed.
If I use more than 2 processes, e.g. 72, then some OVERLAP warnings turn into ILLEGAL_MODIFICATION errors:
[54] ERROR: LOCAL:MEMORY:ILLEGAL_MODIFICATION: error
[54] ERROR: Read-only buffer was modified while owned by MPI.
[54] ERROR: Control over buffer was transferred to MPI at:
[54] ERROR: MPI_Send(*buf=0x9693c4, count=1, datatype=MPI_INT, dest=22, tag=99, comm=MPI_COMM_WORLD)
[54] ERROR: _ZN6miniFE17make_local_matrixINS_9CSRMatrixIdiiEEEEvRT_ (/home/xyz/projects/miniFE/ref/src/./make_local_matrix.hpp:266)
[54] ERROR: _ZN6miniFE6driverIdiiEEiRK3BoxRS1_RNS_10ParametersER8YAML_Doc (/home/xyz/projects/miniFE/ref/src/./driver.hpp:228)
[54] ERROR: main (/home/xyz/projects/miniFE/ref/src/main.cpp:154)
[54] ERROR: __libc_start_main (/usr/lib64/libc-2.28.so)
[54] ERROR: _start (/home/xyz/projects/miniFE/ref/src/miniFE.x)
[54] ERROR: Modified buffer detected at:
[54] ERROR: MPI_Send(*buf=0x9693c4, count=1, datatype=MPI_INT, dest=22, tag=99, comm=MPI_COMM_WORLD)
[54] ERROR: _ZN6miniFE17make_local_matrixINS_9CSRMatrixIdiiEEEEvRT_ (/home/xyz/projects/miniFE/ref/src/./make_local_matrix.hpp:266)
[54] ERROR: _ZN6miniFE6driverIdiiEEiRK3BoxRS1_RNS_10ParametersER8YAML_Doc (/home/xyz/projects/miniFE/ref/src/./driver.hpp:228)
[54] ERROR: main (/home/xyz/projects/miniFE/ref/src/main.cpp:154)
[54] ERROR: __libc_start_main (/usr/lib64/libc-2.28.so)
[54] ERROR: _start (/home/xyz/projects/miniFE/ref/src/miniFE.x)