Commit 5e84157

ch4: shm: fix data type for recv_bytes in MPIDI_POSIX_mpi_release_gather_release
The number of received bytes in release_gather_release is incorrectly cast between int and MPI_Aint. On most architectures this is harmless, but on big-endian 64-bit architectures (s390x) the actual value is lost, because only the first 4 bytes, which are the most significant ones, are copied.

Fix the issue by writing the whole MPI_Aint into the shm_buf instead of just an int.

Signed-off-by: Nicolas Morey <[email protected]>
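To illustrate why the truncated copy only misbehaves on big-endian hosts, here is a minimal, host-independent sketch. It does not use MPICH code: `store64`, `load32`, and `truncated_copy` are hypothetical helpers that lay out a 64-bit size byte by byte in a chosen byte order, copy only the first 4 bytes (as the old `sizeof(int)` copy did), and read them back as a 32-bit value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Lay out a 64-bit value byte by byte in the requested byte order. */
static void store64(unsigned char out[8], uint64_t v, int big_endian)
{
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 56 - 8 * i : 8 * i;
        out[i] = (unsigned char) (v >> shift);
    }
}

/* Read 4 bytes back as a 32-bit value in the requested byte order. */
static uint32_t load32(const unsigned char in[4], int big_endian)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 24 - 8 * i : 8 * i;
        v |= (uint32_t) in[i] << shift;
    }
    return v;
}

/* Mimic the pre-fix behavior: an 8-byte size is written, but only its
 * first 4 bytes reach the shared buffer and are read back as an int. */
static uint32_t truncated_copy(uint64_t nbytes, int big_endian)
{
    unsigned char src[8], shm[4];
    assert(big_endian == 0 || big_endian == 1);
    store64(src, nbytes, big_endian);
    memcpy(shm, src, sizeof(shm));      /* the sizeof(int)-wide copy */
    return load32(shm, big_endian);
}
```

With a little-endian layout the first 4 bytes are the low-order ones, so any size that fits in 32 bits survives by accident. With a big-endian layout the same 4 bytes are the high-order ones, so a size such as 1024 reads back as 0.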
1 parent 6e5a2ad commit 5e84157

1 file changed: +5 -4 lines changed

src/mpid/ch4/shm/posix/release_gather/release_gather.h

Lines changed: 5 additions & 4 deletions
@@ -124,7 +124,7 @@ MPL_STATIC_INLINE_PREFIX int MPIDI_POSIX_mpi_release_gather_release(void *local_
                       datatype, root, MPIR_BCAST_TAG, comm_ptr, &status);
     MPIR_ERR_CHECK(mpi_errno);
     MPIR_Get_count_impl(&status, MPIR_BYTE_INTERNAL, &recv_bytes);
-    MPIR_Typerep_copy(bcast_data_addr, &recv_bytes, sizeof(int),
+    MPIR_Typerep_copy(bcast_data_addr, &recv_bytes, sizeof(MPI_Aint),
                       MPIR_TYPEREP_FLAG_NONE);
     /* It is necessary to copy the coll_attr as well to handle the case when non-root
      * becomes temporary root as part of compositions (or smp aware colls). These temp
@@ -149,7 +149,7 @@ MPL_STATIC_INLINE_PREFIX int MPIDI_POSIX_mpi_release_gather_release(void *local_
     /* When error checking is enabled, place the datasize in shm_buf first, followed by the
      * coll_attr, followed by the actual data with an offset of (2*cacheline_size) bytes from
      * the starting address */
-    MPIR_Typerep_copy(bcast_data_addr, &count, sizeof(int), MPIR_TYPEREP_FLAG_NONE);
+    MPIR_Typerep_copy(bcast_data_addr, &count, sizeof(MPI_Aint), MPIR_TYPEREP_FLAG_NONE);
     /* It is necessary to copy the coll_attr as well to handle the case when non-root
      * becomes root as part of compositions (or smp aware colls). These roots might
      * expect same data as other ranks but different from the actual root. So only
@@ -221,8 +221,9 @@ MPL_STATIC_INLINE_PREFIX int MPIDI_POSIX_mpi_release_gather_release(void *local_
      * datasize is copied out from shm_buffer and compared against the count a rank was
      * expecting. Also, the coll_attr is copied out. In case of mismatch mpi_errno is set.
      * Actual data starts after (2*cacheline_size) bytes */
-    int recv_bytes, recv_errflag;
-    MPIR_Typerep_copy(&recv_bytes, bcast_data_addr, sizeof(int), MPIR_TYPEREP_FLAG_NONE);
+    MPI_Aint recv_bytes;
+    int recv_errflag;
+    MPIR_Typerep_copy(&recv_bytes, bcast_data_addr, sizeof(MPI_Aint), MPIR_TYPEREP_FLAG_NONE);
     MPIR_Typerep_copy(&recv_errflag, (char *) bcast_data_addr + MPIDU_SHM_CACHE_LINE_LEN,
                       sizeof(int), MPIR_TYPEREP_FLAG_NONE);
     MPIR_ERR_CHKANDJUMP2(recv_bytes != count, mpi_errno, MPI_ERR_OTHER,
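The header layout described in the comments above (data size at offset 0, a second field one cache line in, payload after two cache lines) could be sketched roughly as follows. This is a standalone illustration, not the MPICH implementation: `aint_t` is a stand-in for `MPI_Aint` on a 64-bit platform, `CACHE_LINE_LEN` is an assumed value standing in for `MPIDU_SHM_CACHE_LINE_LEN`, and `header_write`, `header_read`, and `payload_addr` are hypothetical helpers. The point it shows is that writer and reader use the same full-width type for the size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_LEN 64       /* assumed cache line size for this sketch */
typedef int64_t aint_t;         /* stand-in for MPI_Aint on a 64-bit host */

/* Writer side: full-width size at offset 0, flag one cache line later. */
static void header_write(void *shm_buf, aint_t nbytes, int errflag)
{
    assert(shm_buf != NULL);
    memcpy(shm_buf, &nbytes, sizeof(aint_t));
    memcpy((char *) shm_buf + CACHE_LINE_LEN, &errflag, sizeof(int));
}

/* Reader side: read back with exactly the same types and widths. */
static void header_read(const void *shm_buf, aint_t *nbytes, int *errflag)
{
    assert(shm_buf != NULL);
    memcpy(nbytes, shm_buf, sizeof(aint_t));
    memcpy(errflag, (const char *) shm_buf + CACHE_LINE_LEN, sizeof(int));
}

/* The payload starts two cache lines past the start of the buffer. */
static void *payload_addr(void *shm_buf)
{
    return (char *) shm_buf + 2 * CACHE_LINE_LEN;
}
```

Because both sides copy `sizeof(aint_t)` bytes, the value round-trips regardless of host byte order, including sizes that do not fit in a 32-bit int.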
