Description
When running our atmospheric model in parallel, we observe a significant performance drop with OpenMPI compared to Intel MPI + the Intel compiler. Profiling shows that almost all of the extra time is spent inside memset.
Observations
With Intel MPI + the Intel compiler, the code runs at the expected speed.
With OpenMPI (regardless of which compiler is used to build OpenMPI and the model), the same source runs ≥ 2× slower, and the profiler attributes the loss almost entirely to memset.
The problem persists across several recent OpenMPI releases (tested 4.1.x and 5.0.x).
Steps to reproduce
Build the atmospheric model with any compiler (Intel or LLVM) against Intel MPI → run time ≈ T₀ (baseline).
Re-build the identical source against OpenMPI (any recent version) → run time ≈ 2-4× T₀.
Profiling (perf, VTune, or gprof) shows that > 80 % of the extra time is consumed by memset (a minimal reproducer sketch follows these steps).
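
In case a standalone reproducer is useful right away, below is a minimal sketch of the kind of test case we could send (hypothetical, not the model itself): it repeats a nonblocking ring-style exchange on heap-allocated buffers; the real model's communication pattern is of course more complex. Buffer size, iteration count, rank count, and the build/profile commands in the comments are placeholders.

```c
/* Hypothetical minimal reproducer sketch (not the model code).
 *
 * Build / run / profile, e.g.:
 *   mpicc -O2 repro.c -o repro          (mpicc from Intel MPI or OpenMPI)
 *   perf record -g -- mpirun -np 16 ./repro
 *   perf report --stdio | grep -i memset
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NITER 1000            /* placeholder iteration count */
#define NELEM (1 << 20)       /* placeholder message size: 1 Mi doubles (8 MB) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    double *sendbuf = malloc(NELEM * sizeof *sendbuf);
    double *recvbuf = malloc(NELEM * sizeof *recvbuf);
    /* Explicit zero-fill of the send buffer; whether this is the memset
     * the profiler flags, or one inside the MPI library, is what we want
     * to pin down. */
    memset(sendbuf, 0, NELEM * sizeof *sendbuf);

    double t0 = MPI_Wtime();
    for (int it = 0; it < NITER; ++it) {
        MPI_Request req[2];
        MPI_Irecv(recvbuf, NELEM, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(sendbuf, NELEM, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg exchange time: %g s\n", (t1 - t0) / NITER);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```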
Environment
OS: RHEL 7
Compilers tested: Intel 2021.11, LLVM 17
MPIs tested:
– Intel MPI 2021.11
– OpenMPI 4.1.6, 5.0.1 (both built from source and distro packages)
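
Not part of the report above, but for reference: a tiny program we can run to confirm at runtime which MPI library a given build actually linked against (to rule out a mixed-up module or rpath). MPI_Get_library_version is standard MPI-3, so it works with both Intel MPI and OpenMPI.

```c
/* Print which MPI library the executable is actually linked against. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len;
    MPI_Get_library_version(version, &len);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("MPI library: %s\n", version);

    MPI_Finalize();
    return 0;
}
```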
Expected behavior
The memset cost should remain small regardless of the MPI implementation, so performance with OpenMPI should match Intel MPI.
Actual behavior
memset becomes the bottleneck under OpenMPI.
Additional notes
No special memset tuning flags are used in either case.
Please let me know if you need any additional information (build options, reproducer, or profiling data).