Performance regression in atmospheric model when using OpenMPI – traced to slow memset #13340

@yuhao0102

Description

When running our atmospheric model in parallel, we observe a significant performance drop with OpenMPI compared to Intel MPI + the Intel compiler. Profiling shows that almost all of the extra time is spent inside memset.
Observations
With Intel MPI + the Intel compiler the code runs at the expected speed.
With OpenMPI (regardless of which compiler is used to build OpenMPI and the model), the same code runs ≥ 2× slower, and the profiler attributes the extra time almost entirely to memset.
The problem persists across several recent OpenMPI releases (tested 4.1.x and 5.0.x).
Steps to reproduce
Build the atmospheric model with any compiler (Intel or LLVM) against Intel MPI → run time ≈ T₀ (baseline).
Re-build the identical source against OpenMPI (any recent version) → run time ≈ 2-4× T₀.
Profiling (perf, VTune, or gprof) shows that > 80% of the extra time is spent in memset. A minimal standalone timing sketch follows below.
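
To help isolate whether the slowdown comes from the model itself or from how memset behaves once each MPI library is linked in and initialized, something like the sketch below could be built once with each stack's mpicc and launched with the corresponding mpirun, then the reported bandwidths compared. This is not code from the model; the buffer size, iteration count, and overall approach are just illustrative choices, and I can turn it into a proper reproducer if that is useful.

```c
/*
 * Minimal sketch (not the actual model code): times plain memset on a
 * large heap buffer after MPI_Init, once per rank, and reports the
 * achieved bandwidth.  Buffer size and iteration count are arbitrary.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Average seconds per memset over a fixed number of iterations. */
static double time_memset(void *buf, size_t bytes, int iters)
{
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        memset(buf, 0, bytes);
    return (MPI_Wtime() - t0) / iters;
}

int main(int argc, char **argv)
{
    const size_t bytes = 256UL * 1024 * 1024;   /* 256 MiB, arbitrary */
    const int iters = 20;                       /* arbitrary */

    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    void *buf = malloc(bytes);
    if (!buf)
        MPI_Abort(MPI_COMM_WORLD, 1);

    double t = time_memset(buf, bytes, iters);
    printf("rank %d: memset of %zu bytes took %.3f ms (%.2f GB/s)\n",
           rank, bytes, t * 1e3, (double)bytes / t / 1e9);

    free(buf);
    MPI_Finalize();
    return 0;
}
```

If the bandwidth differs significantly between the two builds, that would point at something in the OpenMPI runtime environment rather than at the model; if it does not, the regression is more likely specific to how the model's buffers are allocated or touched.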
Environment
OS: RHEL 7
Compilers tested: Intel 2021.11, LLVM 17
MPIs tested:
– Intel MPI 2021.11
– OpenMPI 4.1.6, 5.0.1 (each tested as both a source build and a distro package)
Expected behavior
memset cost should remain small regardless of the MPI implementation, so OpenMPI performance should roughly match Intel MPI.
Actual behavior
memset becomes the bottleneck under OpenMPI.
Additional notes
No special memset tuning flags are used in either case.

Please let me know if you need any additional information (build options, reproducer, or profiling data).
