CH4 performance worse than CH3 in 2-process ping-pong benchmark #7592

AhmeddHanyy · 2025-09-24T10:21:15Z


I have been testing a simple ping-pong communication benchmark with 2 processes using MPICH.

When I compare MPICH 4.2.2 with CH3 vs CH4, I see that CH3 consistently outperforms CH4 by ~1.5x.
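For readers comparing numbers across builds, here is a minimal sketch of how a timed ping-pong loop is usually reduced to one-way latency and throughput. It is not taken from the attached ping_pong.c. The function names are hypothetical, and the elapsed time is assumed to be the difference of two `MPI_Wtime()` readings taken around `iters` round trips.

```c
#include <assert.h>
#include <stddef.h>

/* One-way latency in microseconds.
 * elapsed_s: wall-clock seconds for `iters` complete round trips
 *            (assumed to come from two MPI_Wtime() readings).
 * Each round trip contains two one-way messages, hence the factor 2. */
double one_way_latency_us(double elapsed_s, long iters)
{
    assert(iters > 0);
    return elapsed_s / (2.0 * (double)iters) * 1e6;
}

/* Throughput in MB/s (1 MB = 1e6 bytes).
 * Each round trip moves `msg_bytes` twice, once in each direction. */
double throughput_mb_s(double elapsed_s, long iters, size_t msg_bytes)
{
    assert(iters > 0);
    assert(elapsed_s > 0.0);
    return 2.0 * (double)iters * (double)msg_bytes / elapsed_s / 1e6;
}

/* Slowdown of build A relative to build B, given their latencies.
 * A result of 1.5 means A is 1.5x slower than B. */
double slowdown(double latency_a_us, double latency_b_us)
{
    assert(latency_b_us > 0.0);
    return latency_a_us / latency_b_us;
}
```

As an illustration with made-up numbers (not measurements from this thread): 1,000,000 round trips completing in 2.0 s correspond to a one-way latency of 1.0 µs.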

Observations

CH3 (mpich-4.2.2-ch3):
Better latency and throughput in my ping-pong test.

CH4 (mpich-4.2.2 default build, with libfabric):
By default, it selects sockets as the provider.
With sockets → performance is worse than CH3.
With FI_PROVIDER=shm (forcing shared memory provider) → performance improves, but still worse than CH3.
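Because the results depend on which libfabric provider is active, it can help to print the requested provider next to each measurement. The sketch below only reads the `FI_PROVIDER` environment variable mentioned above. The helper names are hypothetical, and it does not query libfabric for the provider actually selected when the variable is unset.

```c
#include <stdlib.h>

/* Label for a benchmark log line.
 * fi_provider: the raw value of FI_PROVIDER, or NULL if it is not set.
 * Returns "default" when nothing was requested, otherwise the value itself. */
const char *provider_label(const char *fi_provider)
{
    if (fi_provider == NULL || fi_provider[0] == '\0')
        return "default";
    return fi_provider;
}

/* Convenience wrapper that reads the variable from the environment. */
const char *current_provider_label(void)
{
    return provider_label(getenv("FI_PROVIDER"));
}
```

A benchmark could print `current_provider_label()` once at startup, so that runs with `FI_PROVIDER=shm` and runs with the default selection are easy to tell apart in the output.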

My expectation

From the documentation, I believed that CH4 should provide equal or better performance compared to CH3, especially in the shared-memory 2-process case.

Question
Is this performance difference expected?

hzhou · 2025-09-24T14:46:58Z

Maintainer

Could you provide your data and setup details? Are you testing intranode latency?

2 replies

AhmeddHanyy Sep 24, 2025
Author

Here's my app:
ping_pong.c

Machine specs:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              12
On-line CPU(s) list: 0-11
Thread(s) per core:  1
Core(s) per socket:  12
Socket(s):           1
NUMA node(s):        1
CPU family:          25
Model:               1
Model name:          AMD EPYC 7443 24-Core Processor

MPICH Config Options for CH4
"--enable-static --disable-shared --disable-pci --disable-fortran --disable-f77 --disable-fc --disable-f90modules --disable-cxx --enable-fast=nochkmsg --enable-fast=notiming --enable-fast=ndebug --enable-fast=O3 --disable-libudev --enable-efa=no"

Yes, I'm testing intranode latency

AhmeddHanyy Oct 5, 2025
Author

I want to add that I also tried mpich-4.3.2rc2. On the x86_64 platform, CH4 performs well, but on the ARM platform CH3 gives better performance.
