Description
Environment
MPI: Open MPI (pml ucx)
UCX: v1.19.0
Benchmark: osu-micro-benchmarks/mpi/pt2pt/osu_latency
Node count: 2
OS: Linux
Transport: posix
When I run the OSU latency test with UCX_TLS=posix under Open MPI + UCX, latency stays flat up to 64 bytes but starts to grow sharply once the message size exceeds ~64 B.
According to UCX's design, this transition point corresponds to the message no longer fitting in a FIFO element (fifo_elem_size), at which point UCX switches from short active messages to the bcopy path.
However, even when message sizes remain well below seg_size, latency keeps increasing as the message size grows.
This seems unexpected: if UCX is using bcopy and the message fits within a single segment, I would expect latency to remain nearly flat, as it does in the short-message range.
I'm trying to understand whether this is expected behavior or a performance issue.
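For reference, the path selection described above can be written down as a minimal Python sketch. It assumes the boundaries that the annotations in the results mark: sizes up to 64 B go out as short active messages, sizes up to 4096 B fit in one bcopy segment, and 8192 B and above exceed one segment. The constants and the names `expected_path` and `group_sizes` are hypothetical; they are read off this benchmark run, not taken from UCX, whose real limits depend on fifo_elem_size, seg_size and the headers that UCX and Open MPI add.

```python
# Hypothetical model of the size-based path selection described in this issue.
# ASSUMPTION: both thresholds are read off the annotated osu_latency results
# of this run; they are not values queried from UCX.
SHORT_MAX_OBSERVED = 64          # largest size that stayed on the short-AM path
ONE_SEGMENT_MAX_OBSERVED = 4096  # largest size annotated as fitting one bcopy segment


def expected_path(size_bytes: int) -> str:
    """Return the path a message of this size is assumed to take."""
    if size_bytes <= 0:
        raise ValueError("message size must be positive")
    if size_bytes <= SHORT_MAX_OBSERVED:
        return "short"
    if size_bytes <= ONE_SEGMENT_MAX_OBSERVED:
        return "bcopy"
    return "beyond-one-segment"


def group_sizes(sizes: list[int]) -> dict[str, list[int]]:
    """Group message sizes by the path they are assumed to take."""
    groups: dict[str, list[int]] = {}
    for size in sizes:
        groups.setdefault(expected_path(size), []).append(size)
    return groups


if __name__ == "__main__":
    # Power-of-two sizes from 1 B to 8192 B, as swept by osu_latency.
    sizes = [1 << i for i in range(14)]
    for path, members in group_sizes(sizes).items():
        print(f"{path:20s} {members}")
```

Under the expectation stated above, latency would be roughly constant within each group; the results show that this holds for the "short" group but not for the "bcopy" group.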
Here is my command:
mpirun --allow-run-as-root -np 2 \
-x UCX_TLS=posix \
--mca pml ucx \
--mca coll_ucc_enable 1 \
--mca coll_ucc_priority 100 \
--mca pml_ucx_tls any \
--mca pml_ucx_devices any \
/opt/mpitest/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency
And here is the result:
# OSU MPI Latency Test v7.5
# Datatype: MPI_CHAR.
# Size Avg Latency(us)
1 0.94
2 0.93
4 0.92
8 0.93
16 0.92
32 1.04
64 1.05 ---- last size before the switch to bcopy
128 1.73 |
256 1.79 |
512 2.80 | Why does latency keep increasing across this range,
1024 3.47 | when short-AM latency stayed flat across its own range?
2048 4.45 |
4096 6.26 |
8192 9.45 ----- larger than one bcopy segment
16384 16.86
32768 31.13
65536 62.73
131072 126.36
262144 294.13
524288 542.43
1048576 1007.92
2097152 1927.02
4194304 3754.30
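To put numbers on the question, the Python sketch below parses the osu_latency output (rows 1 B to 8192 B copied from the results above, annotations dropped) and computes the average extra latency per extra byte over a range of sizes. The helper names `parse_osu` and `cost_per_byte_ns` are hypothetical. On these numbers, 1..64 B adds about 0.11 us (about 1.75 ns per byte), while 128..4096 B, the range annotated as fitting one bcopy segment, adds about 4.53 us (about 1.14 ns per byte).

```python
# Minimal sketch: quantify how osu_latency grows with message size.
# ASSUMPTION: input follows the "Size  Avg Latency(us)" text format shown
# above; '#' header lines are skipped and any text after the second column
# (such as hand-written annotations) is ignored.

OSU_OUTPUT = """\
# OSU MPI Latency Test v7.5
# Datatype: MPI_CHAR.
# Size Avg Latency(us)
1 0.94
2 0.93
4 0.92
8 0.93
16 0.92
32 1.04
64 1.05
128 1.73
256 1.79
512 2.80
1024 3.47
2048 4.45
4096 6.26
8192 9.45
"""


def parse_osu(text: str) -> dict[int, float]:
    """Map message size in bytes to average latency in microseconds."""
    table: dict[int, float] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and '#' header lines.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        table[int(fields[0])] = float(fields[1])
    return table


def cost_per_byte_ns(table: dict[int, float], lo: int, hi: int) -> float:
    """Average extra latency per extra byte between two sizes, in ns/B."""
    return (table[hi] - table[lo]) * 1000.0 / (hi - lo)


if __name__ == "__main__":
    lat = parse_osu(OSU_OUTPUT)
    for lo, hi, label in [(1, 64, "short"), (128, 4096, "one-segment bcopy")]:
        delta_us = lat[hi] - lat[lo]
        print(f"{label:18s} {lo:>5d}..{hi:<5d} B: +{delta_us:.2f} us, "
              f"{cost_per_byte_ns(lat, lo, hi):.3f} ns/B")
```

By this measure the per-byte cost in the bcopy range is not higher than in the short range; the bcopy range simply spans about 63 times as many bytes (3968 vs. 63). A roughly constant per-byte cost, such as a payload copy, would be enough to produce the observed growth, although this arithmetic alone does not show where the time is actually spent.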