Description
Describe the bug
When upgrading from UCX 1.18 to UCX 1.19, we see a performance regression for single-node runs of the AMD HPL-MxP benchmark on our HPC system Viper. We use UCX_PROTO_ENABLE=n. A similar regression is seen for single-node runs of the osu_mbw_mr benchmark from the OSU microbenchmark suite at large message sizes. Multi-node application runs are fine with this configuration.
The performance of the HPL-MxP benchmark is:
- UCX 1.18: 22216.5 GFlops
- UCX 1.19: 16830 GFlops
The bandwidth for the osu_mbw_mr test at a message size of 1 MB is:
- UCX 1.18: about 1 TB/s
- UCX 1.19: about 50 GB/s
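To put the two regressions on the same scale, here is a minimal Python sketch that computes the relative change from the numbers above. It assumes that "about 1 TB/s" means roughly 1000 GB/s; the helper name `relative_change` is only illustrative.

```python
def relative_change(old: float, new: float) -> float:
    """Return the fractional change from old to new (negative means a drop)."""
    return (new - old) / old


# Values copied from the report above.
hpl_118, hpl_119 = 22216.5, 16830.0  # GFlops
# Assumption: "about 1 TB/s" is taken as 1000 GB/s.
bw_118, bw_119 = 1000.0, 50.0        # GB/s

print(f"HPL-MxP:            {relative_change(hpl_118, hpl_119):+.1%}")
print(f"osu_mbw_mr at 1 MB: {relative_change(bw_118, bw_119):+.1%} "
      f"(about {bw_118 / bw_119:.0f}x lower)")
```

With these inputs the HPL-MxP drop is roughly a quarter, while the osu_mbw_mr bandwidth drops by a factor of about 20.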
With UCX_PROTO_ENABLE=y, UCX 1.19 shows the same performance as 1.18. With the new protocol selection, however, we see performance issues with multi-node applications (I'll open a separate issue in the coming days), so we don't want to use that. Is there an environment variable that restores the behavior from 1.18?
Steps to Reproduce
- HPL-MxP
  - download the binary from the AMD web page
  - start with latest OpenMPI (5.0.8) and UCX 1.18 or 1.19
  - compare value of "overall performance"
- OSU microbenchmarks mbw
  - get OSU microbenchmarks from https://mvapich.cse.ohio-state.edu/benchmarks/
  - run osu_mbw_mr
  - compare bandwidth for large message sizes
- Configure line: (same for both versions except the prefix)
  # Configured with: --prefix=/mpcdf/soft/RHEL_9/packages/znver4/UCX/gcc_14-14.1.0/1.18.0 --enable-optimizations --enable-mt --enable-cma --enable-compiler-opt=2 --disable-tuning --enable-builtin-memcpy --with-cache-line-size=64 --without-go --disable-logging --disable-debug --disable-assertions --disable-params-check --with-rc --with-dc --with-ud --with-mlx5-dv --with-verbs --with-xpmem --with-dm --with-devx --with-avx --with-bfd --with-rdmacm --without-xpmem
- environment variables:
  UCX_PROTO_ENABLE=n
  UCX_UNIFIED_MODE=y
  UCX_TLS=^ud,tcp
- hardware: AMD Epyc Genoa 9554 + NDR (see documentation)
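The OSU reproduction steps above can be sketched as a small Python helper that assembles the launch command with the three UCX variables from this report. This is only a sketch: the process count of 2 and the `./osu_mbw_mr` path are placeholders (the report does not state them), `build_command` is a hypothetical name, and it assumes Open MPI's `mpirun`, which exports variables with `-x`.

```python
import shlex

# UCX settings exactly as listed in the report.
UCX_ENV = {
    "UCX_PROTO_ENABLE": "n",
    "UCX_UNIFIED_MODE": "y",
    "UCX_TLS": "^ud,tcp",
}


def build_command(benchmark: str, nprocs: int) -> list[str]:
    """Build an Open MPI launch line that exports the UCX settings via -x."""
    cmd = ["mpirun", "-np", str(nprocs)]
    for name, value in UCX_ENV.items():
        cmd += ["-x", f"{name}={value}"]
    cmd.append(benchmark)
    return cmd


# Placeholder values: adjust the binary path and process count to your setup.
print(shlex.join(build_command("./osu_mbw_mr", 2)))
```

Running the printed command once against each UCX installation and comparing the large-message rows should show the difference described above.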
Setup and versions
- OS version (e.g. Linux distro) + CPU architecture (x86_64/aarch64/ppc64le/...)
  - cat /etc/issue or cat /etc/redhat-release + uname -a:
    Red Hat Enterprise Linux release 9.4 (Plow), Linux viper01 5.14.0-427.94.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Oct 8 06:57:38 EDT 2025 x86_64 x86_64 x86_64 GNU/Linux
- For RDMA/IB/RoCE related issues:
  - Driver version:
    - rpm -q rdma-core or rpm -q libibverbs
    - or: MLNX_OFED version, ofed_info -s: MLNX_OFED_LINUX-24.10-2.1.8.0:
  - HW information from ibstat or ibv_devinfo -vv command:
CA 'mlx5_0'
CA type: MT4123
Number of ports: 1
Firmware version: 20.43.2566
Hardware version: 0
Node GUID: 0xe8ebd303008ea986
System image GUID: 0xe8ebd303008ea986
Port 1:
State: Active
Physical state: LinkUp
Rate: 200
Base lid: 42
LMC: 0
SM lid: 1
Capability mask: 0xa651e848
Port GUID: 0xe8ebd303008ea986
Link layer: InfiniBand
CA 'mlx5_bond_0'
CA type: MT4127
Number of ports: 1
Firmware version: 26.39.3560
Hardware version: 0
Node GUID: 0x946dae03006d8074
System image GUID: 0x946dae03006d8074
Port 1:
State: Active
Physical state: LinkUp
Rate: 25
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00010000
Port GUID: 0x966daefffe6d8074
Link layer: Ethernet
Additional information (depending on the issue)
- OpenMPI version: 5.0.8
- Output of ucx_info -d to show transports and devices recognized by UCX
- 1.19:
#
# Memory domain: self
# Component: self
# register: unlimited, cost: 0 nsec
# remote key: 0 bytes
# rkey_ptr is supported
# memory types: host (access,reg_nonblock,reg,cache)
#
# Transport: self
# Device: memory
# Type: loopback
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 19360.00 MB/sec
# latency: 0 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 8K
# am_bcopy: <= 8K
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 0 bytes
# iface address: 8 bytes
# error handling: ep_check
#
#
# Memory domain: tcp
# Component: tcp
# memory types:
#
# Transport: tcp
# Device: bond0
# Type: network
# System device: bond0 (0)
#
# capabilities:
# bandwidth: 2200.00/ppn + 0.00 MB/sec
# latency: 5212 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 0
# device num paths: 1
# max eps: 256
# device address: 6 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
# Transport: tcp
# Device: eno1
# Type: network
# System device: eno1 (1)
#
# capabilities:
# bandwidth: 113.16/ppn + 0.00 MB/sec
# latency: 5776 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 1
# device num paths: 1
# max eps: 256
# device address: 6 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
# Transport: tcp
# Device: ib0
# Type: network
# System device: ib0 (2)
#
# capabilities:
# bandwidth: 2200.00/ppn + 0.00 MB/sec
# latency: 5203 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 1
# device num paths: 1
# max eps: 256
# device address: 6 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
# Transport: tcp
# Device: lo
# Type: network
# System device: <unknown>
#
# capabilities:
# bandwidth: 11.91/ppn + 0.00 MB/sec
# latency: 10960 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 1
# device num paths: 1
# max eps: 256
# device address: 18 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
#
# Connection manager: tcp
# max_conn_priv: 2064 bytes
#
# Memory domain: sysv
# Component: sysv
# allocate: unlimited
# remote key: 12 bytes
# rkey_ptr is supported
# memory types: host (access,alloc,cache)
#
# Transport: sysv
# Device: memory
# Type: intra-node
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 15360.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 100
# am_bcopy: <= 8256
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 8 bytes
# error handling: ep_check
#
#
# Memory domain: posix
# Component: posix
# allocate: <= 263712132K
# remote key: 24 bytes
# rkey_ptr is supported
# memory types: host (access,alloc,cache)
#
# Transport: posix
# Device: memory
# Type: intra-node
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 15360.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 100
# am_bcopy: <= 8256
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 8 bytes
# error handling: ep_check
#
#
# Memory domain: mlx5_0
# Component: ib
# allocate: <= 256K
# register: unlimited, dmabuf, cost: 16000 + 0.060 * N nsec
# remote key: 8 bytes
# local memory handle is required for zcopy
# memory invalidation is supported
# memory types: host (access,reg,cache), rdma (alloc,cache)
#
# Transport: rc_verbs
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (2)
#
# capabilities:
# bandwidth: 23588.47/ppn + 0.00 MB/sec
# latency: 600 + 1.000 * N nsec
# overhead: 75 nsec
# put_short: <= 124
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 5 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 5 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 123
# am_bcopy: <= 8255
# am_zcopy: <= 8255, up to 4 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 127
# domain: device
# atomic_add: 64 bit
# atomic_fadd: 64 bit
# atomic_cswap: 64 bit
# connection: to ep
# device priority: 50
# device num paths: 1
# max eps: 256
# device address: 3 bytes
# ep address: 7 bytes
# error handling: peer failure, ep_check
#
#
# Transport: ud_verbs
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (2)
#
# capabilities:
# bandwidth: 23588.47/ppn + 0.00 MB/sec
# latency: 630 nsec
# overhead: 105 nsec
# am_short: <= 116
# am_bcopy: <= 4088
# am_zcopy: <= 4088, up to 5 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 3992
# connection: to ep, to iface
# device priority: 50
# device num paths: 1
# max eps: inf
# device address: 3 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
#
#
# Transport: dc_mlx5
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (2)
#
# capabilities:
# bandwidth: 23588.47/ppn + 0.00 MB/sec
# latency: 660 nsec
# overhead: 40 nsec
# put_short: <= 2K
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 11 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 11 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 2046
# am_bcopy: <= 8254
# am_zcopy: <= 8254, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 138
# domain: device
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 50
# device num paths: 1
# max eps: inf
# device address: 3 bytes
# iface address: 7 bytes
# error handling: buffer (zcopy), remote access, peer failure
#
#
# Transport: rc_mlx5
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (2)
#
# capabilities:
# bandwidth: 23588.47/ppn + 0.00 MB/sec
# latency: 600 + 1.000 * N nsec
# overhead: 40 nsec
# put_short: <= 2K
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 14 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 14 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 2046
# am_bcopy: <= 8254
# am_zcopy: <= 8254, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 186
# domain: device
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to ep
# device priority: 50
# device num paths: 1
# max eps: 256
# device address: 3 bytes
# ep address: 10 bytes
# error handling: buffer (zcopy), remote access, peer failure, ep_check
#
#
# Transport: ud_mlx5
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (2)
#
# capabilities:
# bandwidth: 23588.47/ppn + 0.00 MB/sec
# latency: 630 nsec
# overhead: 80 nsec
# am_short: <= 180
# am_bcopy: <= 4088
# am_zcopy: <= 4088, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 132
# connection: to ep, to iface
# device priority: 50
# device num paths: 1
# max eps: inf
# device address: 3 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
#
#
# Memory domain: mlx5_bond_0
# Component: ib
# register: unlimited, dmabuf, cost: 16000 + 0.060 * N nsec
# remote key: 8 bytes
# local memory handle is required for zcopy
# memory invalidation is supported
# memory types: host (access,reg,cache)
#
# Transport: rc_verbs
# Device: mlx5_bond_0:1
# Type: network
# System device: mlx5_bond_0 (0)
#
# capabilities:
# bandwidth: 5478.92/ppn + 0.00 MB/sec
# latency: 800 + 1.000 * N nsec
# overhead: 75 nsec
# put_short: <= 124
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 5 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 1K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 5 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 1K
# am_short: <= 123
# am_bcopy: <= 8255
# am_zcopy: <= 8255, up to 4 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 1K
# am header: <= 127
# domain: device
# atomic_add: 64 bit
# atomic_fadd: 64 bit
# atomic_cswap: 64 bit
# connection: to ep
# device priority: 45
# device num paths: 2
# max eps: 256
# device address: 18 bytes
# ep address: 7 bytes
# error handling: peer failure, ep_check
#
#
# Transport: ud_verbs
# Device: mlx5_bond_0:1
# Type: network
# System device: mlx5_bond_0 (0)
#
# capabilities:
# bandwidth: 5478.92/ppn + 0.00 MB/sec
# latency: 830 nsec
# overhead: 105 nsec
# am_short: <= 116
# am_bcopy: <= 1016
# am_zcopy: <= 1016, up to 5 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 1K
# am header: <= 920
# connection: to ep, to iface
# device priority: 45
# device num paths: 2
# max eps: inf
# device address: 18 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
#
#
# Transport: dc_mlx5
# Device: mlx5_bond_0:1
# Type: network
# System device: mlx5_bond_0 (0)
#
# capabilities:
# bandwidth: 5478.92/ppn + 0.00 MB/sec
# latency: 860 nsec
# overhead: 40 nsec
# put_short: <= 2K
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 11 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 1K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 11 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 1K
# am_short: <= 2046
# am_bcopy: <= 8254
# am_zcopy: <= 8254, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 1K
# am header: <= 138
# domain: device
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 45
# device num paths: 2
# max eps: inf
# device address: 18 bytes
# iface address: 7 bytes
# error handling: buffer (zcopy), remote access, peer failure
#
#
# Transport: rc_mlx5
# Device: mlx5_bond_0:1
# Type: network
# System device: mlx5_bond_0 (0)
#
# capabilities:
# bandwidth: 5478.92/ppn + 0.00 MB/sec
# latency: 800 + 1.000 * N nsec
# overhead: 40 nsec
# put_short: <= 2K
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 14 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 1K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 14 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 1K
# am_short: <= 2046
# am_bcopy: <= 8254
# am_zcopy: <= 8254, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 1K
# am header: <= 186
# domain: device
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to ep
# device priority: 45
# device num paths: 2
# max eps: 256
# device address: 18 bytes
# ep address: 10 bytes
# error handling: buffer (zcopy), remote access, peer failure, ep_check
#
#
# Transport: ud_mlx5
# Device: mlx5_bond_0:1
# Type: network
# System device: mlx5_bond_0 (0)
#
# capabilities:
# bandwidth: 5478.92/ppn + 0.00 MB/sec
# latency: 830 nsec
# overhead: 80 nsec
# am_short: <= 180
# am_bcopy: <= 1016
# am_zcopy: <= 1016, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 1K
# am header: <= 132
# connection: to ep, to iface
# device priority: 45
# device num paths: 2
# max eps: inf
# device address: 18 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
#
#
# Connection manager: rdmacm
# max_conn_priv: 54 bytes
#
# Memory domain: cma
# Component: cma
# memory types:
#
# Transport: cma
# Device: memory
# Type: intra-node
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 11145.00 MB/sec
# latency: 80 nsec
# overhead: 2000 nsec
# put_zcopy: unlimited, up to 16 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 1
# get_zcopy: unlimited, up to 16 iov
# get_opt_zcopy_align: <= 1
# get_align_mtu: <= 1
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 4 bytes
# error handling: peer failure, ep_check
#
- 1.18:
#
# Memory domain: self
# Component: self
# register: unlimited, cost: 0 nsec
# remote key: 0 bytes
# rkey_ptr is supported
# memory types: host (access,reg_nonblock,reg,cache)
#
# Transport: self
# Device: memory
# Type: loopback
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 19360.00 MB/sec
# latency: 0 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 8K
# am_bcopy: <= 8K
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 0 bytes
# iface address: 8 bytes
# error handling: ep_check
#
#
# Memory domain: tcp
# Component: tcp
# register: unlimited, cost: 0 nsec
# remote key: 0 bytes
# memory types: host (access,reg_nonblock,reg,cache)
#
# Transport: tcp
# Device: bond0
# Type: network
# System device: bond0 (0)
#
# capabilities:
# bandwidth: 2200.00/ppn + 0.00 MB/sec
# latency: 5212 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 0
# device num paths: 1
# max eps: 256
# device address: 6 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
# Transport: tcp
# Device: eno1
# Type: network
# System device: eno1 (1)
#
# capabilities:
# bandwidth: 113.16/ppn + 0.00 MB/sec
# latency: 5776 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 1
# device num paths: 1
# max eps: 256
# device address: 6 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
# Transport: tcp
# Device: ib0
# Type: network
# System device: ib0 (2)
#
# capabilities:
# bandwidth: 2200.00/ppn + 0.00 MB/sec
# latency: 5203 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 1
# device num paths: 1
# max eps: 256
# device address: 6 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
# Transport: tcp
# Device: lo
# Type: network
# System device: <unknown>
#
# capabilities:
# bandwidth: 11.91/ppn + 0.00 MB/sec
# latency: 10960 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 1
# device num paths: 1
# max eps: 256
# device address: 18 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
#
# Connection manager: tcp
# max_conn_priv: 2064 bytes
#
# Memory domain: sysv
# Component: sysv
# allocate: unlimited
# remote key: 12 bytes
# rkey_ptr is supported
# memory types: host (access,alloc,cache)
#
# Transport: sysv
# Device: memory
# Type: intra-node
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 15360.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 100
# am_bcopy: <= 8256
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 8 bytes
# error handling: ep_check
#
#
# Memory domain: posix
# Component: posix
# allocate: <= 263712132K
# remote key: 24 bytes
# rkey_ptr is supported
# memory types: host (access,alloc,cache)
#
# Transport: posix
# Device: memory
# Type: intra-node
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 15360.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 100
# am_bcopy: <= 8256
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 8 bytes
# error handling: ep_check
#
#
# Memory domain: mlx5_0
# Component: ib
# allocate: <= 256K
# register: unlimited, dmabuf, cost: 16000 + 0.060 * N nsec
# remote key: 8 bytes
# local memory handle is required for zcopy
# memory invalidation is supported
# memory types: host (access,reg,cache), rdma (alloc,cache)
#
# Transport: rc_verbs
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (2)
#
# capabilities:
# bandwidth: 23588.47/ppn + 0.00 MB/sec
# latency: 600 + 1.000 * N nsec
# overhead: 75 nsec
# put_short: <= 124
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 5 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 5 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 123
# am_bcopy: <= 8255
# am_zcopy: <= 8255, up to 4 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 127
# domain: device
# atomic_add: 64 bit
# atomic_fadd: 64 bit
# atomic_cswap: 64 bit
# connection: to ep
# device priority: 50
# device num paths: 1
# max eps: 256
# device address: 3 bytes
# ep address: 7 bytes
# error handling: peer failure, ep_check
#
#
# Transport: ud_verbs
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (2)
#
# capabilities:
# bandwidth: 23588.47/ppn + 0.00 MB/sec
# latency: 630 nsec
# overhead: 105 nsec
# am_short: <= 116
# am_bcopy: <= 4088
# am_zcopy: <= 4088, up to 5 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 3992
# connection: to ep, to iface
# device priority: 50
# device num paths: 1
# max eps: inf
# device address: 3 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
#
#
# Transport: dc_mlx5
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (2)
#
# capabilities:
# bandwidth: 23588.47/ppn + 0.00 MB/sec
# latency: 660 nsec
# overhead: 40 nsec
# put_short: <= 2K
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 11 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 11 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 2046
# am_bcopy: <= 8254
# am_zcopy: <= 8254, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 138
# domain: device
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 50
# device num paths: 1
# max eps: inf
# device address: 3 bytes
# iface address: 7 bytes
# error handling: buffer (zcopy), remote access, peer failure
#
#
# Transport: rc_mlx5
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (2)
#
# capabilities:
# bandwidth: 23588.47/ppn + 0.00 MB/sec
# latency: 600 + 1.000 * N nsec
# overhead: 40 nsec
# put_short: <= 2K
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 14 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 14 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 2046
# am_bcopy: <= 8254
# am_zcopy: <= 8254, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 186
# domain: device
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to ep
# device priority: 50
# device num paths: 1
# max eps: 256
# device address: 3 bytes
# ep address: 10 bytes
# error handling: buffer (zcopy), remote access, peer failure, ep_check
#
#
# Transport: ud_mlx5
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (2)
#
# capabilities:
# bandwidth: 23588.47/ppn + 0.00 MB/sec
# latency: 630 nsec
# overhead: 80 nsec
# am_short: <= 180
# am_bcopy: <= 4088
# am_zcopy: <= 4088, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 132
# connection: to ep, to iface
# device priority: 50
# device num paths: 1
# max eps: inf
# device address: 3 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
#
#
# Memory domain: mlx5_bond_0
# Component: ib
# register: unlimited, dmabuf, cost: 16000 + 0.060 * N nsec
# remote key: 8 bytes
# local memory handle is required for zcopy
# memory invalidation is supported
# memory types: host (access,reg,cache)
#
# Transport: rc_verbs
# Device: mlx5_bond_0:1
# Type: network
# System device: mlx5_bond_0 (0)
#
# capabilities:
# bandwidth: 5478.92/ppn + 0.00 MB/sec
# latency: 800 + 1.000 * N nsec
# overhead: 75 nsec
# put_short: <= 124
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 5 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 1K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 5 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 1K
# am_short: <= 123
# am_bcopy: <= 8255
# am_zcopy: <= 8255, up to 4 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 1K
# am header: <= 127
# domain: device
# atomic_add: 64 bit
# atomic_fadd: 64 bit
# atomic_cswap: 64 bit
# connection: to ep
# device priority: 45
# device num paths: 2
# max eps: 256
# device address: 18 bytes
# ep address: 7 bytes
# error handling: peer failure, ep_check
#
#
# Transport: ud_verbs
# Device: mlx5_bond_0:1
# Type: network
# System device: mlx5_bond_0 (0)
#
# capabilities:
# bandwidth: 5478.92/ppn + 0.00 MB/sec
# latency: 830 nsec
# overhead: 105 nsec
# am_short: <= 116
# am_bcopy: <= 1016
# am_zcopy: <= 1016, up to 5 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 1K
# am header: <= 920
# connection: to ep, to iface
# device priority: 45
# device num paths: 2
# max eps: inf
# device address: 18 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
#
#
# Transport: dc_mlx5
# Device: mlx5_bond_0:1
# Type: network
# System device: mlx5_bond_0 (0)
#
# capabilities:
# bandwidth: 5478.92/ppn + 0.00 MB/sec
# latency: 860 nsec
# overhead: 40 nsec
# put_short: <= 2K
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 11 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 1K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 11 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 1K
# am_short: <= 2046
# am_bcopy: <= 8254
# am_zcopy: <= 8254, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 1K
# am header: <= 138
# domain: device
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 45
# device num paths: 2
# max eps: inf
# device address: 18 bytes
# iface address: 7 bytes
# error handling: buffer (zcopy), remote access, peer failure
#
#
# Transport: rc_mlx5
# Device: mlx5_bond_0:1
# Type: network
# System device: mlx5_bond_0 (0)
#
# capabilities:
# bandwidth: 5478.92/ppn + 0.00 MB/sec
# latency: 800 + 1.000 * N nsec
# overhead: 40 nsec
# put_short: <= 2K
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 14 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 1K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 14 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 1K
# am_short: <= 2046
# am_bcopy: <= 8254
# am_zcopy: <= 8254, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 1K
# am header: <= 186
# domain: device
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to ep
# device priority: 45
# device num paths: 2
# max eps: 256
# device address: 18 bytes
# ep address: 10 bytes
# error handling: buffer (zcopy), remote access, peer failure, ep_check
#
#
# Transport: ud_mlx5
# Device: mlx5_bond_0:1
# Type: network
# System device: mlx5_bond_0 (0)
#
# capabilities:
# bandwidth: 5478.92/ppn + 0.00 MB/sec
# latency: 830 nsec
# overhead: 80 nsec
# am_short: <= 180
# am_bcopy: <= 1016
# am_zcopy: <= 1016, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 1K
# am header: <= 132
# connection: to ep, to iface
# device priority: 45
# device num paths: 2
# max eps: inf
# device address: 18 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
#
#
# Memory domain: mlx5_0
# Component: gga
# allocate: <= 256K
# register: unlimited, dmabuf, cost: 16000 + 0.060 * N nsec
# remote key: 8 bytes
# local memory handle is required for zcopy
# memory invalidation is supported
# memory types: host (access,reg,cache), rdma (alloc,cache)
# < no supported devices found >
#
# Memory domain: mlx5_bond_0
# Component: gga
# register: unlimited, dmabuf, cost: 16000 + 0.060 * N nsec
# remote key: 8 bytes
# local memory handle is required for zcopy
# memory invalidation is supported
# memory types: host (access,reg,cache)
# < no supported devices found >
#
# Connection manager: rdmacm
# max_conn_priv: 54 bytes
#
# Memory domain: cma
# Component: cma
# register: unlimited, cost: 9 nsec
# memory types: host (access,reg_nonblock,reg,cache)
#
# Transport: cma
# Device: memory
# Type: intra-node
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 11145.00 MB/sec
# latency: 80 nsec
# overhead: 2000 nsec
# put_zcopy: unlimited, up to 16 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 1
# get_zcopy: unlimited, up to 16 iov
# get_opt_zcopy_align: <= 1
# get_align_mtu: <= 1
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 4 bytes
# error handling: peer failure, ep_check
#
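Because the two dumps are long, it may help to compare them mechanically. Below is a minimal Python sketch that parses the "Memory domain" blocks of `ucx_info -d` output and lists attributes that differ between two versions. The function names are hypothetical, the parser only relies on the line prefixes visible in this report, and the inline sample is the tcp memory domain excerpted from the two dumps above.

```python
def parse_memory_domains(text):
    """Map (component, memory domain) to the "key: value" lines listed under it."""
    domains = {}
    md_name = None
    attrs = None
    for line in text.splitlines():
        body = line.lstrip("#").strip()
        if body.startswith("Memory domain:"):
            md_name = body.split(":", 1)[1].strip()
            attrs = None
        elif body.startswith("Component:") and md_name is not None:
            component = body.split(":", 1)[1].strip()
            attrs = domains.setdefault((component, md_name), {})
        elif body.startswith(("Transport:", "Connection manager:")):
            # Transport and connection manager sections are not compared here.
            md_name = None
            attrs = None
        elif attrs is not None and ":" in body:
            key, value = body.split(":", 1)
            attrs[key.strip()] = value.strip()
    return domains


def diff_memory_domains(old_text, new_text):
    """Return human-readable differences between two ucx_info -d dumps."""
    old, new = parse_memory_domains(old_text), parse_memory_domains(new_text)
    changes = []
    for md in sorted(old.keys() | new.keys()):
        if md not in new:
            changes.append(f"{md}: only in old")
        elif md not in old:
            changes.append(f"{md}: only in new")
        else:
            for key in sorted(old[md].keys() | new[md].keys()):
                before, after = old[md].get(key), new[md].get(key)
                if before != after:
                    changes.append(f"{md} {key}: {before!r} -> {after!r}")
    return changes


# tcp memory domain as printed by 1.18 and 1.19 in this report.
ucx_118 = """\
# Memory domain: tcp
# Component: tcp
# register: unlimited, cost: 0 nsec
# remote key: 0 bytes
# memory types: host (access,reg_nonblock,reg,cache)
#
# Transport: tcp
"""
ucx_119 = """\
# Memory domain: tcp
# Component: tcp
# memory types:
#
# Transport: tcp
"""

for change in diff_memory_domains(ucx_118, ucx_119):
    print(change)
```

By eye, the two dumps in this report differ in the tcp and cma memory domains (1.18 lists a register line and host memory types, 1.19 lists neither) and in the gga component, which appears only in the 1.18 dump. Whether any of these differences is related to the regression is not established here.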