Description
I am using OpenUCX on a Linux server for RDMA communication, using the UCP API with the RC transport and Active Messages. The server has only two network cards, both Mellanox. They are bonded in mode 4 (802.3ad/LACP) with the layer3+4 hashing policy, and the server has a single external IP address.
In my program, the server uses a UCP listener to accept connections and create new endpoints (EPs) for data transmission. The EPs are distributed evenly across 8 single-threaded workers (one UCP worker per thread). On the client side, 8 worker threads (one UCP worker per thread) initiate connections and send data.
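For context, here is a minimal sketch of the server-side setup described above: one worker hosts the listener, and each incoming connection request is handed to one of 8 data workers round-robin. This is an illustrative sketch, not my actual code; `NUM_WORKERS`, the port number, and the round-robin dispatch are placeholders, and all error handling, AM handler registration, and cleanup are omitted.

```c
#include <ucp/api/ucp.h>
#include <netinet/in.h>
#include <string.h>

#define NUM_WORKERS 8  /* illustrative: one UCP worker per data thread */

static ucp_worker_h workers[NUM_WORKERS];
static int next_worker = 0;

/* Connection callback: create the new EP on the next data worker. */
static void conn_cb(ucp_conn_request_h conn_request, void *arg)
{
    ucp_ep_params_t ep_params;
    ucp_ep_h ep;

    memset(&ep_params, 0, sizeof(ep_params));
    ep_params.field_mask   = UCP_EP_PARAM_FIELD_CONN_REQUEST;
    ep_params.conn_request = conn_request;

    ucp_ep_create(workers[next_worker], &ep_params, &ep);
    next_worker = (next_worker + 1) % NUM_WORKERS;
}

int main(void)
{
    ucp_params_t params = {
        .field_mask = UCP_PARAM_FIELD_FEATURES,
        .features   = UCP_FEATURE_AM,
    };
    ucp_worker_params_t wparams = {
        .field_mask  = UCP_WORKER_PARAM_FIELD_THREAD_MODE,
        .thread_mode = UCS_THREAD_MODE_SINGLE, /* single-threaded workers */
    };
    ucp_context_h context;
    ucp_worker_h  listen_worker;
    ucp_listener_h listener;

    ucp_init(&params, NULL, &context);
    for (int i = 0; i < NUM_WORKERS; i++)
        ucp_worker_create(context, &wparams, &workers[i]);
    ucp_worker_create(context, &wparams, &listen_worker);

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(13337); /* placeholder port */
    addr.sin_addr.s_addr = INADDR_ANY;

    ucp_listener_params_t lparams = {
        .field_mask   = UCP_LISTENER_PARAM_FIELD_SOCK_ADDR |
                        UCP_LISTENER_PARAM_FIELD_CONN_HANDLER,
        .sockaddr     = { .addr    = (struct sockaddr *)&addr,
                          .addrlen = sizeof(addr) },
        .conn_handler = { .cb = conn_cb, .arg = NULL },
    };
    ucp_listener_create(listen_worker, &lparams, &listener);

    /* In the real program each data worker is progressed by its own thread. */
    for (;;) {
        ucp_worker_progress(listen_worker);
        for (int i = 0; i < NUM_WORKERS; i++)
            ucp_worker_progress(workers[i]);
    }
}
```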
During performance testing, I found that only one network card is fully utilized, specifically the first one. However, when I run the ucx_perftest tool (test type ucp_am_bw) under the same environment variables, ucx_perftest is able to utilize both network cards in the bond. Why is this happening? My environment variables are configured as follows:
export UCX_IB_ROCE_REACHABILITY_MODE=all
export UCX_IB_TRAFFIC_CLASS=160
export UCX_MAX_RMA_RAILS=2
export UCX_TLS=rc
export UCX_NET_DEVICES=mlx5_bond_0:1
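For reference, this is roughly how I run the ucx_perftest comparison that does saturate both slaves of the bond (SERVER_IP is a placeholder; the same exports above are set on both sides):

```shell
# Same environment on both machines (as listed above)
export UCX_IB_ROCE_REACHABILITY_MODE=all
export UCX_IB_TRAFFIC_CLASS=160
export UCX_MAX_RMA_RAILS=2
export UCX_TLS=rc
export UCX_NET_DEVICES=mlx5_bond_0:1

# Server side
ucx_perftest -t ucp_am_bw

# Client side
ucx_perftest SERVER_IP -t ucp_am_bw
```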
UCX version: v1.19.0