Only one network card is used for the bond network card #10962

@carson-yang

I am using OpenUCX on a Linux server for RDMA communication, using the UCP API with the RC transport and Active Messages. The server has exactly two network cards, both from Mellanox. They are bonded in mode 4 (802.3ad) with the layer3+4 transmit hash policy, and the server has a single external IP address.
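To illustrate why the hash policy matters here, the following is a minimal sketch of a layer3+4 style transmit hash. It uses a simplified formula in the style of older Linux bonding documentation (XOR of the two ports and the low 16 bits of the two IP addresses, reduced modulo the slave count). Current kernels mix the bits differently, and whether RDMA traffic on `mlx5_bond_0` is steered by this software policy or by the NIC itself is not established in this report. The IP addresses and source ports are hypothetical; 4791 is the UDP destination port used by RoCEv2.

```python
import ipaddress


def layer34_slave(src_ip, dst_ip, src_port, dst_port, slave_count=2):
    """Pick a bond slave index from a simplified layer3+4 style hash.

    Illustrative only: this is not the exact hash used by current kernels.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    # XOR the ports, then fold in the low 16 bits of the XORed addresses.
    flow_hash = (src_port ^ dst_port) ^ ((src ^ dst) & 0xFFFF)
    return flow_hash % slave_count


# Hypothetical flows: same address pair, eight consecutive source ports.
flows = [("10.0.0.1", "10.0.0.2", 50000 + i, 4791) for i in range(8)]
for flow in flows:
    print(flow, "-> slave", layer34_slave(*flow))
```

The only point of the sketch is that a given flow always maps to the same slave, so traffic can spread over both cards only when flows differ in the fields that are hashed.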

In my program, the server uses a listener to accept incoming connections and create new endpoints (EPs) for data transmission. The EPs are distributed evenly across 8 single-threaded workers (each thread owns one UCP worker). On the client side, there are also 8 worker threads (each with its own UCP worker) that initiate connections and send data.
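The report does not say how the EPs are distributed across the workers. As a minimal sketch, assuming a plain round-robin assignment, the dispatch step might look like the following. The function name `assign_round_robin` and the integer connection IDs are hypothetical and are not part of the UCX API.

```python
from collections import Counter
from itertools import cycle

NUM_WORKERS = 8  # one single-threaded UCP worker per thread, as described above


def assign_round_robin(connection_ids, num_workers=NUM_WORKERS):
    """Map each incoming connection to a worker index in round-robin order."""
    workers = cycle(range(num_workers))
    return {conn: next(workers) for conn in connection_ids}


# Hypothetical example: 16 incoming connections spread over 8 workers.
assignment = assign_round_robin(range(16))
per_worker = Counter(assignment.values())
print(per_worker)
```

An even split of EPs across workers does not by itself imply an even split across the two bonded cards, since the card used by each connection is decided separately from the worker assignment.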

During performance testing, I found that only one network card carries traffic, namely the first one, which is fully utilized. However, when I run the ucx_perftest tool (with the ucp_am_bw test type) with the same environment variables, ucx_perftest is able to use both network cards in the bond. Why is this happening? My environment variables are configured as follows:

export UCX_IB_ROCE_REACHABILITY_MODE=all
export UCX_IB_TRAFFIC_CLASS=160
export UCX_MAX_RMA_RAILS=2
export UCX_TLS=rc
export UCX_NET_DEVICES=mlx5_bond_0:1

I am using UCX v1.19.0.
