Skip to content

Fix query gid info#11165

Open
dschervov wants to merge 4 commits intoopenucx:masterfrom
dschervov:fix-query-gid-info
Open

Fix query gid info#11165
dschervov wants to merge 4 commits intoopenucx:masterfrom
dschervov:fix-query-gid-info

Conversation

@dschervov
Copy link

@dschervov dschervov commented Feb 2, 2026

What?

Describe what this PR is doing.

I look at the implementation of function "uct_ib_device_query_gid_info" and see
something strange, almost like someone Copy&Pasted and forget to change
(nothing wrong with that, everyone can make mistake):

ret = ucs_read_file(buf, sizeof(buf) - 1, 1,
                    UCT_IB_DEVICE_SYSFS_GID_TYPE_FMT,
                    dev_name, port_num, gid_index);
if (ret > 0) {
    if (!strncmp(buf, "IB/RoCE v1", 10)) {
        info->roce_info.ver = UCT_IB_DEVICE_ROCE_V1;
    } else if (!strncmp(buf, "RoCE v2", 7)) {
        info->roce_info.ver = UCT_IB_DEVICE_ROCE_V2;
    } else {
        ucs_error("failed to parse gid type '%s' (dev=%s port=%d index=%d)",
                  buf, dev_name, port_num, gid_index);
        return UCS_ERR_INVALID_PARAM;
    }
} else {
    /* WHY NOT RETURN THE ERROR? WE DIDNT READ THE FILE! */
    info->roce_info.ver = UCT_IB_DEVICE_ROCE_V1;
}

So i try to fix the incorrect behaviour of function and find out that the other
functions are too comfortable with input incorrect parameters (gid index) into
"uct_ib_device_query_gid_info", and i fix that too.

Why?

Justification for the PR. If there is an existing issue/bug, please reference it. For
bug fixes, the 'Why?' and 'What?' can be merged into a single item.

I have a host with a RoCE setup and dom0/container network namespaces. When
i try to debug some problem, i look at the UCX debug logs and start to see some
strange behaviour...

The gid elemets from my netns is correctly hashed:

ud_iface.c:380  UCX  DEBUG iface 0x56416e89b430: adding gid fe80::b7ce:f610:a4b2:dfd4 to hash on device mlx5_0 port 1 index 8)
ud_iface.c:380  UCX  DEBUG iface 0x56416e89b430: adding gid fe80::b7ce:f610:a4b2:dfd4 to hash on device mlx5_0 port 1 index 9)
ud_iface.c:380  UCX  DEBUG iface 0x56416e89b430: adding gid fc01:0:3:0:b7ce:f610:a4b2:dfd4 to hash on device mlx5_0 port 1 index 10)
ud_iface.c:380  UCX  DEBUG iface 0x56416e89b430: adding gid fc01:0:3:0:b7ce:f610:a4b2:dfd4 to hash on device mlx5_0 port 1 index 11)

The gid elemets that are not existed is correctly discarded:

sys.c:440  UCX  DEBUG failed to read from /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/12: Invalid argument
sys.c:440  UCX  DEBUG failed to read from /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/13: Invalid argument

BUT FOR SOME REASON gids from all other net namespaces are hashed too! Even thou they are not accessible for me (MELLANOX driver have verify function for it):

sys.c:440  UCX  DEBUG failed to read from /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/0: Invalid argument
ud_iface.c:380  UCX  DEBUG iface 0x56416e89b430: adding gid fe80::bace:f6ff:feb2:ffd4 to hash on device mlx5_0 port 1 index 0)
sys.c:440  UCX  DEBUG failed to read from /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/1: Invalid argument
ud_iface.c:380  UCX  DEBUG iface 0x56416e89b430: adding gid fe80::bace:f6ff:feb2:ffd4 to hash on device mlx5_0 port 1 index 1)
sys.c:440  UCX  DEBUG failed to read from /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/2: Invalid argument
ud_iface.c:380  UCX  DEBUG iface 0x56416e89b430: adding gid fc01:0:3:0:bace:f6ff:feb2:ffd4 to hash on device mlx5_0 port 1 index 2)
sys.c:440  UCX  DEBUG failed to read from /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/3: Invalid argument
ud_iface.c:380  UCX  DEBUG iface 0x56416e89b430: adding gid fc01:0:3:0:bace:f6ff:feb2:ffd4 to hash on device mlx5_0 port 1 index 3)

How?

It is optional, but for complex PRs, please provide information about the design,
architecture, approach, etc.

I think this error are starting from this commit: ad2c408

The old logic was (simplified), if we can "query_gid" ONLY THAN we will configure the roce version:

ret = ibv_query_gid(dev->ibv_context, port_num, gid_index, &info->gid);
if (ret == 0) {
    info->roce_version.major = 1;
    info->roce_version.minor = 0;
    return UCS_OK;
}

return UCS_ERR_INVALID_PARAM;

The MELLANOX driver will not us query gid table element from different namespace.

New logic change it (simplified):

ret = ibv_query_gid(dev->ibv_context, port_num, gid_index, &info->gid);
if (ret == 0) {
    ret = ucs_read_file_str(/* sysfs file: gid_attrs/types */);
    if (ret > 0) {
        if (/* RoCEV1 */) {
            info->roce_version.major = 1;
            info->roce_version.minor = 0;
        } else if (/* RoCEv2 */) {
            info->roce_version.major = 2;
            info->roce_version.minor = 0;
        } else {
            return UCS_ERR_INVALID_PARAM;
        }
    } else {
        /* WHY NOT RETURN THE ERROR? WE DIDNT READ THE FILE! */
        info->roce_version.major = 1; 
        info->roce_version.minor = 0; 
    }
    return UCS_OK;
}

And after that its only become worse, the initial "query_gid" call was
discarded and etc.

Dmitrii Chervov added 4 commits February 2, 2026 23:09
If we have an error when we try to read from
sysfs file, this means we shouldnt return
"UCS_OK", because we didnt get query any
useful information.

This is why if read was unsuccesful we
return "UCS_ERR_INVALID_PARAM".
If we go through all gid table entries to find
accessible gid indexes, we may query gid by
index where is no entry exist, or this index
is in the different netnamespace (RoCE), so
our query call will return "UCX_ERR" status.
We shouldnt exit from function because of it,
this only means that this index is not good
for us, so we should look for the next one.

This is why we continue function execution
even if we cant query gid table entry by
some gid index (we try another).
If we go through all gid table entries to find
accessible gid indexes, we may query gid by
index where is no entry exist, or this index
is in the different netnamespace (RoCE), so
our query call will return "UCX_ERR" status.
We shouldnt exit from function because of it,
this only means that this index is not good
for us, so we should look for the next one.

This is why we continue function execution
even if we cant query gid table entry by
some gid index (we try another).
If user doesnt configure UCX_IB_GID_INDEX
parameter, than function "device_port_check"
will use default value and that gid index
can be from different netnamespace (RoCE).
Which will trigger the error in query gid info
function call. But there is nothing wrong with
the port, it is we, who choose wrong gid index.

This is why, we try to find accessible gid
index by iterating gid table elements, and only
we nothing from it is good for us, only than
we will report the error on port.
@dschervov
Copy link
Author

dschervov commented Feb 3, 2026

I think i understand why my PR cant go through the "EFA_Tests EFA on rhel90_ib"... Its because there is no IB on it, and it used the "dull" version of verbs, like:

int ibv_query_gid(struct ibv_context *context, uint8_t port_num, int index,
                  union ibv_gid *gid)
{
    struct fake_device *fdevice = (struct fake_device*)context->device;
    /* Construct an arbitrary GID based on default GID prefix */
    uint8_t addr[16] = {0xfe,        0x80,          0x00,     0x00, 0x00, 0x00,
                        0x00,        0x00,          0x04,     0x07, 0x64, 0xff,
                        fdevice->id, be_mode_get(), port_num, index};

    (void)context;
    memcpy(gid->raw, addr, sizeof(gid->raw));
    return 0;
}

So of course the "query gid" will fail... And in this scenario its okay, i think this is why the return in this case is "IB/RoCEv1"...

I dont know if my PR be helpfull if this "bug" is like a "feature" now. If the UCX team want to solve it and make it right, i will help with my best efforts!

But if not, maybe it is what it is! Dont want to waste anyone time on something unimportant <3

@dschervov
Copy link
Author

dschervov commented Feb 3, 2026

Maybe we can change "uct_ib_device_query_gid_info()" function from sysfs read, to an ibverbs "query_gid_ex()"? This means that we can implement in a contrib/ibmock/verbs.c?

Transform this (old):

ret = ibv_query_gid(ctx, port_num, gid_index, &info->gid); 
if (ret == 0) {
    ret = ucs_read_file(buf, sizeof(buf) - 1, 1,
                        UCT_IB_DEVICE_SYSFS_GID_TYPE_FMT,
                        dev_name, port_num, gid_index);
    if (ret > 0) {
        if (!strncmp(buf, "IB/RoCE v1", 10)) {
            info->roce_info.ver = UCT_IB_DEVICE_ROCE_V1;
        } else if (!strncmp(buf, "RoCE v2", 7)) {
            info->roce_info.ver = UCT_IB_DEVICE_ROCE_V2;
        } else {
            ucs_error("failed to parse gid type '%s' (dev=%s port=%d index=%d)",
                      buf, dev_name, port_num, gid_index);
            return UCS_ERR_INVALID_PARAM;
        }
    } else {
        return UCS_ERR_INVALID_PARAM;
    }
}

To that (new):

/* This is a wrapper around "ibv_query_gid_ex" */
ret = ib_device_uct_query_gid_ex(ctx, port_num, gid_index, &info);
if (ret != 0)
    return UCS_ERR_INVALID_PARAM;
}

This will make sure that "no ib" test will complete correct, and the real IB tests will run accordingly.

UPD: I check "ibv_query_gid_ex()" function in "rdma_core" and its relatively new (2020). This means that if we use it, people on older systems cant use new UCX : - (

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant