issue: 4219010 XLIO support for kernel 6.10 #278

Open
wants to merge 1 commit into
base: vNext
from

tomerdbz
Collaborator

Description

Netlink changes in kernel 6.10 broke XLIO functionality. This PR transitions to libnl, an abstraction that wraps netlink.

This both solves the issue and makes us more robust.

What

Build the routing and rule tables using libnl instead of raw netlink parsing.

Why?

Solves https://redmine.mellanox.com/issues/4219010

How?

Tested on kernel 5.21 and kernel 6.10

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

@tomerdbz tomerdbz requested review from svc-codecritic and removed request for iftahl January 5, 2025 17:45
@tomerdbz
Collaborator Author

tomerdbz commented Feb 3, 2025

bot:retest

@tomerdbz tomerdbz requested a review from pasis February 3, 2025 08:55
@pasis
Member

pasis commented Feb 10, 2025

[question] Will this implementation handle multipath routes? AFAIR, the previous manual parsing didn't support the RTA_MULTIPATH attribute, and routing was broken in that case. Will the libnl implementation use the first nexthop regardless of the type?

@galnoam
Collaborator

galnoam commented Feb 27, 2025

bot:retest

@tomerdbz
Collaborator Author

bot:retest

Member

@pasis pasis left a comment

review is not finished yet

}

delete[] buf;
__log_dbg("Done");
parse_tbl(cache_state);
}
Member

the cache needs to be destroyed at the end.

Collaborator Author

fixed
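
A minimal sketch of one way to make this cleanup automatic, assuming an RAII wrapper is acceptable in this code base. `fake_cache`, `fake_cache_alloc()` and `fake_cache_free()` are hypothetical stand-ins for libnl's `nl_cache` and `nl_cache_free()` so the example builds without libnl. It is not the change that was actually pushed in this PR.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins for libnl's nl_cache and nl_cache_free(),
// used so that the sketch compiles without libnl installed.
struct fake_cache {
    int entries;
};

int g_live_caches = 0; // counts caches that were allocated but not yet freed

fake_cache *fake_cache_alloc()
{
    ++g_live_caches;
    return new fake_cache {3};
}

void fake_cache_free(fake_cache *cache)
{
    --g_live_caches;
    delete cache;
}

// Deleter for std::unique_ptr; with libnl it would call nl_cache_free().
struct cache_deleter {
    void operator()(fake_cache *cache) const { fake_cache_free(cache); }
};
using cache_ptr = std::unique_ptr<fake_cache, cache_deleter>;

// Stand-in for parse_tbl(); it may throw, as the real parser could.
int parse_tbl_sketch(const fake_cache &cache, bool fail)
{
    if (fail) {
        throw std::runtime_error("parse failed");
    }
    return cache.entries;
}

int update_tbl_sketch(bool fail)
{
    cache_ptr cache(fake_cache_alloc());
    assert(cache != nullptr);
    // The cache is released when `cache` goes out of scope,
    // on normal return and when parse_tbl_sketch() throws.
    return parse_tbl_sketch(*cache, fail);
}
```

With this shape the cache is destroyed on every exit path, including the one where parsing throws, without an explicit free call at the end of the function.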

#include <unistd.h> // getpid()

#include <netlink/route/route.h>
#include <netlink/route/rule.h>
#include <netlink/route/link.h>
Member

Looks like link.h is unused

Collaborator Author

removed it

@@ -152,113 +152,117 @@ void route_table_mgr::update_tbl(nl_data_t data_type)
netlink_socket_mgr::update_tbl(data_type);
}

-void route_table_mgr::parse_entry(struct nlmsghdr *nl_header)
+void route_table_mgr::parse_entry(struct nl_object *nl_obj)
Member

add netlink/route/route.h and netlink/netlink.h includes.
Currently route.h is included implicitly from netlink_event.h, which may be rewritten in the future to generalize the netlink-related code. Besides, we can forward-declare a few netlink structures and avoid netlink in the XLIO headers.

Collaborator Author

done :)

// p_val : custom object that contain parsed rule data.
// return true if its not related to local or default table, false otherwise.
-void rule_table_mgr::parse_entry(struct nlmsghdr *nl_header)
+void rule_table_mgr::parse_entry(struct nl_object *nl_obj)
Member

add netlink/route/rule.h and netlink/netlink.h includes. See rationale at route_table_mgr

Collaborator Author

done :)

@@ -184,6 +184,8 @@ TEST_F(tcp_send, null_iov_elements)
vec[1].iov_len = 1000U;
rcs = sendmsg(fd, &msg, 0);
EXPECT_LE_ERRNO(rcs, 0);
+EXPECT_EQ(0, rcs);
+EXPECT_EQ(14, errno); // TODO - remove after check
Member

should it be removed as TODO says?

Collaborator Author

yup - removed it

break;
// FRA_PRIORITY: Rule Priority
uint32_t priority = rtnl_rule_get_prio(rule);
if (priority) {
Member

is 0 a valid priority?

Collaborator Author

yes - fixed it
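
To illustrate the concern: a truthiness check such as `if (priority)` cannot tell "priority is 0" apart from "nothing to store", so a rule with priority 0 is silently skipped. The sketch below contrasts the two patterns. `rule_val_sketch`, `store_if_nonzero()` and `store_always()` are hypothetical names for illustration, and the actual fix is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for XLIO's rule_val.
struct rule_val_sketch {
    bool has_priority = false;
    std::uint32_t priority = 0;

    void set_priority(std::uint32_t p)
    {
        priority = p;
        has_priority = true;
    }
};

// Pattern from the reviewed snippet: a rule with priority 0 is never recorded.
void store_if_nonzero(rule_val_sketch &val, std::uint32_t prio)
{
    if (prio) {
        val.set_priority(prio);
    }
}

// Alternative: record the value unconditionally, so 0 is kept as a real priority.
void store_always(rule_val_sketch &val, std::uint32_t prio)
{
    val.set_priority(prio);
    assert(val.has_priority);
}
```

Calling `store_if_nonzero(val, 0)` leaves `has_priority` false, while `store_always(val, 0)` records a priority of 0.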

if (table_id) {
val.set_table_id(table_id);
}

#if DEFINED_FRA_OIFNAME
Member

Remove this ifdef and remove its definition in configure.ac. All libnl-3 versions provide rtnl_rule_get_oif().

Collaborator Author

done :)

@tomerdbz tomerdbz force-pushed the routing_to_libnl branch 3 times, most recently from 76995b1 to b4415b6 Compare April 1, 2025 06:27
@tomerdbz
Collaborator Author

tomerdbz commented Apr 2, 2025

bot:retest

@tomerdbz
Collaborator Author

tomerdbz commented Apr 2, 2025

bot:retest

}

// Gateway Address (Next Hop)
struct rtnl_nexthop *nh = rtnl_route_nexthop_n(route, 0); // Assuming the first nexthop
Member

Since libnl treats a non-multipath nexthop as multipath internally, can we assume that the multipath loop below handles all cases and that the current code block is redundant?

Collaborator Author

Relying on this internal behavior may not be robust in the long term.

The internal implementation details of libnl could change in future updates, which might break our code if we assume this behavior.

To ensure the robustness and maintainability of our code, it is safer to explicitly handle non-multipath nexthop cases separately. This approach will protect us against any potential changes in libnl's internal handling of nexthops.

#include "core/proto/netlink_socket_mgr.h"
#include "route_rule_table_key.h"
#include "rule_entry.h"
#include "rule_val.h"
#include <netlink/route/rule.h>
#include <netlink/route/link.h>
Member

looks like unused header

Collaborator Author

fixed :)

@tomerdbz
Collaborator Author

bot:retest

@tomerdbz
Collaborator Author

/review

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Exception Handling Issue

The new update_tbl function uses throw_xlio_exception for error handling, but it does not log specific details about the error (e.g., err values). Consider adding more detailed logging to aid debugging.

// Update data in a table
void netlink_socket_mgr::update_tbl(nl_data_t data_type)
{
    nl_sock *sockfd = nullptr;

    BULLSEYE_EXCLUDE_BLOCK_START

    sockfd = nl_socket_alloc();
    if (sockfd == nullptr) {
        __log_err("NL socket Creation: ");
        throw_xlio_exception("Failed nl_socket_alloc");
    }

    if (nl_connect(sockfd, NETLINK_ROUTE) < 0) {
        __log_err("NL socket Connection: ");
        nl_socket_free(sockfd);
        throw_xlio_exception("Failed nl_connect");
    }

    struct nl_cache *cache_state = {0};
    int err = 0;

    // cache allocation fetches the latest existing rules/routes
    if (data_type == RULE_DATA_TYPE) {
        err = rtnl_rule_alloc_cache(sockfd, AF_INET, &cache_state);
    } else if (data_type == ROUTE_DATA_TYPE) {
        err = rtnl_route_alloc_cache(sockfd, AF_INET, 0, &cache_state);
    }

    if (err < 0) {
        nl_socket_free(sockfd);
        throw_xlio_exception("Failed to allocate route cache");
    }

    parse_tbl(cache_state);

    nl_cache_free(cache_state);
    nl_socket_free(sockfd);
}
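
A possible way to address the logging observation above is to carry the failing call and its error code in the exception text. The sketch below is self-contained: `describe_error_sketch()` is a hypothetical stand-in with an invented mapping, and with libnl its role could be played by `nl_geterror()`. `format_failure()` and `check_result()` are likewise illustrative names, not XLIO functions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an error-code-to-text lookup.
// The mapping below is invented for illustration only.
const char *describe_error_sketch(int err)
{
    switch (err < 0 ? -err : err) {
    case 0:
        return "Success";
    case 5:
        return "Out of memory";
    default:
        return "Unspecific failure";
    }
}

// Builds a message that names the failing call and includes the raw code.
std::string format_failure(const std::string &call, int err)
{
    assert(!call.empty());
    return "Failed " + call + " (err=" + std::to_string(err) + ": " +
        describe_error_sketch(err) + ")";
}

// Throws with the detailed message when a call reported a negative result.
void check_result(const std::string &call, int err)
{
    if (err < 0) {
        throw std::runtime_error(format_failure(call, err));
    }
}
```

For example, `check_result("rtnl_route_alloc_cache", err)` would replace a fixed string such as "Failed to allocate route cache", which today is also thrown when the rule cache allocation fails.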
Multi-path Handling Logic

The multi-path handling logic in parse_attr uses a lambda to determine the best next hop based on weight. Ensure this logic is robust and handles edge cases, such as missing or invalid weights.

    struct nexthop_iterator_context {
        struct rtnl_nexthop *best_next_hop;
        uint8_t best_next_hop_weight;

    } best_next_hop_context = {.best_next_hop = nullptr, .best_next_hop_weight = 0xff};

    // Multi-path
    rtnl_route_foreach_nexthop(
        route,
        [](struct rtnl_nexthop *next_hop, void *context) {
            nexthop_iterator_context *best_next_hop = (nexthop_iterator_context *)context;
            const uint8_t current_nh_weight = rtnl_route_nh_get_weight(next_hop);

            // min valid weight is 1
            if (current_nh_weight > 0 && current_nh_weight < best_next_hop->best_next_hop_weight) {
                best_next_hop->best_next_hop_weight = current_nh_weight;
                best_next_hop->best_next_hop = next_hop;
            }
        },
        &best_next_hop_context);

    if (best_next_hop_context.best_next_hop != nullptr) {
        struct rtnl_nexthop *best_next_hop = best_next_hop_context.best_next_hop;
        const auto nh_gateway = rtnl_route_nh_get_gateway(best_next_hop);

        const ip_address &gw_addr =
            ip_address(nl_addr_get_binary_addr(nh_gateway), nl_addr_get_family(nh_gateway));

        val.set_if_index(rtnl_route_nh_get_ifindex(best_next_hop));
        char nh_if_name[IFNAMSIZ] = {0};
        if_indextoname(val.get_if_index(), nh_if_name);
        val.set_if_name(nh_if_name);

        val.set_gw(gw_addr);

        // Set destination mask and prefix length
        struct nl_addr *dst = rtnl_route_nh_get_newdst(best_next_hop);
        if (dst) {
            val.set_dst_pref_len(nl_addr_get_prefixlen(dst));
        }
    }
}
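
The edge cases mentioned above can be explored with a small stand-alone sketch of the same "lowest valid weight wins" selection. `nexthop_sketch` and `pick_next_hop()` are hypothetical and operate on plain structs rather than `rtnl_nexthop`. The fallback to the first hop when no hop has a valid weight is an assumption added for illustration, not behavior of the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical plain-data stand-in for a libnl next hop.
struct nexthop_sketch {
    int ifindex;
    std::uint8_t weight;
};

// Returns the hop with the lowest non-zero weight, or nullptr for an empty list.
const nexthop_sketch *pick_next_hop(const std::vector<nexthop_sketch> &hops)
{
    const nexthop_sketch *best = nullptr;
    for (const auto &nh : hops) {
        if (nh.weight == 0) {
            continue; // weight 0 is treated as missing or invalid, as in the PR
        }
        // A null check replaces the 0xff sentinel, so weight 255 can still win.
        if (best == nullptr || nh.weight < best->weight) {
            best = &nh;
        }
    }
    assert(best == nullptr || best->weight > 0);

    // Assumed fallback: if no hop carries a valid weight, use the first one.
    if (best == nullptr && !hops.empty()) {
        best = &hops.front();
    }
    return best;
}
```

One difference from the reviewed lambda is worth noting: there, the strict comparison against an initial `0xff` means a hop whose weight is exactly 255 is never selected, and a route whose hops all report weight 0 yields no next hop at all.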
Error Handling in Rule Parsing

The parse_entry function throws an exception if rtnl_rule_get_protocol fails. Consider whether this is the best approach or if fallback logic should be implemented for resilience.

// Parse received rule entry into custom object (rule_val).
void rule_table_mgr::parse_entry(struct nl_object *nl_obj)
{
    int err = 0;
    rule_val val;

    // Cast the generic nl_object to a specific route or rule object
    struct rtnl_rule *rule = reinterpret_cast<struct rtnl_rule *>(nl_obj);

    // Set rule properties in p_val using libnl getters
    uint8_t protocol = 0;
    err = rtnl_rule_get_protocol(rule, &protocol);
    if (err < 0) {
        throw_xlio_exception("Failed to get rule protocol");
    }

    val.set_family(rtnl_rule_get_family(rule));
    val.set_protocol(protocol);
    val.set_tos(rtnl_rule_get_dsfield(rule));
    val.set_table_id(rtnl_rule_get_table(rule));

    parse_attr(rule, val);

    val.set_state(true);

    rule_table_t &table = val.get_family() == AF_INET ? m_table_in4 : m_table_in6;
    table.push_back(val);
}
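
If fallback logic is preferred over throwing, one option is to substitute a default protocol when the getter reports failure. The sketch below shows that shape under stated assumptions: `rule_sketch` and `get_protocol_sketch()` are hypothetical stand-ins that only mimic the calling convention of `rtnl_rule_get_protocol()` (negative return on failure, value through an out parameter), and the default of 0 matches the value of `RTPROT_UNSPEC`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a libnl rule object.
struct rule_sketch {
    bool has_protocol;
    std::uint8_t protocol;
};

// Mimics the getter's calling convention: returns a negative value on
// failure, otherwise 0 with the protocol written to *out.
int get_protocol_sketch(const rule_sketch &rule, std::uint8_t *out)
{
    assert(out != nullptr);
    if (!rule.has_protocol) {
        return -1;
    }
    *out = rule.protocol;
    return 0;
}

constexpr std::uint8_t k_protocol_unspec = 0; // same value as RTPROT_UNSPEC

// Falls back to the unspecified protocol instead of throwing.
std::uint8_t protocol_or_default(const rule_sketch &rule)
{
    std::uint8_t protocol = k_protocol_unspec;
    if (get_protocol_sketch(rule, &protocol) < 0) {
        return k_protocol_unspec;
    }
    return protocol;
}
```

Whether a silent default is acceptable depends on how `rule_val` consumers use the protocol field; throwing, as the PR does, makes the failure visible instead.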

@galnoam
Collaborator

galnoam commented May 5, 2025

bot:retest

@tomerdbz
Collaborator Author

bot:retest

@tomerdbz
Collaborator Author

bot:retest

@tomerdbz
Collaborator Author

bot:retest

@tomerdbz
Collaborator Author

bot:retest

@dpressle
Collaborator

bot:retest

@tomerdbz
Collaborator Author

tomerdbz commented Jun 9, 2025

bot:retest

@tomerdbz
Collaborator Author

bot:retest

@tomerdbz tomerdbz force-pushed the routing_to_libnl branch from f414448 to 8b4ad1a Compare July 7, 2025 09:09
Netlink changes in kernel 6.10 broke XLIO functionality.
Transitioned to libnl, an abstraction that wraps netlink.

This both solves the issue and makes us more robust.

Signed-off-by: Tomer Cabouly <[email protected]>
@tomerdbz tomerdbz force-pushed the routing_to_libnl branch from 8b4ad1a to 3e71874 Compare July 7, 2025 13:07
@tomerdbz
Collaborator Author

tomerdbz commented Jul 8, 2025

bot:retest

@tomerdbz
Collaborator Author

bot:retest
