Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgpd: flowspec: remove sizelimit check applied to the wrong length field (issue 18557) #18558

Merged
merged 1 commit into from
Apr 7, 2025

Conversation

spoignant-proton
Copy link
Contributor

See issue 18557 for detailed description of the problem. When announcing flowspec routes, frr currently sends NLRIs up to max_packet_size. However, the maximum size of flowspec NLRIs is limited to a much lower value here.

Because of this, past as certain amount of flowspec routes, the peer will drop the session.

The proposed change reduces the size of the buffer for the NLRI to the maximum value between nlri_max_length and either FLOWSPEC_NLRI_SIZELIMIT_EXTENDED (if the peer advertised support for extended messages) or FLOWSPEC_NLRI_SIZELIMIT (if it did not).

Needs further review and testing.

@frrbot frrbot bot added the bgp label Apr 1, 2025
@spoignant-proton spoignant-proton force-pushed the master branch 4 times, most recently from 7ba8895 to 9132313 Compare April 1, 2025 20:26
@mwinter-osr
Copy link
Member

CI:rerun Rerunning the CI after fix on "[CI] Verify Source" incorrectly reporting bad status

1 similar comment
@RodrigoNardi
Copy link

CI:rerun Rerunning the CI after fix on "[CI] Verify Source" incorrectly reporting bad status

@mwinter-osr
Copy link
Member

CI:rerun Rerunning the CI after fix on "[CI] Verify Source" incorrectly reporting bad status (again - sorry CI issues)

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we should limit to FLOWSPEC_NLRI_SIZELIMIT_EXTENDED and not to BGP_EXTENDED_MESSAGE_MAX_PACKET_SIZE?

IMO, this check should be relaxed depending if we support extended message support or not:

	if (packet->length >= FLOWSPEC_NLRI_SIZELIMIT_EXTENDED) { <-------------
		flog_err(EC_BGP_FLOWSPEC_PACKET,
			 "BGP flowspec nlri length maximum reached (%u)",
			 packet->length);
		return BGP_NLRI_PARSE_ERROR_FLOWSPEC_NLRI_SIZELIMIT;
	}

@spoignant-proton
Copy link
Contributor Author

spoignant-proton commented Apr 2, 2025

I don't know the rationale for this part of the code, but what i did observe, is that even with non-FRR peers (nokia sr, junos) the session was flapping when we were exceeding that length.

@ton31337
Copy link
Member

ton31337 commented Apr 2, 2025

Exceeding the limit because it was using 65k which should be fine, but that check hard-coding packet length FLOWSPEC_NLRI_SIZELIMIT_EXTENDED is wrong, IMO. Unless it is defined somewhere in RFCs (before extended message support was introduced).

@ton31337
Copy link
Member

ton31337 commented Apr 2, 2025

@pguibert6WIND any comments?

@spoignant-proton
Copy link
Contributor Author

@ton31337 did you check this by any chance?
https://datatracker.ietf.org/doc/html/rfc8955#section-4.1

@ton31337
Copy link
Member

ton31337 commented Apr 2, 2025

Thanks point out the actual RFC, that makes sense then. But why do we need to rely on extended message support capability? I'm trying to understand this.

@spoignant-proton
Copy link
Contributor Author

spoignant-proton commented Apr 2, 2025

There might be a confusion between the length field of the MP_REACH_NLRI Path Attribute (which bgp_flowspec.c#L108 is considering) and the length field of the entries in the NLRI field of that attribute (which uses the variable length encoding described in this section of the RFC).

Not sure tho, as it would not explain why the longer messages were not accepted by the nokia and sros peers either. Maybe vendor got it wrong too??? Would be surprising at least.

@ton31337
Copy link
Member

ton31337 commented Apr 2, 2025

Everything is fine, but I don't understand why we need to rely on extended message support? Why can't we do:

if (safi == SAFI_FLOWSPEC)
  subgrp->scratch = stream_new(FLOWSPEC_NLRI_SIZELIMIT_EXTENDED);

@spoignant-proton
Copy link
Contributor Author

Fair point, as FLOWSPEC_NLRI_SIZELIMIT_EXTENDED is lower than the max length allowed when extended message capability was not negotiated (4095 < 4096).
I'm just checking first whether we can't simply have the check in bgp_flowspec.c pertains to the correct length field in accordance with the RFC. Testing atm and i'll update the PR accordingly.

@spoignant-proton spoignant-proton changed the title bgpd: Limit size of sent NLRIs to flowspec peers (issue 18557) bgpd: flowspec: remove sizelimit check applied to the wrong length field (issue 18557) Apr 3, 2025
@github-actions github-actions bot added size/XS and removed size/S labels Apr 3, 2025
@spoignant-proton
Copy link
Contributor Author

@ton31337 following your suggestion, and after confirming on the test bed that the modified version no longer reproduces the issue (all peers remaining table, including non-FRR ones), i've modified the PR accordingly.

Based on my understanding of the RFC, i think you were right pointing that the check in bgp_flowspec.c#L108 is the actual issue. My initial approach would prevent running into the issue with FRR peers, but not other types of peers, and also would reduce efficiency.

Simply put, the 4095 limit defined in the RFC pertains to individual flowspec NLRI records included in the NLRI field of the MP_REACH_NLRI path attribute, whereas the check considers the total length of all NLRIs.

I'm therefore suggesting removing that check entirely.
As a matter of fact, the length field of individual NLRI records do not need to be checked against this value, because the encoding method defined in the RFC makes it impossible for the length to be greater than 4095.
And whether the peer advertised extended message capability is relevant to the total message length, not just the NLRI fields. Since FRR announces extended message capability unconditionally, it should always accept message up to 65535 bytes (per section 4 of RFC8654). And even though we wanted to check the total message length, bgp_flowspec.c isn't probably the place where that should be done.

…eld (issue 18557)

Section 4.1 of RFC8955 defines how the length field of flowspec NLRIs is encoded.
The method use implies a maximum length of 4095 for a single flowspec NLRI.
However, in bgp_flowspec.c, we check the length attribute of the bgp_nlri structure against this maximum value, which actually is the *total* length of all NLRI included in the considered MP_REACH_NLRI path attribute.
Due to this confusion, frr would reject valid announces that contain many flowspec NLRIs, when their cummulative length exceeds 4095, and close the session.
The proposed change removes that check entirely. Indeed, there is no need to check the length field of each invidual NLRI because the method employed make it impossible to encode a length greater than 4095.

Signed-off-by: Stephane Poignant <[email protected]>
@@ -88,7 +88,6 @@ enum bgp_show_adj_route_type {
#define BGP_NLRI_PARSE_ERROR_EVPN_TYPE4_SIZE -9
#define BGP_NLRI_PARSE_ERROR_EVPN_TYPE5_SIZE -10
#define BGP_NLRI_PARSE_ERROR_FLOWSPEC_IPV6_NOT_SUPPORTED -11
#define BGP_NLRI_PARSE_ERROR_FLOWSPEC_NLRI_SIZELIMIT -12
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we need to assign -12 to the next in the list or so on, or if it is better to leave codes unchanged not to break compatibility

@ton31337 ton31337 merged commit 5e092d0 into FRRouting:master Apr 7, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants