bgpd: flowspec: remove sizelimit check applied to the wrong length field (issue 18557) #18558
CI:rerun Rerunning the CI after fix on "[CI] Verify Source" incorrectly reporting bad status
CI:rerun Rerunning the CI after fix on "[CI] Verify Source" incorrectly reporting bad status (again - sorry CI issues)
I don't understand why we should limit to FLOWSPEC_NLRI_SIZELIMIT_EXTENDED and not to BGP_EXTENDED_MESSAGE_MAX_PACKET_SIZE?
IMO, this check should be relaxed depending on whether extended message support is in use or not:
    if (packet->length >= FLOWSPEC_NLRI_SIZELIMIT_EXTENDED) { <-------------
        flog_err(EC_BGP_FLOWSPEC_PACKET,
                 "BGP flowspec nlri length maximum reached (%u)",
                 packet->length);
        return BGP_NLRI_PARSE_ERROR_FLOWSPEC_NLRI_SIZELIMIT;
    }
I don't know the rationale for this part of the code, but what I did observe is that even with non-FRR peers (Nokia SR, Junos) the session was flapping when we exceeded that length.
It exceeded the limit because it was using 65k, which should be fine. The check that hard-codes the packet length limit to FLOWSPEC_NLRI_SIZELIMIT_EXTENDED is wrong, IMO, unless it is defined somewhere in the RFCs (before extended message support was introduced).
@pguibert6WIND any comments?
@ton31337 did you check this by any chance?
Thanks for pointing out the actual RFC, that makes sense then. But why do we need to rely on the extended message support capability? I'm trying to understand this.
There might be some confusion between the length field of the MP_REACH_NLRI Path Attribute (which bgp_flowspec.c#L108 is considering) and the length field of the individual entries in the NLRI field of that attribute (which uses the variable-length encoding described in this section of the RFC). I'm not sure though, as it would not explain why the longer messages were not accepted by the nokia and sros peers either. Maybe the vendors got it wrong too? That would be surprising, to say the least.
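To make the per-entry length encoding concrete, here is a minimal sketch of the 1-or-2 octet scheme from RFC 8955 section 4.1: lengths below 240 (0xf0) use a single octet, longer ones use two octets with the top nibble of the first octet set to all ones. The function names and the FS_NLRI_MAX_LEN macro are hypothetical and are not taken from the FRR source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: largest value the 2-octet form can carry (12 bits). */
#define FS_NLRI_MAX_LEN 4095

/*
 * Encode the length of a single flowspec NLRI entry.
 * Returns the number of octets written (1 or 2), or 0 if the
 * length cannot be represented by this encoding.
 */
size_t fs_encode_len(uint16_t len, uint8_t out[2])
{
	assert(out != NULL);

	if (len > FS_NLRI_MAX_LEN)
		return 0;
	if (len < 0xf0) {
		/* Short form: the octet is the length itself. */
		out[0] = (uint8_t)len;
		return 1;
	}
	/* Extended form: high nibble 0xf, remaining 12 bits hold the length. */
	out[0] = (uint8_t)(0xf0 | (len >> 8));
	out[1] = (uint8_t)(len & 0xff);
	return 2;
}

/*
 * Decode the length prefix of a single entry.
 * Returns the number of octets consumed (1 or 2), or 0 if the
 * buffer is too short to hold the prefix.
 */
size_t fs_decode_len(const uint8_t *buf, size_t avail, uint16_t *len)
{
	assert(buf != NULL && len != NULL);

	if (avail < 1)
		return 0;
	if (buf[0] < 0xf0) {
		*len = buf[0];
		return 1;
	}
	if (avail < 2)
		return 0;
	*len = (uint16_t)(((buf[0] & 0x0f) << 8) | buf[1]);
	return 2;
}
```

Since only 12 bits are left for the value in the extended form, a decoded per-entry length can never exceed 4095, regardless of how long the enclosing attribute is.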
Everything is fine, but I don't understand why we need to rely on extended message support? Why can't we do:
    if (safi == SAFI_FLOWSPEC)
        subgrp->scratch = stream_new(FLOWSPEC_NLRI_SIZELIMIT_EXTENDED);
Fair point, as FLOWSPEC_NLRI_SIZELIMIT_EXTENDED is lower than the max length allowed when extended message capability was not negotiated (4095 < 4096).
@ton31337 following your suggestion, and after confirming on the test bed that the modified version no longer reproduces the issue (all peers remaining stable, including non-FRR ones), I've modified the PR accordingly. Based on my understanding of the RFC, I think you were right to point out that the check in bgp_flowspec.c#L108 is the actual issue. My initial approach would prevent running into the issue with FRR peers, but not with other types of peers, and it would also reduce efficiency. Simply put, the 4095 limit defined in the RFC pertains to individual flowspec NLRI records included in the NLRI field of the MP_REACH_NLRI path attribute, whereas the check considers the total length of all NLRIs. I'm therefore suggesting removing that check entirely.
…eld (issue 18557)

Section 4.1 of RFC8955 defines how the length field of flowspec NLRIs is encoded. The method used implies a maximum length of 4095 for a single flowspec NLRI. However, in bgp_flowspec.c, we compare this maximum value against the length attribute of the bgp_nlri structure, which is actually the *total* length of all NLRIs included in the considered MP_REACH_NLRI path attribute. Due to this confusion, frr would reject valid announcements that contain many flowspec NLRIs when their cumulative length exceeds 4095, and close the session.

The proposed change removes that check entirely. Indeed, there is no need to check the length field of each individual NLRI, because the encoding method makes it impossible to encode a length greater than 4095.

Signed-off-by: Stephane Poignant <[email protected]>
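As an illustration of the distinction drawn in the commit message, the sketch below walks an NLRI field made of many small entries. Each entry is well under 4095 octets, yet the field as a whole is larger, which is exactly the case the removed check rejected. This is a simplified stand-alone model, not the FRR parser; fs_entry_len and fs_count_entries are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read the RFC 8955 section 4.1 length prefix of one entry.
 * Returns the prefix size in octets (1 or 2), or 0 if truncated.
 */
size_t fs_entry_len(const uint8_t *p, size_t avail, size_t *len)
{
	if (avail < 1)
		return 0;
	if (p[0] < 0xf0) {
		*len = p[0];
		return 1;
	}
	if (avail < 2)
		return 0;
	*len = ((size_t)(p[0] & 0x0f) << 8) | p[1];
	return 2;
}

/*
 * Walk every entry in an NLRI field of field_len octets.
 * Returns the number of entries, or -1 if an entry is truncated.
 * Note that field_len itself is never compared against 4095:
 * only the per-entry length is bounded, and the encoding
 * already guarantees that bound.
 */
int fs_count_entries(const uint8_t *field, size_t field_len)
{
	size_t off = 0;
	int count = 0;

	while (off < field_len) {
		size_t len = 0;
		size_t hdr = fs_entry_len(field + off, field_len - off, &len);

		if (hdr == 0 || len > field_len - off - hdr)
			return -1;
		off += hdr + len;
		count++;
	}
	return count;
}
```

For example, 50 entries of 100 octets each (plus a 1-octet prefix) add up to 5050 octets, which is above 4095 in total while every individual entry remains valid.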
@@ -88,7 +88,6 @@ enum bgp_show_adj_route_type {
#define BGP_NLRI_PARSE_ERROR_EVPN_TYPE4_SIZE -9
#define BGP_NLRI_PARSE_ERROR_EVPN_TYPE5_SIZE -10
#define BGP_NLRI_PARSE_ERROR_FLOWSPEC_IPV6_NOT_SUPPORTED -11
#define BGP_NLRI_PARSE_ERROR_FLOWSPEC_NLRI_SIZELIMIT -12
Not sure whether -12 should be reassigned to the next code in the list (and so on), or whether it is better to leave the codes unchanged so as not to break compatibility.
See issue 18557 for a detailed description of the problem. When announcing flowspec routes, frr currently sends NLRIs up to max_packet_size. However, the maximum size of flowspec NLRIs is limited to a much lower value here. Because of this, past a certain number of flowspec routes, the peer will drop the session.
The proposed change reduces the size of the buffer for the NLRI to the maximum value between nlri_max_length and either FLOWSPEC_NLRI_SIZELIMIT_EXTENDED (if the peer advertised support for extended messages) or FLOWSPEC_NLRI_SIZELIMIT (if it did not).
Needs further review and testing.