[BPF] Support for IPv4 fragmentation #10335
sridhartigera
left a comment
LGTM. Thanks.
    int r_off = skb_l4hdr_offset(ctx);
    bool more_frags = bpf_ntohs(ip_hdr(ctx)->frag_off) & 0x2000;

    for (i = 0; i < 10; i++) {
nit: Nice to have a comment explaining what we are doing in this block.
felix/bpf-gpl/ip_v4_fragment.h
Outdated
        k.offset += v->len;
    }

    goto out;
Better to just return false in place of goto out
Incoming IP fragments are stored in an LRU hash map. They can arrive out of order. After each fragment, we check whether we have all fragments. If any fragment is missing, we drop the skb as we cannot let it through. Once we have all fragments, we use the current skb to assemble the whole packet, parse it again and let the rest of the programs process it as if the packet had arrived as a single chunk.

We need to defragment incoming packets because otherwise we would not be able to pass them through policies that match on more than IP. We would also not be able to match them to connections in conntrack. In fact, the payload of the fragments would be wrongly treated as L4 headers and misinterpreted.

After a packet is reassembled, its fragments are deleted. If for any reason we never see all fragments, LRU will eventually evict the stored fragments.

There are some limitations:
* A packet cannot have more than 10 fragments. 10 is an arbitrary number greater than a reasonable number of fragments in modern networks (2), plus we fragment the packet internally into 1500-byte chunks in case the fragments were bigger than this - unlikely, but not impossible. However, there is no limit on fragmentation in any RFC except the smallest MTU of 576 bytes.
* We can store up to 10k fragments. 10k is again arbitrary. If there is a higher fragmentation rate than this, the eBPF dataplane is probably not the right choice as performance would suffer, and it is likely better to let generic Linux handle such cases.
* Defragmentation is meant to handle corner cases and is not meant to be performant.
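The "do we have all fragments" check described above can be sketched in plain userspace C. This is a minimal illustration, not the PR's code: `struct frag`, `frag_decode` and `frags_complete` are hypothetical names, and an array stands in for the LRU hash map. It assumes fragments do not overlap and walks from offset 0, advancing by each fragment's length, until it reaches a fragment with the "more fragments" flag clear.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IP_MF      0x2000 /* "more fragments" flag in host-order frag_off */
#define IP_OFFMASK 0x1fff /* fragment offset, counted in 8-byte units */
#define MAX_FRAGS  10     /* per-packet limit taken from the description */

/* Hypothetical record of one stored fragment (not the PR's map value). */
struct frag {
    uint16_t offset;  /* payload offset in bytes */
    uint16_t len;     /* payload length in bytes */
    bool more_frags;  /* MF flag carried by this fragment */
};

/* Decode a host-order frag_off field into a fragment record. */
static struct frag frag_decode(uint16_t frag_off_host, uint16_t payload_len)
{
    struct frag f = {
        .offset = (uint16_t)((frag_off_host & IP_OFFMASK) * 8),
        .len = payload_len,
        .more_frags = (frag_off_host & IP_MF) != 0,
    };
    return f;
}

/*
 * Return true when the fragments, in any order, cover the payload
 * contiguously from offset 0 up to a fragment with MF clear.
 */
static bool frags_complete(const struct frag *frags, size_t n)
{
    uint32_t next = 0; /* offset of the fragment we expect next */

    if (n == 0 || n > MAX_FRAGS)
        return false;

    for (size_t i = 0; i < n; i++) {
        const struct frag *hit = NULL;

        for (size_t j = 0; j < n; j++) {
            if (frags[j].offset == next) {
                hit = &frags[j];
                break;
            }
        }
        if (!hit)
            return false;    /* gap: a fragment is still missing */
        if (!hit->more_frags)
            return true;     /* reached the last fragment */
        next += hit->len;    /* advance to the following fragment */
    }
    return false;            /* never saw a fragment with MF clear */
}
```

For example, three fragments decoded from `frag_off` values `0x2000`, `0x2000 | 185` and `370` (offsets 0, 1480 and 2960 bytes) are reported complete regardless of their order in the array, while dropping the middle one makes the check fail at offset 1480.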
We need to assemble the fragments towards the host either to deal with reordering - the first fragment with L4 headers may not arrive first - or with NATing, as it is easy to NAT a whole packet, but difficult to NAT the first fragment and then only partially NAT the rest without being able to find the CT entry due to missing L4 headers. We assume that the host does not reorder packets and therefore we police the first fragment, record that it is allowed and let the subsequent fragments through. The last fragment removes the record; however, in case of any failure or missing fragments, LRU will eventually clean it up.
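The first-fragment bookkeeping described above can be sketched as follows. This is a userspace illustration under stated assumptions, not the PR's implementation: `frag_key`, `allowed_record` and `frag_pass` are hypothetical names, a small array with crude eviction stands in for the LRU map, and dropping a follow-up fragment that has no record is an assumption that the description does not spell out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IP_MF      0x2000 /* "more fragments" flag in host-order frag_off */
#define IP_OFFMASK 0x1fff /* fragment offset, counted in 8-byte units */
#define TABLE_SIZE 16     /* tiny stand-in for the LRU map */

/* Hypothetical key identifying the fragments of one packet. */
struct frag_key {
    uint32_t src;
    uint32_t dst;
    uint16_t id;
    uint8_t proto;
};

struct slot {
    struct frag_key key;
    bool used;
};

static struct slot allowed[TABLE_SIZE];

static bool key_eq(const struct frag_key *a, const struct frag_key *b)
{
    return a->src == b->src && a->dst == b->dst &&
           a->id == b->id && a->proto == b->proto;
}

static struct slot *allowed_lookup(const struct frag_key *k)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (allowed[i].used && key_eq(&allowed[i].key, k))
            return &allowed[i];
    }
    return NULL;
}

static void allowed_record(const struct frag_key *k)
{
    struct slot *s = allowed_lookup(k); /* reuse an existing record */

    for (size_t i = 0; !s && i < TABLE_SIZE; i++) {
        if (!allowed[i].used)
            s = &allowed[i];
    }
    if (!s)
        s = &allowed[0]; /* table full: crude eviction, unlike real LRU */
    s->key = *k;
    s->used = true;
}

/*
 * Decide whether a fragment may pass. policy_allows is the policy verdict
 * and is only consulted for the first fragment, which carries L4 headers.
 */
static bool frag_pass(const struct frag_key *k, uint16_t frag_off_host,
                      bool policy_allows)
{
    bool first = (frag_off_host & IP_OFFMASK) == 0;
    bool last = (frag_off_host & IP_MF) == 0;

    if (first) {
        if (!policy_allows)
            return false;
        if (!last)
            allowed_record(k); /* unfragmented packets need no record */
        return true;
    }

    struct slot *s = allowed_lookup(k);
    if (!s)
        return false;          /* assumption: no record means drop */
    if (last)
        s->used = false;       /* last fragment removes the record */
    return true;
}
```

Because the host is assumed not to reorder packets, the first fragment always reaches `frag_pass` before its followers, so a lookup miss on a follow-up fragment means either that the first fragment was denied or that the record has already been evicted.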
Forwarding would create a fragmented VXLAN packet. First let the packet be fragmented and then route it into VXLAN. That is easier to handle.
No need to defrag on WEP egress if we assume that the host does not reorder packets.
Even when the MTU is OK, we may still get BPF_FIB_LKUP_RET_FRAG_NEEDED even though we do not ask for an MTU check. The irony is that if we asked for the MTU check, we would not get BPF_FIB_LKUP_RET_FRAG_NEEDED and all would be good.
Description
Related issues/PRs
fixes #8821
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one docs-* label.
* docs-pr-required: This change requires a change to the documentation that has not been completed yet.
* docs-completed: This change has all necessary documentation completed.
* docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.
* release-note-required: This PR has user-facing changes. Most PRs should have this label.
* release-note-not-required: This PR has no user-facing changes.

Other optional labels:
* cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
* needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.