Description
Sorry for the long text; it includes the background story on why this feature is requested. If the maintainers think there is a better way to address this, I am more than happy to listen and adjust my setup.
Feature request
Have an opt-in feature, which I tentatively name `unsafe_route_peer_relay`, that when enabled allows the following:
- If both peer A and peer B have a certificate that allows a specific CIDR as an unsafe route, say `10.100.0.0/15`, then A is able to send a packet whose destination IP address is within `10.100.0.0/15` to B, regardless of the packet's source IP address.
This feature is intended for Nebula nodes that act as routing nodes for unsafe networks and need to hand traffic off to each other. It is opt-in, only needs to be enabled on those routing nodes, and does not need to apply to any other nodes; a rough config sketch follows.
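As a rough sketch, and assuming the option would live under `tun` next to the existing unsafe-route settings (the name is the tentative one above; the placement is only an assumption), a routing node might enable it like this:

```yaml
# Hypothetical option -- it does not exist in Nebula today; the name and its
# placement under `tun` are only a sketch of this proposal.
tun:
  unsafe_route_peer_relay: true
```

Nodes that do not route for unsafe networks would simply leave the option at its default (off).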
Background
I am looking into using Nebula to provide the overlay network for Kubernetes, replacing VXLAN or Geneve. I've had some mixed success, so I am sharing the story and the feature that would make it work smoothly.
This is what I want to achieve:
- Run a Kubernetes cluster whose nodes are not necessarily on the same LAN, and have them communicate over Nebula.
- Pods are able to reach Nebula peers (so they can connect to private services not on the cluster), and vice versa.
- Services can be exposed from the cluster to Nebula peers.
This is what I have done:
- Each cluster node was given a Nebula certificate that allows it to route the pod and service CIDR ranges as unsafe routes (say, `10.100.0.0/16` for pods and `10.101.0.0/16` for services). Cluster nodes have Nebula IPs allocated within `10.0.0.0/16`.
- I've selected Cilium as it is a popular choice, but I think what I describe here would apply to Calico too.
- Cilium is configured to use the Nebula IPs as node IPs, and native routing mode is used (instead of tunneling mode). `use_system_route_table` is enabled on Nebula and `auto-direct-node-routes` is enabled on Cilium.
- Cilium is configured to expose service IPs (`lbExternalClusterIP`), as we do not have a way to automatically deploy load balancers with Nebula.
- Other Nebula clients have an unsafe route that points the `10.100.0.0/15` CIDR to one of the cluster nodes (the relevant config fragments are sketched after this list).
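For concreteness, the relevant configuration fragments look roughly like this. The addresses match the example ranges above, `10.0.0.5` stands in for one of the cluster nodes, and the Cilium Helm values are approximate, so treat this as a sketch rather than a verified config:

```yaml
# Nebula config fragment on each k8s cluster node: pick up the routes that
# Cilium installs for the pod/service CIDRs from the kernel routing table.
tun:
  use_system_route_table: true
---
# Nebula config fragment on a non-k8s client: point the pod/service ranges at
# one of the cluster nodes (10.0.0.5 is just an example node IP).
tun:
  unsafe_routes:
    - route: 10.100.0.0/15
      via: 10.0.0.5
---
# Approximate Cilium Helm values: native routing, direct node routes, and
# exposing ClusterIPs to clients outside the cluster.
routingMode: native
autoDirectNodeRoutes: true
bpf:
  lbExternalClusterIP: true
```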
This is what happened:
- With native routing mode and `auto-direct-node-routes`, Cilium automatically inserts routes such as `10.100.5.0/16 via 10.0.0.5 dev nebula` into the routing table. Nebula is able to pick them up using `use_system_route_table`.
- With the simple configuration above, the basic feature of the CNI "just works": a pod on one node is able to communicate with a pod on a different node via the automatically inserted routes over Nebula.
- Problematic: if a non-k8s Nebula client (say, `10.0.0.200`) tries to reach a pod or service, the packet arrives on one of the k8s nodes. It all works fine if the pod is running on that node. However, if the pod is running on another node, or, for a service, if the load balancer decides to hand the traffic to a pod on a different node, the packet needs to be relayed to a k8s peer. That packet has a source IP of `10.0.0.200`, which the peer is not authorized to send (see the note after this list).
This inability to redirect traffic is, so far, the only difference between running k8s over Nebula and running it on an L2 LAN.
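Note that this does not appear to be something the Nebula firewall rules can address: the drop happens in the certificate/remote-IP checks before the rule tables are consulted, so even a wide-open inbound rule on the k8s nodes does not help. For illustration, the following standard firewall config still results in the relayed packet being dropped:

```yaml
# Even an allow-all inbound rule on the k8s nodes does not let the relayed
# packet through, because it is rejected by the remote-IP/certificate check
# before the firewall rules are evaluated.
firewall:
  inbound:
    - port: any
      proto: any
      host: any
```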
Solutions
I have come up with 3 solutions to make the setup work:
- Enable tunneling for Cilium. However, this means we are running an additional overlay network on top of Nebula, which is inefficient. Also, for some reason this setup does not work well with network policies in Cilium.
- Add SNAT rules so that traffic unsupported by Nebula has its source remapped to the local Nebula IP address. However, this loses source IP preservation and leaves the target node unable to apply network policies correctly.
- Support this use case in Nebula directly (hence this feature request).
POC
This is the patch that I applied on the cluster; it is sufficient to make everything work as intended.
```diff
diff --git a/firewall.go b/firewall.go
index 8a409d2..6ef17fb 100644
--- a/firewall.go
+++ b/firewall.go
@@ -432,8 +432,14 @@ func (f *Firewall) Drop(fp firewall.Packet, incoming bool, h *HostInfo, caPool *
 		//TODO: this would be better if we had a least specific match lookup, could waste time here, need to benchmark since the algo is different
 		_, ok := remoteCidr.Lookup(fp.RemoteIP)
 		if !ok {
-			f.metrics(incoming).droppedRemoteIP.Inc(1)
-			return ErrInvalidRemoteIP
+			// In case where both peer and us have ability to handle the `local_addr` as unsafe network,
+			// there might be a need for the peer to relay an ingress traffic to us.
+			// For example, when we handle k8s ingress with native routing.
+			_, ok := remoteCidr.Lookup(fp.LocalIP)
+			if !ok {
+				f.metrics(incoming).droppedRemoteIP.Inc(1)
+				return ErrInvalidRemoteIP
+			}
 		}
 	} else {
 		// Simple case: Certificate has one IP and no subnets
@@ -447,8 +453,12 @@ func (f *Firewall) Drop(fp firewall.Packet, incoming bool, h *HostInfo, caPool *
 	//TODO: this would be better if we had a least specific match lookup, could waste time here, need to benchmark since the algo is different
 	_, ok := f.localIps.Lookup(fp.LocalIP)
 	if !ok {
-		f.metrics(incoming).droppedLocalIP.Inc(1)
-		return ErrInvalidLocalIP
+		// If we can handle the remote addr (and our peer can handle this too), this is an unsafe peer relay.
+		_, ok := f.localIps.Lookup(fp.RemoteIP)
+		if !ok {
+			f.metrics(incoming).droppedLocalIP.Inc(1)
+			return ErrInvalidLocalIP
+		}
 	}

 	table := f.OutRules
```
If the Nebula maintainers are happy to accept this feature, I can develop a proper PR and gate it behind a config option.