
Feature request: relay traffic to unsafe routes to another peer (for running Kubernetes over Nebula) #1480

@nbdd0121


Sorry for the long text; it includes the background on why this feature is requested. If the maintainers think there is a better way to address this, I am more than happy to listen and adjust my setup.

Feature request

Add an opt-in feature, tentatively named "unsafe_route_peer_relay", that, when enabled, allows the following:

  • If peers A and B both have certificates that allow a specific CIDR as an unsafe route, say 10.100.0.0/15, then A is able to send B a packet whose destination IP address is within 10.100.0.0/15, regardless of the packet's source IP address.

This feature is intended for Nebula nodes that act as routing nodes for unsafe networks and want to hand traffic to each other. It is opt-in and only needs to be enabled on those routing nodes; it does not need to apply to any other node.
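To make the shape of the option concrete, here is a sketch of what enabling it on a routing node might look like. This is purely hypothetical: neither the key name nor its placement exists in Nebula today.

# Hypothetical Nebula config on a routing node; the option name and its
# location under firewall are placeholders, not an existing Nebula setting.
firewall:
  unsafe_route_peer_relay: true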

Background

I am looking into using Nebula to provide the overlay for Kubernetes, replacing VXLAN or Geneve. I've had some mixed success, so I am sharing the story and the feature that would make it work smoothly.

This is what I want to achieve:

  1. Run a Kubernetes cluster, with nodes not necessarily on the same LAN, and have the nodes communicate over Nebula.
  2. Have pods be able to reach Nebula peers (so they can connect to private services not on the cluster), and vice versa.
  3. Be able to expose services from the cluster to Nebula peers.

This is what I have done:

  • Each cluster node is given a Nebula certificate that allows it to route to the pod and service CIDR ranges (say, 10.100.0.0/16 for pods and 10.101.0.0/16 for services). Cluster nodes have Nebula IPs allocated within 10.0.0.0/16.
  • I've selected Cilium as it's a popular choice, but I think what I describe here would apply to Calico too.
  • Cilium is configured to use the Nebula IPs as node IPs, and native routing mode is used (instead of tunneling mode).
  • use_system_route_table is enabled on Nebula and auto-direct-node-routes is enabled on Cilium.
  • Cilium is configured to expose service IPs (lbExternalClusterIP), as we do not have a way to automatically deploy load balancers with Nebula.
  • Other Nebula clients have an unsafe route that points the 10.100.0.0/15 CIDR at one of the cluster nodes (see the configuration sketch below).
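For concreteness, the setup above corresponds roughly to the following configuration. The Cilium Helm values and Nebula keys are real options, but the snippets are illustrative sketches rather than complete configs, and the addresses are the example ones from my setup.

# Cilium Helm values on the cluster (sketch; assumes current Cilium value names):
routingMode: native
autoDirectNodeRoutes: true
bpf:
  lbExternalClusterIP: true
---
# Nebula config on each cluster node (sketch):
tun:
  use_system_route_table: true
---
# Nebula config on a non-k8s client (sketch; 10.0.0.5 stands in for one of the cluster nodes):
tun:
  unsafe_routes:
    - route: 10.100.0.0/15
      via: 10.0.0.5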

This is what happened:

  • With native routing mode and auto-direct-node-routes, Cilium automatically inserts per-node pod routes such as 10.100.5.0/24 via 10.0.0.5 dev nebula into the routing table. Nebula is able to pick these up using use_system_route_table.
  • With the simple configuration above, the basic feature of the CNI "just works": a pod on one node is able to communicate with a pod on a different node via the automatically inserted routes over Nebula.
  • Problematic: if a non-k8s Nebula client (say, 10.0.0.200) tries to reach a pod or service, the packet arrives at one of the k8s nodes. Everything works fine if the target pod is running on that node. However, if the pod is running on another node, or, for a service, if the load balancer decides to hand the traffic to a pod on a different node, the packet needs to be relayed to a k8s peer. That relayed packet has a source IP of 10.0.0.200, which the relaying peer is not authorized to send, so Nebula drops it.

The inability to redirect traffic is, so far, the only difference between running k8s over Nebula and running it on an L2 LAN.

Solutions

I have come up with 3 solutions to make the setup work:

  • Enable tunneling for Cilium (see the sketch after this list). However, this means running an additional overlay network on top of Nebula, which is inefficient. Also, for some reason, this setup does not work well with network policies in Cilium.
  • Add SNAT rules so that traffic Nebula would reject has its source remapped to the local Nebula IP address. However, this loses source IP preservation and leaves the target node unable to apply network policies correctly.
  • Support this use case in Nebula directly (hence this feature request).
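For reference, the first workaround corresponds roughly to the following Cilium Helm values. This is a sketch only: the value names assume a recent Cilium release, and the protocol choice here is arbitrary.

# Workaround 1 (sketch): run Cilium's own tunnel overlay on top of Nebula.
routingMode: tunnel
tunnelProtocol: geneve   # or vxlan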

POC

This is the patch that I applied on the cluster; it is sufficient to make everything work as intended.

diff --git a/firewall.go b/firewall.go
index 8a409d2..6ef17fb 100644
--- a/firewall.go
+++ b/firewall.go
@@ -432,8 +432,14 @@ func (f *Firewall) Drop(fp firewall.Packet, incoming bool, h *HostInfo, caPool *
 		//TODO: this would be better if we had a least specific match lookup, could waste time here, need to benchmark since the algo is different
 		_, ok := remoteCidr.Lookup(fp.RemoteIP)
 		if !ok {
-			f.metrics(incoming).droppedRemoteIP.Inc(1)
-			return ErrInvalidRemoteIP
+			// In the case where both the peer and we can handle `local_addr` as an unsafe network,
+			// the peer might need to relay ingress traffic to us.
+			// For example, when we handle k8s ingress with native routing.
+			_, ok := remoteCidr.Lookup(fp.LocalIP)
+			if !ok {
+				f.metrics(incoming).droppedRemoteIP.Inc(1)
+				return ErrInvalidRemoteIP
+			}
 		}
 	} else {
 		// Simple case: Certificate has one IP and no subnets
@@ -447,8 +453,12 @@ func (f *Firewall) Drop(fp firewall.Packet, incoming bool, h *HostInfo, caPool *
 	//TODO: this would be better if we had a least specific match lookup, could waste time here, need to benchmark since the algo is different
 	_, ok := f.localIps.Lookup(fp.LocalIP)
 	if !ok {
-		f.metrics(incoming).droppedLocalIP.Inc(1)
-		return ErrInvalidLocalIP
+		// If we can handle the remote addr (and our peer can handle this too), this is an unsafe peer relay.
+		_, ok := f.localIps.Lookup(fp.RemoteIP)
+		if !ok {
+			f.metrics(incoming).droppedLocalIP.Inc(1)
+			return ErrInvalidLocalIP
+		}
 	}
 
 	table := f.OutRules

If the Nebula maintainers are happy to accept this feature, I can develop a proper PR and gate the feature behind a config option.
