externalIPs DNAT rules are not installed when clusterIP is None #131497

Open · ttc0419 opened this issue Apr 27, 2025 · 4 comments · May be fixed by #131503

Labels: kind/bug · needs-triage · sig/network

Comments


ttc0419 commented Apr 27, 2025

What happened?

Consider the following service:

apiVersion: v1
kind: Service
metadata:
  name: kube-bench-dev
spec:
  clusterIP: None
  selector:
    instance: kube-bench-dev
  ports:
  - name: tcp-80
    port: 80
    protocol: TCP
    targetPort: 80
  externalIPs:
  - 192.168.64.253
% kubectl get service kube-bench-dev 
NAME             TYPE        CLUSTER-IP   EXTERNAL-IP      PORT(S)   AGE
kube-bench-dev   ClusterIP   None         192.168.64.253   80/TCP    6s

But no DNAT rules are installed for the external IP:

table ip kube-proxy {
	comment "rules for kube-proxy"
	set cluster-ips {
		type ipv4_addr
		comment "Active ClusterIPs"
		elements = { 172.16.0.1, 172.16.0.173,
			     172.16.0.220 }
	}

	set nodeport-ips {
		type ipv4_addr
		comment "IPs that accept NodePort traffic"
		elements = { 192.168.64.2 }
	}

	map no-endpoint-services {
		type ipv4_addr . inet_proto . inet_service : verdict
		comment "vmap to drop or reject packets to services with no endpoints"
	}

	map no-endpoint-nodeports {
		type inet_proto . inet_service : verdict
		comment "vmap to drop or reject packets to service nodeports with no endpoints"
	}

	map firewall-ips {
		type ipv4_addr . inet_proto . inet_service : verdict
		comment "destinations that are subject to LoadBalancerSourceRanges"
	}

	map service-ips {
		type ipv4_addr . inet_proto . inet_service : verdict
		comment "ClusterIP, ExternalIP and LoadBalancer IP traffic"
		elements = { 172.16.0.173 . tcp . 80 : goto service-TPLZMVKW-kube-system/ingress-nginx-controller/tcp/http,
			     192.168.64.254 . tcp . 80 : goto external-TPLZMVKW-kube-system/ingress-nginx-controller/tcp/http,
			     172.16.0.1 . tcp . 443 : goto service-2QRHZV4L-default/kubernetes/tcp/https,
			     172.16.0.173 . tcp . 443 : goto service-HNB4FGVK-kube-system/ingress-nginx-controller/tcp/https,
			     172.16.0.220 . tcp . 443 : goto service-FMTKUH45-kube-system/ingress-nginx-controller-admission/tcp/https-webhook,
			     192.168.64.254 . tcp . 443 : goto external-HNB4FGVK-kube-system/ingress-nginx-controller/tcp/https }
	}

	map service-nodeports {
		type inet_proto . inet_service : verdict
		comment "NodePort traffic"
	}

	chain filter-prerouting {
		type filter hook prerouting priority dstnat - 10; policy accept;
		ct state new jump firewall-check
	}

	chain filter-input {
		type filter hook input priority -110; policy accept;
		ct state new jump nodeport-endpoints-check
		ct state new jump service-endpoints-check
	}

	chain filter-forward {
		type filter hook forward priority -110; policy accept;
		ct state new jump service-endpoints-check
		ct state new jump cluster-ips-check
	}

	chain filter-output {
		type filter hook output priority dstnat - 10; policy accept;
		ct state new jump service-endpoints-check
		ct state new jump firewall-check
	}

	chain filter-output-post-dnat {
		type filter hook output priority dstnat + 10; policy accept;
		ct state new jump cluster-ips-check
	}

	chain nat-prerouting {
		type nat hook prerouting priority dstnat; policy accept;
		jump services
	}

	chain nat-output {
		type nat hook output priority dstnat; policy accept;
		jump services
	}

	chain nat-postrouting {
		type nat hook postrouting priority srcnat; policy accept;
		jump masquerading
	}

	chain nodeport-endpoints-check {
		ip daddr @nodeport-ips meta l4proto . th dport vmap @no-endpoint-nodeports
	}

	chain service-endpoints-check {
		ip daddr . meta l4proto . th dport vmap @no-endpoint-services
	}

	chain firewall-check {
		ip daddr . meta l4proto . th dport vmap @firewall-ips
	}

	chain services {
		ip daddr . meta l4proto . th dport vmap @service-ips
		ip daddr @nodeport-ips meta l4proto . th dport vmap @service-nodeports
	}

	chain masquerading {
		meta mark & 0x00004000 == 0x00000000 return
		meta mark set meta mark ^ 0x00004000
		masquerade fully-random
	}

	chain cluster-ips-check {
		ip daddr @cluster-ips reject comment "Reject traffic to invalid ports of ClusterIPs"
	}

	chain mark-for-masquerade {
		meta mark set meta mark | 0x00004000
	}

	chain reject-chain {
		comment "helper for @no-endpoint-services / @no-endpoint-nodeports"
		reject
	}

	chain endpoint-KUBDMD37-default/kubernetes/tcp/https__192.168.64.2/6443 {
		ip saddr 192.168.64.2 jump mark-for-masquerade
		meta l4proto tcp dnat to 192.168.64.2:6443
	}

	chain service-2QRHZV4L-default/kubernetes/tcp/https {
		ip daddr 172.16.0.1 tcp dport 443 ip saddr != 192.168.64.64/26 jump mark-for-masquerade
		numgen random mod 1 vmap { 0 : goto endpoint-KUBDMD37-default/kubernetes/tcp/https__192.168.64.2/6443 }
	}

	chain endpoint-5UYISHKM-kube-system/ingress-nginx-controller/tcp/http__192.168.64.68/80 {
		ip saddr 192.168.64.68 jump mark-for-masquerade
		meta l4proto tcp dnat to 192.168.64.68:80
	}

	chain service-TPLZMVKW-kube-system/ingress-nginx-controller/tcp/http {
		ip daddr 172.16.0.173 tcp dport 80 ip saddr != 192.168.64.64/26 jump mark-for-masquerade
		numgen random mod 1 vmap { 0 : goto endpoint-5UYISHKM-kube-system/ingress-nginx-controller/tcp/http__192.168.64.68/80 }
	}

	chain external-TPLZMVKW-kube-system/ingress-nginx-controller/tcp/http {
		jump mark-for-masquerade
		goto service-TPLZMVKW-kube-system/ingress-nginx-controller/tcp/http
	}

	chain endpoint-VRCVTPLF-kube-system/ingress-nginx-controller/tcp/https__192.168.64.68/443 {
		ip saddr 192.168.64.68 jump mark-for-masquerade
		meta l4proto tcp dnat to 192.168.64.68:443
	}

	chain service-HNB4FGVK-kube-system/ingress-nginx-controller/tcp/https {
		ip daddr 172.16.0.173 tcp dport 443 ip saddr != 192.168.64.64/26 jump mark-for-masquerade
		numgen random mod 1 vmap { 0 : goto endpoint-VRCVTPLF-kube-system/ingress-nginx-controller/tcp/https__192.168.64.68/443 }
	}

	chain external-HNB4FGVK-kube-system/ingress-nginx-controller/tcp/https {
		jump mark-for-masquerade
		goto service-HNB4FGVK-kube-system/ingress-nginx-controller/tcp/https
	}

	chain endpoint-XIULVOT6-kube-system/ingress-nginx-controller-admission/tcp/https-webhook__192.168.64.68/8443 {
		ip saddr 192.168.64.68 jump mark-for-masquerade
		meta l4proto tcp dnat to 192.168.64.68:8443
	}

	chain service-FMTKUH45-kube-system/ingress-nginx-controller-admission/tcp/https-webhook {
		ip daddr 172.16.0.220 tcp dport 443 ip saddr != 192.168.64.64/26 jump mark-for-masquerade
		numgen random mod 1 vmap { 0 : goto endpoint-XIULVOT6-kube-system/ingress-nginx-controller-admission/tcp/https-webhook__192.168.64.68/8443 }
	}

	chain endpoint-R3GEKHA3-default/kube-bench-dev/tcp/tcp-80__192.168.64.66/80 {
	}

	chain service-KQA2VLMF-default/kube-bench-dev/tcp/tcp-80 {
	}
}

What did you expect to happen?

External IP DNAT rules should be installed, just as they are when clusterIP is not None:

table ip kube-proxy {
	comment "rules for kube-proxy"
	set cluster-ips {
		type ipv4_addr
		comment "Active ClusterIPs"
		elements = { 172.16.0.1, 172.16.0.173,
			     172.16.0.220, 172.16.0.242 }
	}

	set nodeport-ips {
		type ipv4_addr
		comment "IPs that accept NodePort traffic"
		elements = { 192.168.64.2 }
	}

	map no-endpoint-services {
		type ipv4_addr . inet_proto . inet_service : verdict
		comment "vmap to drop or reject packets to services with no endpoints"
	}

	map no-endpoint-nodeports {
		type inet_proto . inet_service : verdict
		comment "vmap to drop or reject packets to service nodeports with no endpoints"
	}

	map firewall-ips {
		type ipv4_addr . inet_proto . inet_service : verdict
		comment "destinations that are subject to LoadBalancerSourceRanges"
	}

	map service-ips {
		type ipv4_addr . inet_proto . inet_service : verdict
		comment "ClusterIP, ExternalIP and LoadBalancer IP traffic"
		elements = { 172.16.0.173 . tcp . 80 : goto service-TPLZMVKW-kube-system/ingress-nginx-controller/tcp/http,
			     172.16.0.242 . tcp . 80 : goto service-KQA2VLMF-default/kube-bench-dev/tcp/tcp-80,
			     192.168.64.253 . tcp . 80 : goto external-KQA2VLMF-default/kube-bench-dev/tcp/tcp-80,
			     192.168.64.254 . tcp . 80 : goto external-TPLZMVKW-kube-system/ingress-nginx-controller/tcp/http,
			     172.16.0.1 . tcp . 443 : goto service-2QRHZV4L-default/kubernetes/tcp/https,
			     172.16.0.173 . tcp . 443 : goto service-HNB4FGVK-kube-system/ingress-nginx-controller/tcp/https,
			     172.16.0.220 . tcp . 443 : goto service-FMTKUH45-kube-system/ingress-nginx-controller-admission/tcp/https-webhook,
			     192.168.64.254 . tcp . 443 : goto external-HNB4FGVK-kube-system/ingress-nginx-controller/tcp/https }
	}

	map service-nodeports {
		type inet_proto . inet_service : verdict
		comment "NodePort traffic"
	}

	chain filter-prerouting {
		type filter hook prerouting priority dstnat - 10; policy accept;
		ct state new jump firewall-check
	}

	chain filter-input {
		type filter hook input priority -110; policy accept;
		ct state new jump nodeport-endpoints-check
		ct state new jump service-endpoints-check
	}

	chain filter-forward {
		type filter hook forward priority -110; policy accept;
		ct state new jump service-endpoints-check
		ct state new jump cluster-ips-check
	}

	chain filter-output {
		type filter hook output priority dstnat - 10; policy accept;
		ct state new jump service-endpoints-check
		ct state new jump firewall-check
	}

	chain filter-output-post-dnat {
		type filter hook output priority dstnat + 10; policy accept;
		ct state new jump cluster-ips-check
	}

	chain nat-prerouting {
		type nat hook prerouting priority dstnat; policy accept;
		jump services
	}

	chain nat-output {
		type nat hook output priority dstnat; policy accept;
		jump services
	}

	chain nat-postrouting {
		type nat hook postrouting priority srcnat; policy accept;
		jump masquerading
	}

	chain nodeport-endpoints-check {
		ip daddr @nodeport-ips meta l4proto . th dport vmap @no-endpoint-nodeports
	}

	chain service-endpoints-check {
		ip daddr . meta l4proto . th dport vmap @no-endpoint-services
	}

	chain firewall-check {
		ip daddr . meta l4proto . th dport vmap @firewall-ips
	}

	chain services {
		ip daddr . meta l4proto . th dport vmap @service-ips
		ip daddr @nodeport-ips meta l4proto . th dport vmap @service-nodeports
	}

	chain masquerading {
		meta mark & 0x00004000 == 0x00000000 return
		meta mark set meta mark ^ 0x00004000
		masquerade fully-random
	}

	chain cluster-ips-check {
		ip daddr @cluster-ips reject comment "Reject traffic to invalid ports of ClusterIPs"
	}

	chain mark-for-masquerade {
		meta mark set meta mark | 0x00004000
	}

	chain reject-chain {
		comment "helper for @no-endpoint-services / @no-endpoint-nodeports"
		reject
	}

	chain endpoint-KUBDMD37-default/kubernetes/tcp/https__192.168.64.2/6443 {
		ip saddr 192.168.64.2 jump mark-for-masquerade
		meta l4proto tcp dnat to 192.168.64.2:6443
	}

	chain service-2QRHZV4L-default/kubernetes/tcp/https {
		ip daddr 172.16.0.1 tcp dport 443 ip saddr != 192.168.64.64/26 jump mark-for-masquerade
		numgen random mod 1 vmap { 0 : goto endpoint-KUBDMD37-default/kubernetes/tcp/https__192.168.64.2/6443 }
	}

	chain endpoint-5UYISHKM-kube-system/ingress-nginx-controller/tcp/http__192.168.64.68/80 {
		ip saddr 192.168.64.68 jump mark-for-masquerade
		meta l4proto tcp dnat to 192.168.64.68:80
	}

	chain service-TPLZMVKW-kube-system/ingress-nginx-controller/tcp/http {
		ip daddr 172.16.0.173 tcp dport 80 ip saddr != 192.168.64.64/26 jump mark-for-masquerade
		numgen random mod 1 vmap { 0 : goto endpoint-5UYISHKM-kube-system/ingress-nginx-controller/tcp/http__192.168.64.68/80 }
	}

	chain external-TPLZMVKW-kube-system/ingress-nginx-controller/tcp/http {
		jump mark-for-masquerade
		goto service-TPLZMVKW-kube-system/ingress-nginx-controller/tcp/http
	}

	chain endpoint-VRCVTPLF-kube-system/ingress-nginx-controller/tcp/https__192.168.64.68/443 {
		ip saddr 192.168.64.68 jump mark-for-masquerade
		meta l4proto tcp dnat to 192.168.64.68:443
	}

	chain service-HNB4FGVK-kube-system/ingress-nginx-controller/tcp/https {
		ip daddr 172.16.0.173 tcp dport 443 ip saddr != 192.168.64.64/26 jump mark-for-masquerade
		numgen random mod 1 vmap { 0 : goto endpoint-VRCVTPLF-kube-system/ingress-nginx-controller/tcp/https__192.168.64.68/443 }
	}

	chain external-HNB4FGVK-kube-system/ingress-nginx-controller/tcp/https {
		jump mark-for-masquerade
		goto service-HNB4FGVK-kube-system/ingress-nginx-controller/tcp/https
	}

	chain endpoint-XIULVOT6-kube-system/ingress-nginx-controller-admission/tcp/https-webhook__192.168.64.68/8443 {
		ip saddr 192.168.64.68 jump mark-for-masquerade
		meta l4proto tcp dnat to 192.168.64.68:8443
	}

	chain service-FMTKUH45-kube-system/ingress-nginx-controller-admission/tcp/https-webhook {
		ip daddr 172.16.0.220 tcp dport 443 ip saddr != 192.168.64.64/26 jump mark-for-masquerade
		numgen random mod 1 vmap { 0 : goto endpoint-XIULVOT6-kube-system/ingress-nginx-controller-admission/tcp/https-webhook__192.168.64.68/8443 }
	}

	chain endpoint-R3GEKHA3-default/kube-bench-dev/tcp/tcp-80__192.168.64.66/80 {
		ip saddr 192.168.64.66 jump mark-for-masquerade
		meta l4proto tcp dnat to 192.168.64.66:80
	}

	chain service-KQA2VLMF-default/kube-bench-dev/tcp/tcp-80 {
		ip daddr 172.16.0.242 tcp dport 80 ip saddr != 192.168.64.64/26 jump mark-for-masquerade
		numgen random mod 1 vmap { 0 : goto endpoint-R3GEKHA3-default/kube-bench-dev/tcp/tcp-80__192.168.64.66/80 }
	}

	chain external-KQA2VLMF-default/kube-bench-dev/tcp/tcp-80 {
		jump mark-for-masquerade
		goto service-KQA2VLMF-default/kube-bench-dev/tcp/tcp-80
	}
}

How can we reproduce it (as minimally and precisely as possible)?

Apply the YAML above and inspect kube-proxy's nftables rules for the external IP.
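
For illustration (the manifest filename is an assumption):

% kubectl apply -f kube-bench-dev.yaml
% # 192.168.64.253 should gain an entry in kube-proxy's service-ips map, but no DNAT rule appears:
% nft list map ip kube-proxy service-ips | grep 192.168.64.253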

Anything else we need to know?

No response

Kubernetes version

1.33.2

Cloud provider

N/A

OS version

No response
Install tools

No response

Container runtime (CRI) and version (if applicable)

No response

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

ttc0419 added the kind/bug label on Apr 27, 2025
k8s-ci-robot added the needs-sig and needs-triage labels on Apr 27, 2025
k8s-ci-robot (Contributor) commented

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

tssurya (Contributor) commented Apr 27, 2025

/sig network

k8s-ci-robot added the sig/network label and removed the needs-sig label on Apr 27, 2025
ragu101 added a commit to ragu101/kubernetes that referenced this issue Apr 28, 2025
This commit fixes an issue where DNAT rules for external IPs are not installed
by kube-proxy when a service has clusterIP: None (headless service). The issue
was specific to kube-proxy's nftables mode implementation.

The fix:
1. Adds isHeadless field to BaseServicePortInfo to track headless services
2. Modifies UsesClusterEndpoints() to handle headless services consistently
3. Ensures headless services bypass cluster endpoints while still getting
   proper DNAT rules for external IPs

Added TestUsesClusterEndpoints to verify the behavior for:
- Normal services with external IPs
- Headless services with external IPs
- Headless services without external IPs

Fixes kubernetes#131497
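
For context, a minimal Go sketch of the shape the commit message describes, assuming only the names it quotes (isHeadless, BaseServicePortInfo, UsesClusterEndpoints); the actual change is in #131503 and may differ:

package proxy

import "net"

// BaseServicePortInfo is reduced here to the fields needed for the
// illustration; the real type carries many more.
type BaseServicePortInfo struct {
	clusterIP   net.IP   // nil for a headless Service (clusterIP: None)
	externalIPs []net.IP // from spec.externalIPs
	isHeadless  bool     // new field per the commit message
}

// externallyAccessible reports whether any non-cluster IP can reach the
// port (only externalIPs in this sketch; real kube-proxy also considers
// LoadBalancer IPs and NodePorts).
func (b *BaseServicePortInfo) externallyAccessible() bool {
	return len(b.externalIPs) > 0
}

// UsesClusterEndpoints sketches the described behavior: a headless
// Service has no ClusterIP to DNAT to, but if it has external IPs it
// still needs endpoints for the external-IP DNAT rules.
func (b *BaseServicePortInfo) UsesClusterEndpoints() bool {
	if b.isHeadless {
		return b.externallyAccessible()
	}
	return true
}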
aojea (Member) commented Apr 28, 2025

@ttc0419 kube-proxy does not handle headless services; it ignores them, so it is not possible to add IPs for those. You can see that the iptables kube-proxy behaves the same way.

What this seems to be is a bug in Service validation: does it make sense for a headless service to have external IPs?
/kind bug

/cc @danwinship @thockin
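
For reference, a simplified Go sketch (an illustration, not the literal kube-proxy source) of the guard being described: any Service whose ClusterIP is "None" or unset is skipped outright, before its externalIPs are ever considered.

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// shouldSkipService illustrates the headless-service guard: kube-proxy
// skips such Services entirely, so no rules (ClusterIP or external IP)
// are ever programmed for them.
func shouldSkipService(svc *v1.Service) bool {
	if svc.Spec.ClusterIP == v1.ClusterIPNone || svc.Spec.ClusterIP == "" {
		return true // headless or not yet allocated: nothing to proxy
	}
	// ExternalName Services are resolved by DNS, not proxied.
	return svc.Spec.Type == v1.ServiceTypeExternalName
}

func main() {
	headless := &v1.Service{Spec: v1.ServiceSpec{
		ClusterIP:   v1.ClusterIPNone,
		ExternalIPs: []string{"192.168.64.253"},
	}}
	fmt.Println(shouldSkipService(headless)) // true: the external IP is ignored
}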

ttc0419 (Author) commented Apr 28, 2025

> @ttc0419 kube-proxy does not handle headless services; it ignores them, so it is not possible to add IPs for those. You can see that the iptables kube-proxy behaves the same way.
>
> What this seems to be is a bug in Service validation: does it make sense for a headless service to have external IPs? /kind bug
>
> /cc @danwinship @thockin

Maybe I want a monolithic service to be accessible only from outside the cluster, without wasting a cluster IP on it.
