
Pod-to-external DNS queries timing out when eBPF is on #11623

@l8bloom

Description


Hello,

I'm having issues with pod traffic to external endpoints (e.g. google.com) timing out because DNS is not resolved properly.
More specifically, it seems SNAT is not in place, so my pods never get the responses back.

Expected Behavior

Pods' external traffic is SNAT-ed.

Current Behavior

Pods can't send traffic to public IP endpoints.

Steps to Reproduce (for bugs)

  1. Running a home lab cluster with eBPF enabled; Calico installed with the default manifests:
    CALICO_VERSION="3.31.3"
    https://raw.githubusercontent.com/projectcalico/calico/v$CALICO_VERSION/manifests/operator-crds.yaml
    https://raw.githubusercontent.com/projectcalico/calico/v$CALICO_VERSION/manifests/tigera-operator.yaml
    https://raw.githubusercontent.com/projectcalico/calico/v$CALICO_VERSION/manifests/custom-resources-bpf.yaml

  2. custom-resources-bpf.yaml updated with ipPools.cidr set to 10.244.0.0/16 to match kubeadm init's --pod-network-cidr (see the sketch after these steps).

  3. Deploy a test nginx, crictl exec -it into it, and try curl -L google.com; it fails with

root@nginx-deployment-54bb44699-48ts9:/# curl -L google.com
curl: (6) Could not resolve host: google.com
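
For completeness, the only edit to custom-resources-bpf.yaml was the pool CIDR; the relevant part of the Installation resource looks roughly like this (reconstructed from memory of the stock manifest, so the unchanged fields are approximate):

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    linuxDataplane: BPF
    ipPools:
    - blockSize: 26
      cidr: 10.244.0.0/16    # changed to match kubeadm init's --pod-network-cidr
      encapsulation: VXLAN
      natOutgoing: Enabled
      nodeSelector: all()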

Context

Pods essentially have no access to the external/public Internet.

Tigera operator reports all resources as ready ✔️

NAME                AVAILABLE   PROGRESSING   DEGRADED   SINCE
apiserver           True        False         False      103m
calico              True        False         False      39m
goldmane            True        False         False      28m
ippools             True        False         False      107m
kubeproxy-monitor   True        False         False      107m
whisker             True        False         False      102m

CoreDNS up and running ✔️

kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS      AGE
coredns-587b887f6f-n9zsz   1/1     Running   1 (30m ago)   57m
coredns-587b887f6f-vmn9r   1/1     Running   1 (30m ago)   57m

Node-to-pod, pod-to-pod, and service-to-pod traffic all work ✔️

kubectl get svc
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
kubernetes      ClusterIP   10.96.0.1        <none>        443/TCP   118m
nginx-service   ClusterIP   10.104.200.193   <none>        80/TCP    107m
curl 10.104.200.193
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
...

curl -L google.com works in the host/system shell and in Docker-based containers ✔️
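
Roughly how that was checked (the curl image is just an example, any container with curl behaves the same here):

curl -L google.com
docker run --rm curlimages/curl:latest -L google.com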

CoreDNS reports errors ❌

kubectl logs -n kube-system deploy/coredns
Found 2 pods, using pod/coredns-587b887f6f-n9zsz
maxprocs: Leaving GOMAXPROCS=16: CPU quota undefined
.:53
[INFO] plugin/reload: Running configuration SHA512 = 1b226df79860026c6a52e67daa10d7f0d57ec5b023288ec00c5e05f93523c894564e15b91770d3a07ae1cfbe861d15b37d4a0027e69c546ab112970993a3b03b
CoreDNS-1.13.1
linux/amd64, go1.25.2, 1db4568
[ERROR] plugin/errors: 2 1458823090375742670.8608639235073720918. HINFO: read udp 10.244.206.152:53973->195.29.247.161:53: i/o timeout
[ERROR] plugin/errors: 2 1458823090375742670.8608639235073720918. HINFO: read udp 10.244.206.152:58302->195.29.247.162:53: i/o timeout
[ERROR] plugin/errors: 2 1458823090375742670.8608639235073720918. HINFO: read udp 10.244.206.152:45678->195.29.247.162:53: i/o timeout
[ERROR] plugin/errors: 2 1458823090375742670.8608639235073720918. HINFO: read udp 10.244.206.152:43656->195.29.247.162:53: i/o timeout
[ERROR] plugin/errors: 2 1458823090375742670.8608639235073720918. HINFO: read udp 10.244.206.152:40956->195.29.247.162:53: i/o timeout
...

ISP's DNS server 195.29.247.161:53 can't reply to 10.x.x.x source addresses; no SNAT ❌

sudo tcpdump -i any udp port 53 -n
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
13:25:14.439619 cali50d19c830a9 In  IP 10.244.206.152.57987 > 195.29.247.161.53: 49524+ NS? . (17)
13:25:14.471654 calib39cd32d877 In  IP 10.244.206.153.37939 > 195.29.247.162.53: 36685+ NS? . (17)
13:25:14.834065 cali50d19c830a9 In  IP 10.244.206.152.47250 > 195.29.247.162.53: 24035+ NS? . (17)
13:25:14.878193 calib39cd32d877 In  IP 10.244.206.153.35148 > 195.29.247.161.53: 53973+ NS? . (17)
13:25:15.941229 cali50d19c830a9 In  IP 10.244.206.152.44558 > 195.29.247.161.53: 40457+ NS? . (17)

As shown in the tcpdump log, outbound UDP 53 packets retain the pod IP 10.244.206.152 when egressing the node. This indicates that the Calico eBPF data plane is failing to perform SNAT (masquerade) despite natOutgoing: true being set in the IPPool.
I have tried replacing forward . /etc/resolv.conf in the CoreDNS Corefile with e.g. forward . 1.1.1.1, but the outcome is the same.
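
To narrow this down further, these are the checks I can run and attach output from (eno1 stands in for the node's uplink NIC, and the calico-node -bpf subcommands are taken from the eBPF troubleshooting docs, so treat the exact invocations as assumptions):

# DNS query from a throwaway pod straight to a public resolver, bypassing CoreDNS
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.36 -- nslookup google.com 1.1.1.1

# Watch the uplink to see whether the query leaves with the pod IP or the node IP
sudo tcpdump -ni eno1 udp port 53

# Dump Calico's eBPF NAT table and counters from the calico-node pod on that node
kubectl exec -n calico-system ds/calico-node -c calico-node -- calico-node -bpf nat dump
kubectl exec -n calico-system ds/calico-node -c calico-node -- calico-node -bpf counters dump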

kubectl get felixconfiguration default -o yaml
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  annotations:
    operator.tigera.io/bpfEnabled: "true"
  creationTimestamp: "2026-01-06T10:24:39Z"
  generation: 1
  name: default
  resourceVersion: "6026"
  uid: 46711f8c-f919-48b2-a02e-d90cc26c578c
spec:
  bpfConnectTimeLoadBalancing: TCP
  bpfEnabled: true
  bpfExternalServiceMode: Tunnel
  bpfHostNetworkedNATWithoutCTLB: Enabled
  bpfLogLevel: ""
  floatingIPs: Disabled
  healthPort: 9099
  logSeverityScreen: Info
  nftablesMode: Enabled
  reportingInterval: 0s
  vxlanPort: 4789
  vxlanVNI: 4096
kubectl get ippools default-ipv4-ippool -o yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2026-01-06T10:24:37Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: tigera-operator
  name: default-ipv4-ippool
  resourceVersion: "8945"
  uid: ba9a2550-5ad1-4fbd-9f4d-a1b6969df1d3
spec:
  allowedUses:
  - Workload
  - Tunnel
  assignmentMode: Automatic
  blockSize: 26
  cidr: 10.244.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Always

Your Environment

  • Calico version: 3.31.3
  • Calico dataplane: bpf
  • Orchestrator version: kubernetes 1.35.0
  • Operating System and version: Linux 6.14.0-37-generic, Ubuntu 24.04.3 LTS

Happy to provide more details; thank you in advance.
