Description
I also reported this issue at rancher/rke2#9484, which contains a lot of calico-node log output.
I see the same issue on four independent clusters. They all have the following in common:
- rke2 v1.34.3+rke2r1 or v1.35.0+rke2r1
- Calico (bundled chart and images)
- nftables dataplane in Calico and kube-proxy
- BGP for LoadBalancer route advertisements
- LoadBalancers with externalTrafficPolicy: Local
After some time (I have not yet been able to tie it to any other trigger), Calico loses the /32 (IPv4) and /128 (IPv6) routes for the LoadBalancers with externalTrafficPolicy: Local. The covering route then takes effect, so traffic is routed towards nodes that aren't running the service, and that traffic is blackholed.
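To illustrate why losing the host route blackholes traffic, here is a minimal sketch of longest-prefix-match route selection. The addresses, the covering prefix, and the node names are hypothetical (documentation ranges), not taken from my clusters, and the sketch ignores ECMP across multiple next hops.

```python
import ipaddress


def select_route(destination, routes):
    """Return the most specific (network, next_hop) pair covering destination.

    routes is a list of (ip_network, next_hop) tuples. Returns None when no
    route covers the destination.
    """
    dest = ipaddress.ip_address(destination)
    candidates = [
        (net, next_hop)
        for net, next_hop in routes
        if net.version == dest.version and dest in net
    ]
    if not candidates:
        return None
    # Longest prefix wins: a /32 beats the covering /24.
    return max(candidates, key=lambda item: item[0].prefixlen)


# Hypothetical routing table as seen by the BGP peer.
covering = (ipaddress.ip_network("192.0.2.0/24"), "node-a")      # node without the service
host_route = (ipaddress.ip_network("192.0.2.10/32"), "node-b")   # node running the service

# While the /32 is advertised, traffic reaches the node running the service.
print(select_route("192.0.2.10", [covering, host_route])[1])

# Once the /32 disappears, only the covering route matches.
print(select_route("192.0.2.10", [covering])[1])
```

With externalTrafficPolicy: Local, a node without a local endpoint does not forward the traffic on, which is why falling back to the covering route results in dropped connections rather than an extra hop.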
The problem appears to be that the static routes actually disappear from /etc/calico/confd/config/bird_aggr.cfg or /etc/calico/confd/config/bird6_aggr.cfg within the calico-node pods. The routes are then not advertised.
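A check along these lines could help detect the condition before traffic is affected. This is only a sketch: it assumes the static routes appear in the file as standard BIRD `route <prefix> ...;` statements, and the sample config text and LoadBalancer addresses below are hypothetical, not copied from an affected pod.

```python
import ipaddress
import re

# Matches BIRD static-route statements such as "route 192.0.2.10/32 blackhole;".
# Commented-out lines do not match because "route" must start the line.
ROUTE_RE = re.compile(r"^\s*route\s+(\S+)\s", re.MULTILINE)


def static_routes(config_text):
    """Return the set of prefixes declared via `route ...` in BIRD config text."""
    prefixes = set()
    for match in ROUTE_RE.finditer(config_text):
        try:
            prefixes.add(ipaddress.ip_network(match.group(1), strict=False))
        except ValueError:
            continue  # not a parseable prefix, skip it
    return prefixes


def missing_host_routes(config_text, lb_addresses):
    """Return the LoadBalancer addresses that have no /32 or /128 static route."""
    present = static_routes(config_text)
    missing = []
    for addr in lb_addresses:
        ip = ipaddress.ip_address(addr)
        host = ipaddress.ip_network(f"{ip}/{ip.max_prefixlen}")
        if host not in present:
            missing.append(addr)
    return missing


# Hypothetical file contents: the covering route is present, one /32 is gone.
SAMPLE_CONFIG = """
protocol static {
   route 192.0.2.0/24 blackhole;
   route 192.0.2.10/32 blackhole;
}
"""

print(missing_host_routes(SAMPLE_CONFIG, ["192.0.2.10", "192.0.2.11"]))
```

In practice, `config_text` would be the contents of bird_aggr.cfg (or bird6_aggr.cfg) read from inside the calico-node pod on a node that runs an endpoint of the service, and `lb_addresses` would be the LoadBalancer IPs of the externalTrafficPolicy: Local services expected on that node.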
Restarting the calico-node pods seems to resolve the problem for a time, until it reoccurs.
Expected Behavior
The routes should persist.
Current Behavior
Calico loses the /32 or /128 routes for externalTrafficPolicy: Local LoadBalancer services.
Possible Solution
Steps to Reproduce (for bugs)
Not sure yet. So far the only apparent trigger is the passage of time.
Context
I need some of my LoadBalancer services to retain the source IP address, hence using externalTrafficPolicy: Local.
Your Environment
- Calico version: 3.31.2
- Calico dataplane: nftables
- Orchestrator version: rke2 v1.35.0+rke2r1, rke2 v1.34.3+rke2r1
- Operating System and version: Debian 13 and Ubuntu 24.04, kernels 6.8.0-1040-raspi, 6.12.57+deb13-amd64, 6.17.8+deb13-amd64
- Link to your project (optional): n/a