[BUG] Race when deleting a pod and subnet with multiple IPs from the same subnet bound to the same pod #4898

Closed as not planned
@zsxsoft

Description

Kube-OVN Version

v1.12.28

Kubernetes Version

/

Operation-system/Kernel Version

/

Description

Race A

If you delete both the pod and the subnet while the pod is still being created and its IP allocation has not yet completed, the following happens:

  • Both the Pod and Subnet are deleted.
  • The IP object remains, and kubectl delete ip hangs without removing it.
  • The only way to delete the lingering IP is to clear its finalizers: kubectl patch ips <ip-name> --type=merge -p '{"metadata":{"finalizers":[]}}'
[root@vm-master-1 ~]# kubectl delete ip vm-1-0.vm
ip.kubeovn.io "vm-1-0.vm" deleted
^C[root@vm-master-1 ~]# kubectl get ip | grep vm-1 | head -n9
vm-1-0.vm                           10.XXXX           XXXX   node-XX       net-c13XXXX
vm-1-0.vm.nad-10.vm.ovn             10.XXXX           XXXX                 net-c13XXXX
vm-1-0.vm.nad-11.vm.ovn             10.XXXX           XXXX                 net-c13XXXX
vm-1-0.vm.nad-12.vm.ovn             10.XXXX           XXXX                 net-c13XXXX
vm-1-0.vm.nad-13.vm.ovn             10.XXXX           XXXX                 net-c13XXXX
vm-1-0.vm.nad-14.vm.ovn             10.XXXX           XXXX                 net-c13XXXX
vm-1-0.vm.nad-15.vm.ovn             10.XXXX           XXXX                 net-c13XXXX
vm-1-0.vm.nad-16.vm.ovn             10.XXXX           XXXX                 net-c13XXXX
vm-1-0.vm.nad-17.vm.ovn             10.XXXX           XXXX                 net-c13XXXX
[root@vm-master-1 ~]# 
[root@vm-master-1 ~]# kubectl delete ip vm-1-0.vm
ip.kubeovn.io "vm-1-0.vm" deleted
^C[root@vm-master-1 ~]# kubectl get ip | grep vm-1 | head -n1
vm-1-0.vm                           10.XXXX           XXXX   node-XX       net-c13XXXX
[root@vm-master-1 ~]# kubectl get pods -n vm | grep vm-1
[root@vm-master-1 ~]# kubectl get subnet | grep net-c13XXX
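The finalizer workaround above can be applied to every lingering IP of one pod in a single pass. The following is a minimal sketch, not something Kube-OVN ships: the function name patch_lingering_ips and the KUBECTL override variable are hypothetical, and it assumes that all of the pod's IP objects share a common name prefix (such as vm-1-0.vm in the output above).

```shell
# KUBECTL is a hypothetical override so the commands can be previewed,
# for example with KUBECTL="echo kubectl".
KUBECTL="${KUBECTL:-kubectl}"

# Reads object names on stdin, one per line, and clears the finalizers of
# every IP whose name starts with the given prefix.
patch_lingering_ips() {
  prefix="$1"
  while read -r name; do
    # "kubectl get ip -o name" prints "ip.kubeovn.io/<name>"; keep only <name>.
    name="${name#*/}"
    case "$name" in
      "$prefix"*)
        $KUBECTL patch ip "$name" --type=merge -p '{"metadata":{"finalizers":[]}}'
        ;;
    esac
  done
}

# Usage (runs against the current cluster context):
#   kubectl get ip -o name | patch_lingering_ips vm-1-0.vm
```

Clearing finalizers skips whatever cleanup the controller would normally perform, so this is only a workaround for objects that are already stuck.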

Race B

I'm not sure what triggers this one, but the end state is:

  • The Pod was deleted, but both the IP and Subnet remained undeleted.
  • The IP can only be deleted by patching the finalizer; it cannot be deleted directly.
  • Even after deleting all IPs (as shown below, with V4USED equal to 0), the Subnet cannot be deleted and can only be removed by patching the finalizer.

This is quite common in my production cluster, occurring in roughly 10% of cases, but I still don't know the underlying cause.

[root@vm-master-1 ~]# kubectl get subnet net-net-edaXXXXX
NAME               PROVIDER   VPC           PROTOCOL   CIDR            PRIVATE   NAT    DEFAULT   GATEWAYTYPE   V4USED   V4AVAILABLE   V6USED   V6AVAILABLE   EXCLUDEIPS       U2OINTERCONNECTIONIP
net-net-edaXXXXX   ovn        ovn-cluster   IPv4       10.XX.XX.0/24   false     true   false     distributed   0        253           0        0             ["10.XX.XX.1"]   
[root@vm-master-1 ~]# kubectl delete subnet net-net-edaXXXXX
subnet.kubeovn.io "net-net-edaXXXXX" deleted
^C[root@vm-master-1 ~]# kubectl edit subnet net-net-edaXXXXX
subnet.kubeovn.io/net-net-edaXXXXX edited
[root@vm-master-1 ~]# kubectl get subnet net-net-edaXXXXX
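To confirm that a stuck object is waiting on a finalizer, it may help to print its deletionTimestamp and finalizers side by side. This is a small sketch using standard object metadata only; the helper name show_deletion_state and the KUBECTL override are hypothetical.

```shell
# KUBECTL is a hypothetical override, for example KUBECTL="echo kubectl".
KUBECTL="${KUBECTL:-kubectl}"

# Prints "<deletionTimestamp> <finalizers>" for one object,
# for example: show_deletion_state subnet net-net-edaXXXXX
show_deletion_state() {
  kind="$1"
  name="$2"
  $KUBECTL get "$kind" "$name" \
    -o jsonpath='{.metadata.deletionTimestamp} {.metadata.finalizers}'
}
```

A non-empty deletionTimestamp together with a non-empty finalizer list means the API server has accepted the delete and is waiting for a controller to remove the finalizer, which matches the hanging kubectl delete shown above.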

Steps To Reproduce

Subnet YAML: #4822 (comment)
Pod YAML:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: vm/nad-3,vm/nad-4,vm/nad-5,vm/nad-6,vm/nad-7,vm/nad-8,vm/nad-9,vm/nad-10,vm/nad-11,vm/nad-12,vm/nad-13,vm/nad-14,vm/nad-15
    nad-3.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-3.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-3.vm.ovn.kubernetes.io/ip_address: 10.123.123.194
    nad-3.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-3.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-3.vm.ovn.kubernetes.io/mac_address: e8:4f:e0:e3:4e:b3
    nad-3.vm.ovn.kubernetes.io/port_security: "true"
    nad-4.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-4.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-4.vm.ovn.kubernetes.io/ip_address: 10.123.123.3
    nad-4.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-4.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-4.vm.ovn.kubernetes.io/mac_address: 10:71:ec:53:e2:89
    nad-4.vm.ovn.kubernetes.io/port_security: "true"
    nad-5.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-5.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-5.vm.ovn.kubernetes.io/ip_address: 10.123.123.245
    nad-5.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-5.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-5.vm.ovn.kubernetes.io/mac_address: 24:c3:df:f5:2f:fe
    nad-5.vm.ovn.kubernetes.io/port_security: "true"
    nad-6.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-6.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-6.vm.ovn.kubernetes.io/ip_address: 10.123.123.152
    nad-6.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-6.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-6.vm.ovn.kubernetes.io/mac_address: 48:79:2b:7a:e0:91
    nad-6.vm.ovn.kubernetes.io/port_security: "true"
    nad-7.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-7.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-7.vm.ovn.kubernetes.io/ip_address: 10.123.123.21
    nad-7.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-7.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-7.vm.ovn.kubernetes.io/mac_address: 00:6f:72:df:67:7b
    nad-7.vm.ovn.kubernetes.io/port_security: "true"
    nad-8.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-8.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-8.vm.ovn.kubernetes.io/ip_address: 10.123.123.191
    nad-8.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-8.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-8.vm.ovn.kubernetes.io/mac_address: 00:13:bb:f1:04:6b
    nad-8.vm.ovn.kubernetes.io/port_security: "true"
    nad-9.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-9.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-9.vm.ovn.kubernetes.io/ip_address: 10.123.123.121
    nad-9.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-9.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-9.vm.ovn.kubernetes.io/mac_address: 14:ac:d3:09:6b:10
    nad-9.vm.ovn.kubernetes.io/port_security: "true"
    nad-10.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-10.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-10.vm.ovn.kubernetes.io/ip_address: 10.123.123.25
    nad-10.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-10.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-10.vm.ovn.kubernetes.io/mac_address: 40:ec:54:03:f1:27
    nad-10.vm.ovn.kubernetes.io/port_security: "true"
    nad-11.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-11.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-11.vm.ovn.kubernetes.io/ip_address: 10.123.123.9
    nad-11.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-11.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-11.vm.ovn.kubernetes.io/mac_address: 64:c2:cd:a3:b2:93
    nad-11.vm.ovn.kubernetes.io/port_security: "true"
    nad-12.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-12.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-12.vm.ovn.kubernetes.io/ip_address: 10.123.123.160
    nad-12.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-12.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-12.vm.ovn.kubernetes.io/mac_address: 78:4f:a3:a7:e1:69
    nad-12.vm.ovn.kubernetes.io/port_security: "true"
    nad-13.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-13.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-13.vm.ovn.kubernetes.io/ip_address: 10.123.123.197
    nad-13.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-13.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-13.vm.ovn.kubernetes.io/mac_address: 40:ce:03:28:e7:08
    nad-13.vm.ovn.kubernetes.io/port_security: "true"
    nad-14.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-14.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-14.vm.ovn.kubernetes.io/ip_address: 10.123.123.222
    nad-14.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-14.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-14.vm.ovn.kubernetes.io/mac_address: 10:3d:bf:ef:2c:85
    nad-14.vm.ovn.kubernetes.io/port_security: "true"
    nad-15.vm.ovn.kubernetes.io/cidr: 10.123.123.0/24
    nad-15.vm.ovn.kubernetes.io/gateway: 10.123.123.1
    nad-15.vm.ovn.kubernetes.io/ip_address: 10.123.123.157
    nad-15.vm.ovn.kubernetes.io/logical_router: ovn-cluster
    nad-15.vm.ovn.kubernetes.io/logical_switch: net-netbbb
    nad-15.vm.ovn.kubernetes.io/mac_address: 28:89:fd:85:ca:17
    nad-15.vm.ovn.kubernetes.io/port_security: "true"
    ovn.kubernetes.io/cidr: 10.10.10.0/24
    ovn.kubernetes.io/gateway: 10.10.10.1
    ovn.kubernetes.io/ip_address: 10.10.10.52
    ovn.kubernetes.io/logical_router: ovn-cluster
    ovn.kubernetes.io/logical_switch: net-netaaa
    ovn.kubernetes.io/port_security: "true"
    prometheus.io/path: /metrics
    prometheus.io/scrape: "true"
  name: vm-aaa-0
  namespace: vm
spec:
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: nginx
  dnsPolicy: ClusterFirst
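Race A as described above might be reproduced with a sequence like the following. This is a sketch under assumptions: the manifest file name, the function name reproduce_race_a, and the KUBECTL override are hypothetical, and the timing needed to hit the race is not guaranteed.

```shell
# KUBECTL is a hypothetical override, for example KUBECTL="echo kubectl".
KUBECTL="${KUBECTL:-kubectl}"

# Arguments: pod manifest path, pod namespace, pod name, subnet name.
reproduce_race_a() {
  pod_manifest="$1"
  pod_ns="$2"
  pod_name="$3"
  subnet_name="$4"
  # Create the pod; apply returns before IP allocation has finished.
  $KUBECTL apply -f "$pod_manifest" || return 1
  # Request deletion of both objects right away, without blocking on finalizers.
  $KUBECTL delete pod "$pod_name" -n "$pod_ns" --wait=false
  $KUBECTL delete subnet "$subnet_name" --wait=false
  # List IP objects so that any left behind for the pod can be spotted.
  $KUBECTL get ip
}

# Usage (runs against the current cluster context):
#   reproduce_race_a pod.yaml vm vm-aaa-0 net-netbbb
```

If the race is hit, the final listing would still show IP objects for the pod even though the pod and subnet are gone.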

Current Behavior

/

Expected Behavior

/
