Description
Describe the bug
If we delete a pod that used whereabouts to allocate an additional IP from a defined pool of addresses while the whereabouts pod was not running on the same node, that IP address stays stuck until we delete it manually from several custom resources.
Expected behavior
A garbage collector should check at whereabouts startup whether all IPs recorded in the custom resources are still attached to existing pods.
To Reproduce
Steps to reproduce the behavior:
- Create the following net-attach-def; it defines a pool with a single IP address (range_start equals range_end) so the issue reproduces on the very first conflict:
cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  namespace: default
  name: super-net
spec:
  config: |-
    {
      "cniVersion": "0.3.0",
      "name": "super-net",
      "type": "macvlan",
      "master": "eth1",
      "mode": "bridge",
      "ipam": {
        "range": "10.10.3.0/24",
        "range_end": "10.10.3.30",
        "range_start": "10.10.3.30",
        "type": "whereabouts"
      }
    }
EOF
- Create a pod to use this net-attach-def:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: super-pod
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: default/super-net
  labels:
    role: super-pod
spec:
  containers:
    - name: super-pod
      image: bash:5.2
      imagePullPolicy: IfNotPresent
      command:
        - bash
        - -cex
        - |-
          trap 'exit 0' SIGINT SIGTERM
          while true; do
            sleep 1
          done
  restartPolicy: Never
  terminationGracePeriodSeconds: 3
EOF
- Check the pod events; the IP address has been added:
Normal  Scheduled       5s  default-scheduler  Successfully assigned default/super-pod to master1-1
Normal  AddedInterface  3s  multus             Add eth0 [10.10.20.249/32] from k8s-pod-network
Normal  AddedInterface  3s  multus             Add net1 [10.10.3.30/24] from default/super-net
- Scale the whereabouts DaemonSet down to zero and wait until its pods are gone
kubectl -n kube-system patch daemonset whereabouts -p '{"spec": {"template": {"spec": {"nodeSelector": {"non-existing": "true"}}}}}'
- Delete the created pod and wait until it is deleted
kubectl delete pod -n default super-pod
- Bring whereabouts pods back
kubectl -n kube-system patch daemonset whereabouts --type json -p='[{"op": "remove", "path": "/spec/template/spec/nodeSelector/non-existing"}]'
- Recreate the pod, but change its name slightly, as a Deployment would (replacement pods always get different names):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: super-pod2
namespace: default
annotations:
k8s.v1.cni.cncf.io/networks: default/super-net
labels:
role: super-pod
spec:
containers:
- name: super-pod
image: bash:5.2
imagePullPolicy: IfNotPresent
command:
- bash
- -cex
- |-
trap 'exit 0' SIGINT SIGTERM
while true; do
sleep 1
done
restartPolicy: Never
terminationGracePeriodSeconds: 3
EOF
- Check the pod events - the pod is stuck in the ContainerCreating state:
ERRORED: error configuring pod [default/super-pod2] networking: [default/super-pod2/340a8be4-0b0c-41dd-aadf-54af1bf052e6:super-net]: error adding container to network "super-net": error at storage engine: Could not allocate IP in range: ip: 10.10.3.30 / - 10.10.3.30 / range: 10.10.3.0/24 / excludeRanges: []
- Check the whereabouts custom resources:
# overlappingrange
kubectl get overlappingrangeipreservations.whereabouts.cni.cncf.io -n kube-system 10.10.3.30 -o yaml
apiVersion: whereabouts.cni.cncf.io/v1alpha1
kind: OverlappingRangeIPReservation
metadata:
  creationTimestamp: "2025-01-16T07:41:58Z"
  generation: 1
  name: 10.10.3.30
  namespace: kube-system
  resourceVersion: "273309"
  uid: b45c9cf3-c529-4adc-ac7e-0d31d8b35b83
spec:
  ifname: net1
  podref: default/super-pod
# ippools
kubectl get ippools.whereabouts.cni.cncf.io -n kube-system 10.10.3.0-24 -o yaml
apiVersion: whereabouts.cni.cncf.io/v1alpha1
kind: IPPool
metadata:
  creationTimestamp: "2025-01-15T19:16:53Z"
  generation: 14
  name: 10.10.3.0-24
  namespace: kube-system
  resourceVersion: "273308"
  uid: 626c2118-6945-44a9-ace7-96f70b4a3e49
spec:
  allocations:
    "30":
      id: a56b06006c6a3e3a1eb26db82b8cd5db008f20627576e3b1c7926776bffc9ed0
      ifname: net1
      podref: default/super-pod
  range: 10.10.3.0/24
As we can see, whereabouts still thinks that this IP address is allocated to super-pod, but that pod is not present anymore. So this IP is stuck forever until we remove it manually like this:
kubectl delete overlappingrangeipreservations.whereabouts.cni.cncf.io -n kube-system 10.10.3.30
And then we must remove the entry for the IP from the IPPool:
"30":
  id: a56b06006c6a3e3a1eb26db82b8cd5db008f20627576e3b1c7926776bffc9ed0
  ifname: net1
  podref: default/super-pod
Once we remove these two parts, a new pod will be able to allocate the freed IP address.
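The IPPool side of this cleanup boils down to dropping every entry in .spec.allocations whose podref names the deleted pod. A minimal pure-bash illustration of that selection, using hypothetical in-memory data instead of live kubectl output:

```shell
#!/usr/bin/env bash
# Hypothetical in-memory copy of .spec.allocations: key -> podref.
declare -A alloc_podref=(
  ["30"]="default/super-pod"   # stale: this pod was deleted
  ["31"]="default/another-pod" # still running, must be kept
)
deleted_pod="default/super-pod"

# Drop every allocation entry that still references the deleted pod.
for key in "${!alloc_podref[@]}"; do
  if [[ "${alloc_podref[$key]}" == "${deleted_pod}" ]]; then
    unset "alloc_podref[$key]"
  fi
done

echo "remaining allocation keys: ${!alloc_podref[*]}"
```

Against a live cluster, the same selection is what the jq filter in the script below performs on the IPPool JSON.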
We also have a bash script that automates this procedure:
#!/usr/bin/env bash
set -u
if [[ $# -gt 0 ]] ; then
  export KUBECONFIG=$1
fi
# Walk every OverlappingRangeIPReservation and drop the ones whose pod is gone.
for IP in $(kubectl get overlappingrangeipreservations -n kube-system --no-headers | cut "-d " -f1) ; do
  POD=$(kubectl get overlappingrangeipreservations -n kube-system "${IP}" -o jsonpath='{.spec.podref}')
  # podref has the form <namespace>/<name>
  RESULT=$(kubectl get pod -n "${POD%%/*}" "${POD#*/}" 2>&1)
  if [[ $? -ne 0 ]] && echo "${RESULT}" | grep -q 'NotFound' ; then
    echo "Pod ${POD} not found in the cluster. Deleting IP ${IP}"
    if kubectl delete overlappingrangeipreservations -n kube-system "${IP}" ; then
      echo "OverlappingRangeIPReservation ${IP} deleted"
    fi
    # Remove the matching allocation entry from every IPPool that references the pod.
    for IPRANGE in $(kubectl get ippools.whereabouts.cni.cncf.io -n kube-system --no-headers | cut "-d " -f1) ; do
      KEY=$(kubectl get ippools.whereabouts.cni.cncf.io "${IPRANGE}" -n kube-system -o json | jq -crM --arg pod "${POD}" '.spec.allocations | map_values(select(.podref==$pod)) | keys[0]')
      if [[ "${KEY}" != "null" ]] ; then
        kubectl get ippools.whereabouts.cni.cncf.io "${IPRANGE}" -n kube-system -o json | jq -crM --arg key "${KEY}" 'del(.spec.allocations[$key])' | kubectl replace -f - \
          && echo "IPPool ${IPRANGE} replaced"
      fi
    done
  fi
done
The core problem is that whereabouts doesn't check the allocated IP addresses at startup, so it is possible for IP addresses to get stuck.
Environment:
- Whereabouts version: 0.8.0
- Kubernetes version (use kubectl version): doesn't matter, reproduced on 1.30 and 1.31
- Network-attachment-definition: see above
- Whereabouts configuration (on the host): N/A
- OS (e.g. from /etc/os-release): Ubuntu
- Kernel (e.g. uname -a): Linux master1-1 6.8.0-1021-aws #23-Ubuntu SMP Mon Dec 9 23:59:34 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Others: N/A
Additional info / context
As far as I can see, the code handles only the deletion event (fired when a pod is actually deleted from the cluster):
podsInformer.AddEventHandler(
    cache.ResourceEventHandlerFuncs{
        DeleteFunc: func(obj interface{}) {
            onPodDelete(queue, obj)
        },
    })
What I can suggest is either adding a finalizer to the pod, so it cannot disappear before whereabouts has released its IP, or running a garbage-collection pass at startup as well.
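The startup garbage-collection pass suggested here reduces to a set difference: every podref recorded in the reservations with no matching live pod marks a stale IP. A minimal bash sketch of that decision using hypothetical in-memory data (a real implementation would list pods and reservations through the API):

```shell
#!/usr/bin/env bash
# IP -> podref, as recorded in OverlappingRangeIPReservation objects (hypothetical data).
declare -A reservations=(
  ["10.10.3.30"]="default/super-pod"
  ["10.10.3.31"]="default/other-pod"
)
# Pods that actually exist in the cluster (hypothetical data).
live_pods=("default/other-pod")

stale=()
for ip in "${!reservations[@]}"; do
  podref="${reservations[$ip]}"
  found=no
  for pod in "${live_pods[@]}"; do
    [[ "${pod}" == "${podref}" ]] && found=yes
  done
  # No live pod matches the recorded podref: the reservation is stale.
  if [[ "${found}" == no ]]; then
    stale+=("${ip}")
  fi
done
echo "stale IPs to release: ${stale[*]}"
```

Each stale IP would then get the same two-step cleanup shown above: delete the OverlappingRangeIPReservation and drop the matching IPPool allocation entry.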