
pv is stuck Terminating due to race condition when csi-attacher removes finalizer and csi-provisioner tries to remove another finalizer  #1217

@andyzhangx

Description


What happened:
pv is stuck Terminating due to race condition when csi-attacher removes finalizer and csi-provisioner tries to remove another finalizer

  • symptom
    The PV is stuck in Terminating when its PVC is deleted with the HonorPVReclaimPolicy feature gate enabled.

  • process
    csi-attacher removes its finalizer (e.g. external-attacher/disk-csi-azure-com) once the PV is detached. Later, when the PVC is deleted, csi-provisioner tries to remove the external-provisioner.volume.kubernetes.io/finalizer finalizer, but it works from a stale copy of the PV object in its cache, so the finalizer removal fails with a conflict on every attempt until the maximum of 6 retries is exceeded. This leaves the PV in Terminating state forever (the underlying storage is deleted before the finalizer removal fails).

csi-attacher-disk	E0510 10:18:09.499513       1 csi_handler.go:701] Failed to remove finalizer from PV "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0": PersistentVolume "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"kubernetes.io/pv-protection"}

csi-attacher-disk	I0510 10:18:09.510077       1 csi_handler.go:706] Removed finalizer from PV "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0"

csi-provisioner-disk	I0510 10:18:09.466810       1 controller.go:1517] delete "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0": volume deleted

csi-azuredisk-controller	I0510 10:18:09.466386       1 azure_managedDiskController.go:325] azureDisk - deleted a managed disk: /subscriptions/xxx/resourceGroups/icto-1019_npi-lcm-cn-lcm-npi-cluster-01-nodes/providers/Microsoft.Compute/disks/pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0

csi-provisioner-disk	I0510 10:18:09.489676       1 controller.go:1554] delete "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0": failed to remove finalizer for persistentvolume: Operation cannot be fulfilled on persistentvolumes "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0": the object has been modified; please apply your changes to the latest version and try again

csi-provisioner-disk	W0510 10:18:09.489714       1 controller.go:989] Retrying syncing volume "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0", failure 6

csi-provisioner-disk	E0510 10:18:09.489752       1 controller.go:1007] error syncing volume "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0": Operation cannot be fulfilled on persistentvolumes "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0": the object has been modified; please apply your changes to the latest version and try again
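The "object has been modified" error in the logs above is Kubernetes' optimistic concurrency control: an update is rejected unless the submitted object's resourceVersion matches what the apiserver currently stores. A minimal Python sketch of the race (the ApiServer and Conflict names here are hypothetical stand-ins, not real client-go API):

```python
# Simplified model of the race: two controllers each remove one finalizer,
# but the second update is made from a stale cached copy and is rejected.
import copy

class Conflict(Exception):
    """Stands in for the apiserver's 409 'object has been modified' error."""

class ApiServer:
    def __init__(self, pv):
        self.pv = pv

    def get(self):
        return copy.deepcopy(self.pv)

    def update(self, obj):
        # Optimistic concurrency: reject updates based on a stale version.
        if obj["resourceVersion"] != self.pv["resourceVersion"]:
            raise Conflict("please apply your changes to the latest version")
        obj["resourceVersion"] += 1
        self.pv = copy.deepcopy(obj)

server = ApiServer({
    "resourceVersion": 1,
    "finalizers": [
        "external-attacher/disk-csi-azure-com",
        "external-provisioner.volume.kubernetes.io/finalizer",
    ],
})

# Both sidecars read the PV into their caches at version 1.
attacher_copy = server.get()
provisioner_copy = server.get()

# csi-attacher removes its finalizer first; this bumps the resourceVersion.
attacher_copy["finalizers"].remove("external-attacher/disk-csi-azure-com")
server.update(attacher_copy)

# csi-provisioner still holds the stale version-1 copy, so its update conflicts.
provisioner_copy["finalizers"].remove(
    "external-provisioner.volume.kubernetes.io/finalizer")
try:
    server.update(provisioner_copy)
except Conflict:
    # The resolution is to re-fetch the latest object and retry the removal
    # (in client-go this is the retry.RetryOnConflict pattern).
    fresh = server.get()
    fresh["finalizers"].remove(
        "external-provisioner.volume.kubernetes.io/finalizer")
    server.update(fresh)

print(server.get()["finalizers"])  # → []
```

In the real controllers the retry re-reads from the informer cache rather than the apiserver, so if the cache has not caught up within the retry budget the conflict repeats until the 6 retries are exhausted.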
  • workaround
    Remove all finalizers from the PV and then delete the PV manually:
kubectl patch pv NAME -p '{"metadata":{"finalizers":null}}'

/kind bug
cc @jsafrane

What you expected to happen:

How to reproduce it:

Anything else we need to know?:

Environment:

  • Driver version: v4.0.0
  • Kubernetes version (use kubectl version): 1.27
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Metadata


    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
