distributed provisioning: PVC for other node not ignored silently #669

@pohly

Description

After a "resource exhausted" error and selecting a different node, the original provisioner instance keeps getting updates and/or continues to work on a PVC that it should ignore:

I0902 17:33:16.091273       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"example-app-jx8rr-data", UID:"f5008265-3c43-4385-a78d-4fa3f72fb611", APIVersion:"v1", ResourceVersion:"485045", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "csi-hostpath-fast": rpc error: code = ResourceExhausted desc = requested capacity 53687091200 exceeds remaining capacity for "fast", 100Gi out of 100Gi already used
I0902 17:33:16.092186       1 connection.go:185] GRPC response: {"maximum_volume_size":{},"minimum_volume_size":{}}
I0902 17:33:16.092316       1 connection.go:186] GRPC error: <nil>
I0902 17:33:16.092419       1 capacity.go:641] Capacity Controller: no need to update csisc-5mph5 for {segment:0xc000318e10 storageClassName:csi-hostpath-fast}, same capacity 0 and correct owner
I0902 17:33:16.097560       1 controller.go:1426] provision "default/example-app-jx8rr-data" class "csi-hostpath-fast": volume rescheduled because: failed to provision volume with StorageClass "csi-hostpath-fast": rpc error: code = ResourceExhausted desc = requested capacity 53687091200 exceeds remaining capacity for "fast", 100Gi out of 100Gi already used
I0902 17:33:16.097697       1 controller.go:1095] Stop provisioning, removing PVC f5008265-3c43-4385-a78d-4fa3f72fb611 from claims in progress
I0902 17:33:24.229631       1 controller.go:1332] provision "default/example-app-jx8rr-data" class "csi-hostpath-fast": started
W0902 17:33:24.229729       1 controller.go:958] Retrying syncing claim "f5008265-3c43-4385-a78d-4fa3f72fb611", failure 0
E0902 17:33:24.229782       1 controller.go:981] error syncing claim "f5008265-3c43-4385-a78d-4fa3f72fb611": failed to get target node: node "aks-workerpool-15818640-vmss00000a" not found
I0902 17:33:24.229839       1 controller.go:1332] provision "default/example-app-jx8rr-data" class "csi-hostpath-fast": started
W0902 17:33:24.229877       1 controller.go:958] Retrying syncing claim "f5008265-3c43-4385-a78d-4fa3f72fb611", failure 1
E0902 17:33:24.229890       1 controller.go:981] error syncing claim "f5008265-3c43-4385-a78d-4fa3f72fb611": failed to get target node: node "aks-workerpool-15818640-vmss00000a" not found

This provisioner instance ran on vmss000000. The "failed to get target node" error refers to a different node (aks-workerpool-15818640-vmss00000a), which isn't in the instance's static node info cache.
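
To illustrate what "ignore silently" could look like, here is a minimal Go sketch of an early per-claim check, assuming each node-local provisioner instance knows its own node name. The volume.kubernetes.io/selected-node annotation key is the real upstream one, but shouldHandleClaim and annSelectedNode are hypothetical names for illustration, not the existing external-provisioner API.

package distributed

import (
	v1 "k8s.io/api/core/v1"
)

// annSelectedNode holds the annotation key through which the selected node is
// recorded on a PVC; only the key itself is taken from upstream Kubernetes.
const annSelectedNode = "volume.kubernetes.io/selected-node"

// shouldHandleClaim is a hypothetical filter for distributed provisioning:
// a node-local instance only works on claims whose selected node matches its
// own node name. Claims rescheduled to another node are dropped quietly here
// instead of being requeued and failing with "failed to get target node"
// against the static node info cache.
func shouldHandleClaim(claim *v1.PersistentVolumeClaim, localNodeName string) bool {
	selected, ok := claim.Annotations[annSelectedNode]
	if !ok {
		// No node selected yet: nothing for this instance to do.
		return false
	}
	return selected == localNodeName
}

With a check like this applied before syncing a claim, the instance on vmss000000 would stop processing example-app-jx8rr-data as soon as the selected node changed, rather than retrying it.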
