distributed provisioning: unset "selected-node" for nodes which have no driver running

When deploying external-provisioner alongside the CSI driver on each node, there is one problem: if the scheduler picks a node which has no driver instance, then the volume is stuck because the usual "no capacity -> reschedule" recovery is never triggered.

A custom scheduler extension and capacity tracking can minimize the risk, but cannot prevent this entirely.

Possible solutions:
- deploy the driver on all nodes, let it report "no capacity" on those where it has no resources -> works today, but creates overhead
- deploy a central provisioner together with a driver component that knows about nodes where the driver runs -> can be done today, but implies that CSI drivers must be made aware of Kubernetes
- do something similar inside external-provisioner, probably based on node labels (specify node selector for "driver not running" and handle those)
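The third option could look roughly like the following. This is only a sketch of the decision logic, not code from external-provisioner: the function names, the `nodeLabels` lookup map, and the `example.com/csi-driver` label in the usage note are hypothetical. The only real identifier is the `volume.kubernetes.io/selected-node` PVC annotation. It assumes the admin configures a label selector that matches nodes where the driver is not running.

```go
package main

// annSelectedNode is the PVC annotation set by the scheduler for late binding.
const annSelectedNode = "volume.kubernetes.io/selected-node"

// matchesAll reports whether labels contains every key/value pair in selector.
// An empty selector matches nothing, so an unconfigured selector is a no-op.
func matchesAll(labels, selector map[string]string) bool {
	if len(selector) == 0 {
		return false
	}
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// unsetIfNoDriver returns the PVC annotations without "selected-node" when the
// selected node matches noDriverSelector (hypothetical "driver not running"
// selector). The boolean reports whether the annotation was removed.
// nodeLabels maps node name -> node labels; unknown nodes are left alone.
func unsetIfNoDriver(
	pvcAnnotations map[string]string,
	nodeLabels map[string]map[string]string,
	noDriverSelector map[string]string,
) (map[string]string, bool) {
	node, ok := pvcAnnotations[annSelectedNode]
	if !ok {
		return pvcAnnotations, false
	}
	labels, known := nodeLabels[node]
	if !known || !matchesAll(labels, noDriverSelector) {
		return pvcAnnotations, false
	}
	// Copy instead of mutating the input, then drop the annotation so the
	// scheduler gets a chance to pick a different node.
	out := make(map[string]string, len(pvcAnnotations))
	for k, v := range pvcAnnotations {
		if k != annSelectedNode {
			out[k] = v
		}
	}
	return out, true
}
```

For example, with a selector of `example.com/csi-driver: absent`, a PVC whose selected node carries that label would have the annotation removed, while a PVC scheduled onto any other node would be left untouched.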


