distributed provisioning: unset "selected-node" for nodes which have no driver running #544

@pohly


When external-provisioner is deployed alongside the CSI driver on each node, there is one problem: if the scheduler picks a node which has no driver instance, then the volume is stuck, because nothing runs on that node to trigger the usual "no capacity -> reschedule" recovery.

A custom scheduler extension and capacity tracking can minimize the risk, but cannot prevent this entirely.

Possible solutions:

  • deploy the driver on all nodes, let it report "no capacity" on those where it has no resources -> works today, but creates overhead
  • deploy a central provisioner together with a driver component that knows about nodes where the driver runs -> can be done today, but implies that CSI drivers must be made aware of Kubernetes
  • do something similar inside external-provisioner, probably based on node labels (specify node selector for "driver not running" and handle those)
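
To make the third option more concrete, here is a minimal sketch of the decision it would need, assuming the scheduler's choice is recorded in the `volume.kubernetes.io/selected-node` annotation on the PVC. The `Claim` and `Node` types, the `unsetIfNoDriver` helper and the `example.com/csi-driver` label are hypothetical and simplified for illustration; they are not part of external-provisioner.

```go
package main

// annSelectedNode is the PVC annotation that records the scheduler's node choice.
const annSelectedNode = "volume.kubernetes.io/selected-node"

// Claim is a simplified, hypothetical stand-in for a PersistentVolumeClaim.
type Claim struct {
	Name        string
	Annotations map[string]string
}

// Node is a simplified, hypothetical stand-in for a Kubernetes node.
type Node struct {
	Name   string
	Labels map[string]string
}

// matches reports whether labels contains every key/value pair of selector.
func matches(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// unsetIfNoDriver removes the selected-node annotation from the claim when the
// selected node matches noDriverSelector ("driver not running"). It returns
// true if the claim was changed, so that the scheduler can pick another node.
func unsetIfNoDriver(c *Claim, nodes map[string]Node, noDriverSelector map[string]string) bool {
	if len(noDriverSelector) == 0 {
		// An empty selector would match every node; treat it as "not configured".
		return false
	}
	selected, ok := c.Annotations[annSelectedNode]
	if !ok {
		return false // no node selected yet
	}
	node, known := nodes[selected]
	if !known {
		return false // unknown node: leave the claim alone
	}
	if !matches(noDriverSelector, node.Labels) {
		return false // a driver is expected to run there and will handle the claim
	}
	delete(c.Annotations, annSelectedNode)
	return true
}
```

In this sketch, a claim whose selected node carries the label `example.com/csi-driver=absent` would lose its annotation when the selector is `{"example.com/csi-driver": "absent"}`, while claims for nodes with a running driver are left untouched. A real implementation would have to update the PVC through the API server and decide which component runs this check.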

Labels: lifecycle/frozen (indicates that an issue or PR should not be auto-closed due to staleness)
