cf. reboot.md
CKE processes API servers as follows (see the sketch after this list):
- API servers are rebooted one by one, in order to minimize API server degradation.
- API servers are rebooted with lower priority than worker nodes.
  - When an API server is rebooted, the API server is "migrated" to another node. If API servers were rebooted first, the migrated API servers would be rebooted one by one and nodes would never be rebooted in parallel.
- API servers are not rebooted simultaneously with worker nodes.
  - It might be better not to reboot worker nodes while an API server is degraded.
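The ordering rules above can be summarized by a small scheduling sketch. The `Entry` type, its field names, and the `maxConcurrent` parameter below are assumptions made for illustration; they are not CKE's actual data structures.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a hypothetical reboot queue entry; CKE's real type differs.
type Entry struct {
	Node         string
	ControlPlane bool // true for API server (control plane) nodes
}

// nextBatch returns the entries to reboot next, following the policy above:
// worker nodes are preferred over API servers, and API servers are
// rebooted strictly one at a time, never together with worker nodes.
func nextBatch(queue []Entry, maxConcurrent int) []Entry {
	// Stable-sort so that worker nodes come before API servers,
	// preserving the front-to-back queue order within each group.
	sorted := append([]Entry(nil), queue...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return !sorted[i].ControlPlane && sorted[j].ControlPlane
	})

	var batch []Entry
	for _, e := range sorted {
		if e.ControlPlane {
			// An API server is rebooted only when nothing else is selected.
			if len(batch) == 0 {
				batch = append(batch, e)
			}
			break
		}
		batch = append(batch, e)
		if len(batch) == maxConcurrent {
			break
		}
	}
	return batch
}

func main() {
	queue := []Entry{
		{Node: "cp-1", ControlPlane: true},
		{Node: "worker-1"},
		{Node: "worker-2"},
	}
	fmt.Println(nextBatch(queue, 2)) // [{worker-1 false} {worker-2 false}]
}
```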
CKE does not handle them. There is no status that indicates those failures; it is not considered worth adding one. Reboot queue entries are left in the rebooting state.
If many nodes encounter reboot command failures or timeouts, the overall reboot operation will slow down and eventually get stuck. Administrators should detect such situations by checking metrics.
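As an illustration of how such stuck entries could be detected, the sketch below flags entries that have stayed in the rebooting state longer than a threshold. The `QueueEntry` fields and status strings are assumptions for illustration only; in practice administrators would watch the reboot-related metrics CKE exports rather than inspect the queue like this.

```go
package main

import (
	"fmt"
	"time"
)

// QueueEntry is a hypothetical view of a reboot queue entry; the field and
// status names are assumptions for illustration only.
type QueueEntry struct {
	Node      string
	Status    string    // e.g. "queued", "rebooting", "cancelled"
	UpdatedAt time.Time // last status transition
}

// stuckEntries returns entries that have stayed in the "rebooting" state
// longer than the given threshold, which usually indicates a reboot
// command failure or timeout on that node.
func stuckEntries(entries []QueueEntry, threshold time.Duration) []QueueEntry {
	var stuck []QueueEntry
	now := time.Now()
	for _, e := range entries {
		if e.Status == "rebooting" && now.Sub(e.UpdatedAt) > threshold {
			stuck = append(stuck, e)
		}
	}
	return stuck
}

func main() {
	entries := []QueueEntry{
		{Node: "worker-1", Status: "rebooting", UpdatedAt: time.Now().Add(-2 * time.Hour)},
		{Node: "worker-2", Status: "queued", UpdatedAt: time.Now()},
	}
	for _, e := range stuckEntries(entries, time.Hour) {
		fmt.Printf("node %s seems stuck in the rebooting state\n", e.Node)
	}
}
```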
If we checked beforehand that a node can be drained without eviction failure, draining nodes would proceed more smoothly. This may be future work.
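One possible form of such a pre-check, sketched here only as an idea and not as CKE's implementation, is to dry-run an Eviction for every Pod on the node with client-go and see whether any PodDisruptionBudget would reject it. The function name and error handling are assumptions, and this assumes the API server supports dry-run for the eviction subresource.

```go
package main

import (
	"context"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// canDrainCleanly dry-runs an eviction for every Pod on the given node and
// reports whether all of them would currently be admitted by their PDBs.
func canDrainCleanly(ctx context.Context, cs kubernetes.Interface, node string) (bool, error) {
	pods, err := cs.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + node,
	})
	if err != nil {
		return false, err
	}
	for _, p := range pods.Items {
		ev := &policyv1.Eviction{
			ObjectMeta:    metav1.ObjectMeta{Name: p.Name, Namespace: p.Namespace},
			DeleteOptions: &metav1.DeleteOptions{DryRun: []string{metav1.DryRunAll}},
		}
		err := cs.PolicyV1().Evictions(p.Namespace).Evict(ctx, ev)
		if apierrors.IsTooManyRequests(err) {
			// A PodDisruptionBudget would reject this eviction right now.
			return false, nil
		}
		if err != nil {
			return false, err
		}
	}
	return true, nil
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	ok, err := canDrainCleanly(context.Background(), cs, "worker-1")
	fmt.Println(ok, err)
}
```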
In most cases, Pods are distributed across racks; i.e. multiple Pods covered by a single PDB are not running in a single rack. If CKE chose the next nodes from the same rack as the nodes already processed, draining would progress more smoothly. However, this control would require a lot of implementation work for little advantage.
In the current implementation, CKE basically processes reboot queue entries from front to back. Administrators should simply add nodes to the reboot queue grouped by rack. By doing so, nodes are processed on an almost per-rack basis.
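For example, an administrator could sort the node list by rack before adding it to the reboot queue, so that entries for the same rack sit next to each other. The sketch below is only illustrative; how the node-to-rack mapping is obtained (e.g. from node labels) is left out, and the helper name is made up.

```go
package main

import (
	"fmt"
	"sort"
)

// orderByRack returns the given node names reordered so that nodes in the
// same rack are adjacent, keeping the original relative order within each
// rack. Adding nodes to the reboot queue in this order lets front-to-back
// processing proceed on an almost per-rack basis.
func orderByRack(nodes []string, rackOf map[string]string) []string {
	ordered := append([]string(nil), nodes...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return rackOf[ordered[i]] < rackOf[ordered[j]]
	})
	return ordered
}

func main() {
	nodes := []string{"10.0.0.1", "10.0.1.1", "10.0.0.2", "10.0.1.2"}
	rackOf := map[string]string{
		"10.0.0.1": "rack0", "10.0.0.2": "rack0",
		"10.0.1.1": "rack1", "10.0.1.2": "rack1",
	}
	fmt.Println(orderByRack(nodes, rackOf))
	// Output: [10.0.0.1 10.0.0.2 10.0.1.1 10.0.1.2]
}
```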
There is a race condition between them. In this case, reboot queue entries might not actually be cancelled even though administrators run `ckecli rq cancel`. This issue has not been addressed because there is little harm.
Administrators should check that the entry is successfully cancelled by using `ckecli rq list`.
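The race is essentially a lost update between the cancel operation and the entry processing. The following is only a schematic illustration with made-up types, not CKE's actual code:

```go
package main

import "fmt"

// Entry is a schematic reboot queue entry; this is not CKE's actual type.
type Entry struct {
	Node   string
	Status string // "queued", "rebooting", "cancelled"
}

func main() {
	queue := map[int]*Entry{1: {Node: "worker-1", Status: "queued"}}

	// The reboot processing reads a snapshot of the entry...
	snapshot := *queue[1]

	// ...meanwhile the cancel operation marks the entry cancelled.
	queue[1].Status = "cancelled"

	// The processing still acts on its stale snapshot and writes the entry
	// back, clobbering the cancellation.
	if snapshot.Status == "queued" {
		snapshot.Status = "rebooting"
		queue[1] = &snapshot
	}

	fmt.Println(queue[1].Status) // "rebooting": the cancel request was lost
}
```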
The following decisions have been buried in history, and the reasons are not known today.
- Why `maximum-unreachable-nodes-for-reboot` is in constraints, not in cluster.yml