Allow setting a label on nodes about to be upgraded/restarted #3204
Hi, thanks for filing this! This issue relates to an ongoing topic of reboot handling; most of the information and discussion on it is (AFAIK) sadly trapped in internal-to-RH proprietary systems, because staying open requires relentless commitment and we aren't consistent about that.
I think we should avoid having OpenShift/MCO-specific labels here; we want to interoperate with the rest of the Kubernetes ecosystem.
Note that https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown will make this more reliable, and we (OCP) plan to roll that out.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now, please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now, please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close
@openshift-bot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Still relevant. Rereading https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown, I don't see how that would help with the use case described above. @cgwalters: Do I misunderstand the mechanism? /remove-lifecycle rotten
/reopen
@ibotty: Reopened this issue.
Description
Because there is no agreed-upon way to signal to operators that a node is being drained, different operators handle it in different ways.
Rook detects a node drain by observing the pods on the node. This works, but feels a bit fragile.
The problem is that some operators (e.g. the Zalando PostgreSQL Operator) "detect" drains by watching the node's labels. Whenever a label is no longer set (e.g. "node-ready=true"), the operator will (try to) fail over to a DB pod on another node.
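A minimal sketch of the label-removal check described above, assuming the operator compares a node's labels before and after a node update event. `lost_readiness_label` and `READINESS_LABEL` are hypothetical names, and this is a simplified model, not the Zalando operator's actual implementation. It also illustrates why this feature request matters: unless something changes the label before the reboot, the check has nothing to react to.

```python
from __future__ import annotations

from typing import Mapping

# Hypothetical: the example label from the issue text.
READINESS_LABEL = ("node-ready", "true")


def lost_readiness_label(old: Mapping[str, str], new: Mapping[str, str],
                         label: tuple[str, str] = READINESS_LABEL) -> bool:
    """Return True when the node had the readiness label and no longer has it."""
    key, value = label
    had_it = old.get(key) == value
    has_it = new.get(key) == value
    return had_it and not has_it


# Usage: the label disappears from the node, so the operator would fail over.
before = {"kubernetes.io/hostname": "worker-1", "node-ready": "true"}
after = {"kubernetes.io/hostname": "worker-1"}
if lost_readiness_label(before, after):
    print("fail over DB primary to a pod on another node")
```

Changing the value (e.g. to "false") counts as losing the label too, since the check compares against the exact expected value.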
This is a feature request to update the node's labels when a reboot is about to happen.
Steps to reproduce the issue:
4. Meanwhile, some operator, not knowing that the machine is about to be rebooted, does not update the PDB (directly or indirectly).
Describe the results you expected:
machineconfiguration.openshift.io/pending-restart=false to =true
3a. an operator removes active workloads from the node, removing/updating PDBs that affect the node,
3b. machine-config-daemon drains the node,
machineconfiguration.openshift.io/pending-restart=false
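The expected sequence above can be sketched as a small simulation, assuming that machine-config-daemon flips the proposed label to "true" before draining, gives operators a chance to react, and resets it to "false" once the node is back after the reboot. The label key is the one proposed in this issue, not an existing MCO label; `update_node` and `db_operator` are hypothetical names, and the operator callback stands in for a real watch on the Node object.

```python
from __future__ import annotations

from typing import Callable

# Label key proposed in this issue (not an existing MCO label).
PENDING_RESTART = "machineconfiguration.openshift.io/pending-restart"


def update_node(labels: dict[str, str],
                on_label_change: Callable[[dict[str, str]], None],
                log: list[str]) -> None:
    """Hypothetical model of machine-config-daemon updating one node."""
    # Announce the upcoming reboot by flipping the label to "true".
    labels[PENDING_RESTART] = "true"
    log.append("label pending-restart=true")
    # 3a. Operators watching the label move active workloads off the node
    #     and remove/update PDBs that would otherwise block the drain.
    on_label_change(labels)
    # 3b. Only then does the daemon drain and reboot the node.
    log.append("drain node")
    log.append("reboot node")
    # Assumed: once the node is back, reset the label.
    labels[PENDING_RESTART] = "false"
    log.append("label pending-restart=false")


# Usage: a hypothetical DB operator reacting to the label change.
log: list[str] = []


def db_operator(labels: dict[str, str]) -> None:
    if labels.get(PENDING_RESTART) == "true":
        log.append("operator: fail over primary, update PDB")


node = {PENDING_RESTART: "false"}
update_node(node, db_operator, log)
```

The key ordering property is that the operator's reaction (3a) is logged before the drain (3b), which is exactly what label-removal detection alone cannot guarantee.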