WIP: Introduce Node Lifecycle WG #8396
/hold
Looks like I'm not a member of the Kubernetes org anymore. I was a few years back, but haven't kept up with contributions recently. You can remove me as a lead and I can reapply after some contributions to this WG.
75e1096 to a19a192
We have had impactful conversations with Ryan about this group and its goals. He has experience with cluster maintenance and I look forward to his participation in the WG.
/cc
a19a192 to 2d6ac13
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: atiratree. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
d725bb9 to a3da4df
a3da4df to 62b033d
controllers, API validation, integration with existing core components and extension points for the
ecosystem. This should be accompanied by E2E / Conformance tests.

## Relevant Projects
For visibility: please let me know if you have a relevant project you would like to see included here.
62b033d to 86d036f
- Improve the Graceful/Non-Graceful Node Shutdown and consider how this affects the node lifecycle.
To graduate the [Graceful Node Shutdown](https://github.com/kubernetes/enhancements/issues/2000)
feature to GA and resolve the associated node shutdown issues.
- Improve the Graceful/Non-Graceful Node Shutdown and consider how this affects the node lifecycle.
To graduate the [Graceful Node Shutdown](https://github.com/kubernetes/enhancements/issues/2000)
feature to GA and resolve the associated node shutdown issues.
- Improve the Graceful/Non-Graceful Node Shutdown and consider how this affects the node lifecycle.
Let's stick to general topics, without mentioning specific KEPs in the charter.
This was requested by SIG Node. @SergeyKanzhelev, can you please give us input on how you would like the goals to be defined?
- As a cluster admin I want to have a simple interface to initiate a node drain/maintenance without
any required manual interventions. I also want to be able to observe the node drain via the API
and check on its progress. I also want to be able to discover workloads that are blocking the node
drain.
Nit: this entire section has 3 separate use cases:
- initiate
- observe
- discover
Can you split them accordingly? Shorter user stories are easier to read.
Ack, I will go over the use cases and improve them.
wg-node-lifecycle/charter.md
Outdated
DRA: device taints and tolerations feature tracked in https://github.com/kubernetes/enhancements/issues/5055.
- An API to remove pods from endpoints before they terminate.
Currently tracked in https://docs.google.com/document/d/1t25jgO_-LRHhjRXf4KJ5xY_t8BZYdapv7MDAxVGY6R8/edit?tab=t.0#heading=h.i4lwa7rdng7y.
- Introduce enhancements across multiple Kubernetes SIGs to add support for the new APIs to solve
This statement doesn't seem to fit in `Area we expect to explore:`. I'd drop it entirely.
We still do not know what the integration will look like, but it will have to be explored and can result in other enhancements. I would still prefer that we reference such future work.
Currently tracked in https://github.com/kubernetes/enhancements/issues/4563.
- An API/mechanism to gracefully terminate pods during a node shutdown.
Graceful node shutdown feature tracked in https://github.com/kubernetes/enhancements/issues/2000.
- An API to deschedule pods that use DRA devices.
Are you saying there will be separate APIs for descheduling any Pod and a Pod with a DRA device? Why can't both just use `/evict`?
This just references an existing feature without specifying the implementation details here.
Graceful node shutdown feature tracked in https://github.com/kubernetes/enhancements/issues/2000.
- An API to deschedule pods that use DRA devices.
DRA: device taints and tolerations feature tracked in https://github.com/kubernetes/enhancements/issues/5055.
- An API to remove pods from endpoints before they terminate.
Same question here: isn't `/evict` sufficient?
This also references an existing doc, and I believe the `/evict` API is not sufficient in this scenario since it needs to apply to all workloads.
projects and addressing scenarios that impede node drain or cause improper pod termination. Our
objective is to create easily configurable, out-of-the-box solutions that seamlessly integrate with
existing APIs and behaviors. We will strive to make these solutions minimalistic and extensible to
support advanced use cases across the ecosystem.
I'd like to especially stress this section:
> We will strive to make these solutions minimalistic and extensible to support advanced
> use cases across the ecosystem.

to ensure we first look into existing APIs and how we can expand them, rather than introducing new ones.
We already struggle with low usage of the Eviction API; adding a new API will not resolve the problem, but will only make it more complicated for users to find the right one. I believe someone else already stressed this, but I'd like to see it become one of the key goals for this WG.
Ack, the goals should be stated in this fashion.
@atiratree, even though I don't work for Red Hat any more, I would like to join this WG; this topic is still of interest to me.
@atiratree, I would like to be part of this WG. Please include me as well.
I have written some PoC that might interest this WG; sign me up.
/cc
@evrardjp: GitHub didn't allow me to request PR reviews from the following users: evrardjp. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/cc
@kaushik229: GitHub didn't allow me to request PR reviews from the following users: kaushik229. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
I like how this is going and am excited to see the WG formed. Thank you, @atiratree!
I have been exploring the APIs in this area and would like to help with this initiative. Considering that, @atiratree, I would like to be part of this WG.
43ff1f5 to c627543
Thank you all for your interest! So that all visitors are on the same page: this WG is open to everyone, and we will announce the weekly meetings on the [email protected] mailing list as soon as the group is formed. If you are interested in helping us organize/lead this group, please message me on Slack to discuss.
Co-authored-by: Ryan Hallisey <[email protected]>
c627543 to 48b4634