Generic Controller to Detect if Pod was Evicted by Node Upgrade #2534

Open
csviri opened this issue Sep 20, 2024 · 6 comments

@csviri
Collaborator

csviri commented Sep 20, 2024

During a node upgrade, pods get drained from the node. For long-running applications where frequent restarts are not desirable, it would be useful to get information about the reason for pod eviction, especially if it was because of this node upgrade.

This could be solved with a generic controller that watches pods and nodes and, when a pod is evicted, checks whether its node is being drained and notifies a listener interface about the pod.
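A minimal sketch of that check in plain Java, only to make the idea concrete. Everything here is an assumption for illustration: `NodeView`, `PodView`, `PodEvictionListener` and `NodeDrainEvictionDetector` are hypothetical names, not existing SDK classes, and the heuristic (a terminating pod whose node is marked unschedulable, as `kubectl drain` does when it cordons the node) can give false positives, for example when a pod is deleted manually on a cordoned node.

```java
import java.util.Map;

// Hypothetical, simplified views of cluster objects. A real controller would
// read these fields from the Kubernetes client's Pod and Node model classes.
record NodeView(String name, boolean unschedulable) {}

record PodView(String namespace, String name, String nodeName, boolean terminating) {}

// Hypothetical listener interface; name and signature are assumptions.
interface PodEvictionListener {
  void onPodEvictedByNodeDrain(PodView pod, NodeView node);
}

class NodeDrainEvictionDetector {

  private final PodEvictionListener listener;

  NodeDrainEvictionDetector(PodEvictionListener listener) {
    this.listener = listener;
  }

  // Called for every pod event. Notifies the listener and returns true when the
  // pod is terminating and its node is currently cordoned (unschedulable).
  // This is a heuristic, not proof that a node upgrade caused the eviction.
  boolean onPodEvent(PodView pod, Map<String, NodeView> nodesByName) {
    if (!pod.terminating() || pod.nodeName() == null) {
      return false;
    }
    NodeView node = nodesByName.get(pod.nodeName());
    if (node == null || !node.unschedulable()) {
      return false;
    }
    listener.onPodEvictedByNodeDrain(pod, node);
    return true;
  }
}
```

In a real controller the node map would come from an informer cache, and the detector would be invoked from the pod event handler.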

@csviri csviri added this to the 5.1 milestone Sep 20, 2024
@csviri csviri self-assigned this Sep 20, 2024
@metacosm
Collaborator

Perhaps this should be a separate project, though?

@csviri
Collaborator Author

csviri commented Sep 26, 2024

Maybe just a separate module within this project. Since it's called an SDK, tools and libraries for common subproblems fit, at least in my mind. What do you think?

@metacosm
Collaborator

I get the point but adding "random" utilities to the SDK project dilutes the SDK itself, in my opinion, though we could make it an example operator that would also be actually useful…

@csviri
Collaborator Author

csviri commented Sep 26, 2024

The thing is that the notification system might vary depending on how the platform handles such events in a specific company: some might use Kubernetes events, others Kafka messages, to get these notifications.
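To illustrate what such a pluggable notification point could look like, here is a rough sketch. All names (`PodEvictionNotice`, `EvictionNoticeListener`, `CompositeEvictionListener`, `MessagePublishingListener`) are hypothetical, and the transport is deliberately left as a plain `Consumer<String>` so that a company could wrap whatever it uses (a Kafka producer, a Kubernetes event recorder, and so on) without this sketch assuming any of those APIs.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical payload describing an evicted pod.
record PodEvictionNotice(String namespace, String podName, String nodeName) {}

// Hypothetical extension point that platform teams would implement.
interface EvictionNoticeListener {
  void onEviction(PodEvictionNotice notice);
}

// Forwards each notice to every registered listener, in registration order.
final class CompositeEvictionListener implements EvictionNoticeListener {

  private final List<EvictionNoticeListener> delegates;

  CompositeEvictionListener(List<EvictionNoticeListener> delegates) {
    this.delegates = List.copyOf(delegates);
  }

  @Override
  public void onEviction(PodEvictionNotice notice) {
    delegates.forEach(delegate -> delegate.onEviction(notice));
  }
}

// Formats the notice as text and hands it to a publisher function supplied by
// the caller; the function hides the actual transport.
final class MessagePublishingListener implements EvictionNoticeListener {

  private final Consumer<String> publisher;

  MessagePublishingListener(Consumer<String> publisher) {
    this.publisher = publisher;
  }

  @Override
  public void onEviction(PodEvictionNotice notice) {
    publisher.accept(
        notice.namespace() + "/" + notice.podName() + " evicted from " + notice.nodeName());
  }
}
```

With this shape the open source module would ship only the interface, and each installation would register its own implementations.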

@metacosm
Collaborator

Then it makes even less sense to be part of the SDK if we cannot have a solution that works generically. Or am I missing something?

@csviri
Collaborator Author

csviri commented Sep 26, 2024

Usually it works like this: companies have internal forks and internal builds of such open source projects (at least in my experience from multiple companies), where these extension points are used to fulfill internal requirements.
See, for example, the resource listener in the Flink Operator:
https://github.com/apache/flink-kubernetes-operator/blob/d946f3f9f3a7f12098cd82db2545de7c89e220ff/flink-kubernetes-operator-api/src/main/java/org/apache/flink/kubernetes/operator/api/listener/FlinkResourceListener.java#L36

The open source project actually does not provide any implementation (only for tests), but anyone in their internal fork can provide one.
