-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prevent driver pod from being deleted before its status is processed by the operator (#2054) #2076
base: master
Are you sure you want to change the base?
Conversation
oldFinalizer := pod.Finalizers | ||
var newFinalizer []string | ||
for _, finalizer := range oldFinalizer { | ||
if finalizer != webhook.DriverFinalize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Do we need a check to verify that the web hook is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @peter-mcclonski , glad to your review, if webhook is disabled, driver pod will not contains the finalizer, so it does not need a check
pkg/controller/.DS_Store
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no, this file may be created by vscode, I will remove it
if err := c.removeDriverPodFinalizer(app); err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering when you try to delete a running SparkApplication, whether spark operator will need to wait the app to be completed or failed, and then it can remove the finalizer. If this is true, for a long running Spark job, like Spark streaming job, maybe it will never turned into completed or failed state and the finalizer will never be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, finalizer is used for controling pod deletion, if the streaming driver pod is running, the owner spark job will never turn into completed or failed state, but if driver pod is deleted manually, or become status out of pod pending and pod running status, the streaming sparkapp will finally turn into completed or failed, at that time, driver finalizer will be deleted
31ce452
to
06efe5b
Compare
Hi team, I have fix some previous review problem and this code is running fine in local cluster, please help code review again, thanks! |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: peter-mcclonski The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…by the operator (kubeflow#2054) Signed-off-by: imtzer <[email protected]>
🛑 Important:
Please open an issue to discuss significant work before you start. We appreciate your contributions and don't want your efforts to go to waste!
For guidelines on how to contribute, please review the CONTRIBUTING.md document.
Purpose of this PR
The backgroud of this pr is discribed in #2054, driver pod is deleted before it should be use in operator, now the sparkapp state transformation is trigger by list-watch.
For example, if a new sparkapp submit successfully and driver pod has deleted after compeleted status but before operator start handling sparkapp state transformation to SubmittedState, the code is in here getAndUpdateAppState, the function getAndUpdateAppState need to get driver pod but it was deleted, and sparkapp become failed and error message show 'driver pod not found'.
So to add finalier to driver pod to prevent it
Proposed changes:
Change Category
Indicate the type of change by marking the applicable boxes:
Rationale
Checklist
Before submitting your PR, please review the following:
Additional Notes