feat: manage PodDisruptionBudget for SparkApplication driver and executor #2326

Open: wants to merge 1 commit into base: master

@a7i a7i commented Nov 14, 2024

Purpose of this PR

Provide the ability to create a PodDisruptionBudget per SparkApplication.

Proposed changes:

  • Expose PodDisruptionBudgetSpec for driver definition
  • Expose PodDisruptionBudgetSpec for executor definition

Change Category

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

Our Spark pipelines cannot be interrupted, so during a node drain we want to prevent eviction of the executor and driver pods. Once the pipeline is complete, the node can be drained. This is natively supported via a PodDisruptionBudget with maxUnavailable: 0.
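
For illustration only, here is a minimal sketch of the kind of policy/v1 PodDisruptionBudget described above, built as a plain Python dict and printed as JSON (which kubectl also accepts). The resource name, namespace, and label selector are hypothetical placeholders; this is not the manifest the operator generates in this PR, and the CRD field names added by the PR are not shown here.

```python
import json


def build_pdb(name, namespace, match_labels):
    """Return a policy/v1 PodDisruptionBudget manifest with maxUnavailable: 0.

    With maxUnavailable set to 0, the eviction API refuses to evict any pod
    matched by the selector, which is what blocks a node drain.
    """
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "maxUnavailable": 0,
            "selector": {"matchLabels": dict(match_labels)},
        },
    }


# Hypothetical names and labels, chosen only for this example.
pdb = build_pdb(
    name="my-pipeline-driver",
    namespace="spark-jobs",
    match_labels={"app": "my-pipeline", "role": "driver"},
)
print(json.dumps(pdb, indent=2))
```

A PodDisruptionBudget only guards against voluntary disruptions that go through the eviction API, such as a node drain; it does not protect pods from node failures or direct deletion.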

Checklist

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly.
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

Additional Notes

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yuchaoran2011 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@a7i a7i force-pushed the poddisruptionbudget branch 4 times, most recently from 60b574d to f4bb934 Compare November 14, 2024 21:29
@missedone
Contributor

@a7i, I'm wondering if we can use the new pod template feature from PR #2141 to specify the PDB.

@a7i
Author

a7i commented Nov 14, 2024

> @a7i, I'm wondering if we can use the new pod template feature from PR #2141 to specify the PDB.

@missedone looks like a useful PR! How would a pod template control the PDB definition? Are you suggesting implementing a single PDB that prevents pods with a common label from being evicted?

@missedone
Contributor

Ah right, it's the PDB that needs its own specific configuration item. My brain was stuck :(

@jacobsalway
Member

Happy to take a look at the PR as well because this may be a useful feature, but if this is specifically for node draining, you could add annotations to prevent eviction: karpenter.sh/do-not-disrupt: "true" for Karpenter and cluster-autoscaler.kubernetes.io/safe-to-evict: "false" for cluster-autoscaler.
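
As a rough sketch of that suggestion, the two annotations can be merged into a pod's metadata as shown below. The annotation keys and values are the ones quoted in this comment; the helper name and the sample metadata are hypothetical, and how the annotations would actually be set on driver and executor pods is not covered here.

```python
# Annotation keys and values as quoted in the comment above.
NO_EVICT_ANNOTATIONS = {
    "karpenter.sh/do-not-disrupt": "true",
    "cluster-autoscaler.kubernetes.io/safe-to-evict": "false",
}


def with_no_evict_annotations(pod_metadata):
    """Return a copy of pod_metadata with the no-evict annotations merged in.

    Existing annotations are preserved; the input dict is not mutated.
    """
    merged = dict(pod_metadata)
    merged["annotations"] = {
        **pod_metadata.get("annotations", {}),
        **NO_EVICT_ANNOTATIONS,
    }
    return merged


# Hypothetical pod metadata, used only to exercise the helper.
original = {"name": "my-pipeline-driver", "annotations": {"team": "data"}}
annotated = with_no_evict_annotations(original)
print(annotated["annotations"])
```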

@a7i
Author

a7i commented Nov 15, 2024

> "cluster-autoscaler.kubernetes.io/safe-to-evict"

Thanks! The context here is nodegroup upgrades, where we drain the nodes from the old nodegroup, so neither Karpenter nor cluster-autoscaler comes into play.

@Cian911
Contributor

Cian911 commented Nov 15, 2024

This would be a great addition, thanks @a7i

@a7i a7i force-pushed the poddisruptionbudget branch from f4bb934 to 62e73f3 Compare November 15, 2024 18:57
@a7i a7i force-pushed the poddisruptionbudget branch from 62e73f3 to 9f054c3 Compare November 18, 2024 02:17
@jacobsalway
Member

> > "cluster-autoscaler.kubernetes.io/safe-to-evict"
>
> Thanks! The context here is nodegroup upgrades, where we drain the nodes from the old nodegroup, so neither Karpenter nor cluster-autoscaler comes into play.

Ah, so this is a user-initiated drain, not one done automatically by either node provisioner (e.g. Karpenter drift detection or node consolidation). In that case I definitely understand your need to provision PDBs.

Will review this tonight.

@a7i
Author

a7i commented Dec 5, 2024

> Will review this tonight.

@jacobsalway let me know if there's interest and I can rebase.

4 participants