fix: update deletionTimestamp on terminating pods when after nodeDeletionTimestamp #2316

cosimomeli · 2025-06-17T14:26:25Z

Description
When a node receives the unreachable taint, the Kubernetes taint controller triggers the deletion of all its pods after 5 minutes. Once the Node Repair threshold is reached, Karpenter's drain procedure waits until every pod has either been evicted or is stuck terminating (that is, has passed its deletionTimestamp). If a pod has a long termination grace period (RabbitMQ operator pods have 7 days, for example), the node waits far too long before being deleted.

To improve forced termination, terminating pods whose deletionTimestamp falls after the nodeTerminationTimestamp are now deleted again, so that their deletionTimestamp is aligned with the nodeTerminationTimestamp.

How was this change tested?
I added a unit test for this and also tested the change with both an Unhealthy Node on AWS (dead kubelet) and a simple node deletion.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

linux-foundation-easycla · 2025-06-17T14:26:29Z

The committers listed above are authorized under a signed CLA.

✅ login: cosimomeli (c69ab6d, 425b1c6, f776c6b, 7a58d17, 1d2a836)

k8s-ci-robot · 2025-06-17T14:26:33Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cosimomeli
Once this PR has been reviewed and has the lgtm label, please assign ellistarn for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2025-06-17T14:26:34Z

Welcome @cosimomeli!

It looks like this is your first PR to kubernetes-sigs/karpenter 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/karpenter has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2025-06-17T14:26:35Z

Hi @cosimomeli. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coveralls · 2025-06-17T15:15:20Z

Pull Request Test Coverage Report for Build 15710504117

Details

6 of 6 (100.0%) changed or added relevant lines in 2 files are covered.
6 unchanged lines in 2 files lost coverage.
Overall coverage decreased (-0.04%) to 82.001%

Files with Coverage Reduction	New Missed Lines	%
pkg/test/expectations/expectations.go	2	93.14%
pkg/controllers/disruption/consolidation.go	4	85.55%

Totals
Change from base Build 15692799185:	-0.04%
Covered Lines:	10260
Relevant Lines:	12512

💛 - Coveralls

jonathan-innis · 2025-06-17T20:35:03Z

/assign @engedaam

Amanuel implemented Node Autorepair, so assigning him as the relevant owner.

cosimomeli added 3 commits June 17, 2025 10:14

Delete terminating pods to allow force deletion when needed

425b1c6

Change pod deletion logic

c69ab6d

Add test

1d2a836

k8s-ci-robot requested review from engedaam and tallaxes June 17, 2025 14:26

k8s-ci-robot added the "cncf-cla: no" label (Indicates the PR's author has not signed the CNCF CLA.) Jun 17, 2025

k8s-ci-robot added the "needs-ok-to-test" label (Indicates a PR that requires an org member to verify it is safe to test.) Jun 17, 2025

k8s-ci-robot added the "size/M" label (Denotes a PR that changes 30-99 lines, ignoring generated files.) Jun 17, 2025

cosimomeli and others added 2 commits June 17, 2025 16:36

Merge branch 'main' into fix-force-delete

7a58d17

lint

f776c6b

k8s-ci-robot added the "cncf-cla: yes" label (Indicates the PR's author has signed the CNCF CLA.) and removed the "cncf-cla: no" label (Indicates the PR's author has not signed the CNCF CLA.) Jun 17, 2025

k8s-ci-robot assigned engedaam Jun 17, 2025
