NHC healthy delay #365

mshitrit · 2025-04-15T13:55:10Z

Why we need this PR

We'd like to add a configuration option that delays a node's return to healthy status.
When a delay is configured, NHC deletes the remediation CR only after the node has been healthy for the configured time.
A negative value means the node will never be considered healthy automatically, and manual intervention is expected.

The motivation came from several customers who need more control over when taints are removed from the node, and who have seen nodes regain health for only a short period of time.

Changes made

Added a configuration option that keeps a node considered unhealthy until a period of time has passed.
An annotation on the CR is used to manage the delay, and a status update is added on the CR to note that the node isn't considered healthy yet because of the delay.

Which issue(s) this PR fixes

RHWA-10

Test plan

Summary by CodeRabbit

New Features
- Introduced a configurable delay before a node is considered healthy again after recovery, allowing for delayed health recognition.
- Added status tracking to indicate when a node is healthy but still within the configured delay period.
Bug Fixes
- Improved status reporting to accurately reflect delayed healthy states for nodes.
Documentation
- Updated custom resource and operator documentation to describe the new healthy delay feature and related status fields.
Tests
- Added and enhanced test scenarios to verify delayed healthy behavior and ensure correct status updates.

openshift-ci · 2025-04-15T13:55:15Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

mshitrit · 2025-04-21T07:33:37Z

/test 4.17-openshift-e2e

api/v1alpha1/nodehealthcheck_types.go

controllers/resources/manager.go

mshitrit · 2025-05-11T19:22:18Z

/test 4.17-openshift-e2e

controllers/nodehealthcheck_controller_test.go

controllers/resources/status.go

controllers/nodehealthcheck_controller.go

controllers/resources/manager.go

controllers/nodehealthcheck_controller.go

coderabbitai · 2025-05-15T07:25:58Z

Walkthrough

This update introduces a configurable "healthy delay" feature to the NodeHealthCheck (NHC) system. A new field allows specifying a delay before a node is considered healthy after recovery, with corresponding API, CRD, and status schema changes. The reconciliation logic, resource manager, and tests are updated to support and verify delayed healthy state recognition and remediation CR deletion.

Changes

| File(s) | Change Summary |
|---|---|
| api/v1alpha1/nodehealthcheck_types.go, bundle/manifests/remediation.medik8s.io_nodehealthchecks.yaml, config/crd/bases/remediation.medik8s.io_nodehealthchecks.yaml | Added `healthyDelay` field to NHC spec and `healthyDelayed` to unhealthy node status; updated CRD and schema for new fields. |
| api/v1alpha1/zz_generated.deepcopy.go | Updated `DeepCopyInto` for `NodeHealthCheckSpec` and `UnhealthyNode` to copy new fields `HealthyDelay` and `HealthyDelayed`. |
| bundle/manifests/node-healthcheck-operator.clusterserviceversion.yaml, config/manifests/base/bases/node-healthcheck-operator.clusterserviceversion.yaml | Added spec and status descriptors for `healthyDelay` and `healthyDelayed` in CSV manifests. |
| controllers/resources/manager.go | Modified `HandleHealthyNode` to support delayed remediation CR deletion; added delay calculation logic and new annotation handling; updated `CleanUp` method. |
| controllers/resources/status.go | Added functions to update and check delayed healthy status in NHC status. |
| controllers/nodehealthcheck_controller.go | Passed `HealthyDelay` to context; updated healthy node handling to process and propagate delay for reconciliation requeue; updated cleanup call. |
| controllers/machinehealthcheck_controller.go | Adjusted to ignore new return value from `HandleHealthyNode` signature change. |
| controllers/nodehealthcheck_controller_test.go | Refactored test helper for healthy node; added tests for healthy delay behavior including indefinite delay and manual confirmation. |
| controllers/shared.go | Added annotation-based reconciliation trigger for manual healthy confirmation annotation changes. |
| docs/configuration.md | Documented the `healthyDelay` field, its usage, rationale, and manual intervention methods. |

Sequence Diagram(s)

sequenceDiagram
    participant Reconciler
    participant ResourceManager
    participant Node
    participant RemediationCR

    Reconciler->>ResourceManager: HandleHealthyNode(nodeName, crName, owner)
    ResourceManager->>RemediationCR: List remediation CRs for node
    loop For each remediation CR
        ResourceManager->>RemediationCR: Check for healthy delay annotation
        alt No annotation
            ResourceManager->>RemediationCR: Set delay start annotation (now)
            ResourceManager-->>Reconciler: Return delay duration
        else Annotation present
            ResourceManager->>ResourceManager: Calculate time left
            alt Delay not expired
                ResourceManager-->>Reconciler: Return remaining delay
            else Delay expired
                ResourceManager->>RemediationCR: Delete CR
            end
        end
    end
    ResourceManager-->>Reconciler: Return shortest delay for requeue

Suggested labels

ok-to-test

Poem

Hopping through code with a healthy delay,
Now nodes take time before they can say
"I'm healthy again!"—not too soon, not too late,
Remediation CRs pause at the gate.
With tests that cheer and status anew,
This rabbit’s proud of what you do!
🐇⏳

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 226c757 and 1ec21b4.

📒 Files selected for processing (13)

api/v1alpha1/nodehealthcheck_types.go (2 hunks)
api/v1alpha1/zz_generated.deepcopy.go (2 hunks)
bundle/manifests/node-healthcheck-operator.clusterserviceversion.yaml (2 hunks)
bundle/manifests/remediation.medik8s.io_nodehealthchecks.yaml (2 hunks)
config/crd/bases/remediation.medik8s.io_nodehealthchecks.yaml (2 hunks)
config/manifests/base/bases/node-healthcheck-operator.clusterserviceversion.yaml (2 hunks)
controllers/machinehealthcheck_controller.go (1 hunks)
controllers/nodehealthcheck_controller.go (3 hunks)
controllers/nodehealthcheck_controller_test.go (5 hunks)
controllers/resources/manager.go (4 hunks)
controllers/resources/status.go (2 hunks)
controllers/shared.go (3 hunks)
docs/configuration.md (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (10)

controllers/nodehealthcheck_controller.go
controllers/machinehealthcheck_controller.go
config/manifests/base/bases/node-healthcheck-operator.clusterserviceversion.yaml
bundle/manifests/node-healthcheck-operator.clusterserviceversion.yaml
api/v1alpha1/zz_generated.deepcopy.go
bundle/manifests/remediation.medik8s.io_nodehealthchecks.yaml
api/v1alpha1/nodehealthcheck_types.go
controllers/resources/status.go
config/crd/bases/remediation.medik8s.io_nodehealthchecks.yaml
controllers/nodehealthcheck_controller_test.go

🧰 Additional context used

🧠 Learnings (4)

📓 Common learnings

Learnt from: mshitrit
PR: medik8s/node-healthcheck-operator#365
File: controllers/resources/manager.go:319-364
Timestamp: 2025-05-28T07:55:11.390Z
Learning: In the node-healthcheck-operator HandleHealthyNode method, when calcCrDeletionDelay fails with an error, the intended behavior is to log the error and proceed with CR deletion (treating it as "no delay configured") rather than aborting reconciliation. This prevents the system from getting stuck when delay calculations fail due to issues like malformed annotations.

Learnt from: mshitrit
PR: medik8s/node-healthcheck-operator#365
File: controllers/resources/manager.go:319-335
Timestamp: 2025-05-28T08:18:35.543Z
Learning: In the node-healthcheck-operator HandleHealthyNode method, the UpdateStatusNodeDelayedHealthy call with unsafe type cast to *NodeHealthCheck is actually safe because HealthyDelayContextKey is only set for NodeHealthCheck controllers, not MachineHealthCheck controllers. This means shortestDelay will always be 0 for MachineHealthCheck, preventing the unsafe cast line from being reached.

Learnt from: mshitrit
PR: medik8s/node-healthcheck-operator#365
File: controllers/resources/status.go:74-81
Timestamp: 2025-05-28T07:42:13.767Z
Learning: In the Node Healthcheck Operator's status management for controllers/resources/status.go, when a node's healthy delay period expires, the entire unhealthy node entry is removed from the status via UpdateStatusNodeHealthy rather than just resetting the HealthyDelayed flag to false. The state transition flow is: unhealthy -> healthy with delay (HealthyDelayed=true) -> completely healthy (removed from unhealthy nodes list).
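
The state transition described in the last learning can be illustrated with a small sketch. The types below are simplified, hypothetical stand-ins for the NHC status types, and `markDelayedHealthy` and `markHealthy` are illustrative names rather than the functions in controllers/resources/status.go.

```go
package main

import "fmt"

// unhealthyNode and nhcStatus are simplified, hypothetical stand-ins for the
// NHC status types; only the fields needed for the sketch are modelled.
type unhealthyNode struct {
	Name           string
	HealthyDelayed *bool // pointer, so "unset" differs from "false"
}

type nhcStatus struct {
	UnhealthyNodes []unhealthyNode
}

// markDelayedHealthy flags a tracked node as healthy but still inside the delay.
func markDelayedHealthy(s *nhcStatus, name string) {
	for i := range s.UnhealthyNodes {
		if s.UnhealthyNodes[i].Name == name {
			delayed := true
			s.UnhealthyNodes[i].HealthyDelayed = &delayed
		}
	}
}

// markHealthy drops the node's entry entirely instead of resetting the flag.
func markHealthy(s *nhcStatus, name string) {
	kept := s.UnhealthyNodes[:0]
	for _, n := range s.UnhealthyNodes {
		if n.Name != name {
			kept = append(kept, n)
		}
	}
	s.UnhealthyNodes = kept
}

func main() {
	s := &nhcStatus{UnhealthyNodes: []unhealthyNode{{Name: "worker-1"}}}
	markDelayedHealthy(s, "worker-1") // unhealthy -> healthy with delay
	fmt.Println("delayed:", *s.UnhealthyNodes[0].HealthyDelayed)
	markHealthy(s, "worker-1") // delay expired -> entry removed
	fmt.Println("tracked nodes:", len(s.UnhealthyNodes))
}
```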

controllers/shared.go (3), docs/configuration.md (3), controllers/resources/manager.go (3)

Each of these files lists the same three learnings shown under Common learnings above.

🧬 Code Graph Analysis (1)

controllers/shared.go (1)

controllers/resources/manager.go (1)

RemediationManuallyConfirmedHealthyAnnotationKey (35-35)

🪛 LanguageTool

docs/configuration.md

[uncategorized] ~202-~202: Possible missing comma found.
Context: ... Briefly regain health for a very short period only to become unhealthy again. - Requi...

(AI_HYDRA_LEO_MISSING_COMMA)

🔇 Additional comments (9)

controllers/shared.go (2)

77-83: LGTM! Clean implementation following established patterns.

The new annotationsNeedReconcile function correctly follows the same pattern as labelsNeedReconcile, checking for presence changes of the manual confirmation annotation. This ensures reconciliation is properly triggered when the annotation is added or removed from nodes.
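
A presence-change check of this kind might look like the sketch below. It is an assumption-laden illustration: the real function presumably works on node objects, while this version takes plain annotation maps so that it runs without Kubernetes dependencies. The annotation name is the one given in the PR's commit message; the constant name is made up here.

```go
package main

import "fmt"

// manuallyConfirmedHealthyKey uses the annotation name from the PR's commit
// message; the constant name itself is hypothetical.
const manuallyConfirmedHealthyKey = "remediation.medik8s.io/manually-confirmed-healthy"

// annotationsNeedReconcile reports whether the annotation was added or removed
// between the old and new annotations of a node. Value changes are ignored.
func annotationsNeedReconcile(oldAnnotations, newAnnotations map[string]string) bool {
	_, hadIt := oldAnnotations[manuallyConfirmedHealthyKey]
	_, hasIt := newAnnotations[manuallyConfirmedHealthyKey]
	return hadIt != hasIt
}

func main() {
	before := map[string]string{}
	after := map[string]string{manuallyConfirmedHealthyKey: "true"}
	fmt.Println("added:", annotationsNeedReconcile(before, after))
	fmt.Println("unchanged:", annotationsNeedReconcile(after, after))
}
```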

29-31: Good integration with existing reconciliation logic.

The addition of annotationsNeedReconcile to the existing OR condition chain is well-placed and maintains the logical flow of the reconciliation trigger checks.

docs/configuration.md (2)

195-226: Excellent comprehensive documentation for the healthyDelay feature.

The documentation thoroughly covers:

Clear motivation and use cases (node "flapping" and validation periods)

Detailed explanation of delay behavior

Two distinct manual intervention methods with step-by-step instructions

Node-specific vs CR-wide impact considerations

This provides users with all the information needed to understand and use the feature effectively.

61-61: Well-integrated table entry for the new field.

The healthyDelay field documentation fits naturally into the existing spec table format and provides a concise but complete description.

controllers/resources/manager.go (5)

30-36: Well-defined constants for the healthy delay feature.

The constants are appropriately named and documented:

HealthyDelayContextKey for context value passing

RemediationHealthyDelayAnnotationKey for CR delay tracking

RemediationManuallyConfirmedHealthyAnnotationKey for manual node override

The naming convention follows existing patterns and the comments clearly explain their purpose.

321-383: Robust implementation of healthy node handling with delay support.

The enhanced HandleHealthyNode method correctly:

Returns requeue duration for delay management

Integrates manual confirmation annotation checking

Implements delay logic with proper error handling

Uses clear control flow with the shouldDelete flag

Updates status appropriately when delays are active

The implementation properly handles all delay scenarios and maintains backward compatibility.

385-425: Comprehensive delay calculation with proper time management.

The calcCrDeletionDelay method effectively handles all delay scenarios:

No delay configured: immediate deletion

Zero delay: immediate deletion

Negative delay: permanent postponement

Positive delay: time-based postponement with RFC3339 timestamp tracking

The annotation-based persistence approach ensures delay state survives reconciliation cycles, and the time calculations are accurate.

427-454: Clean annotation management helpers.

The helper methods isManuallyConfirmedHealthyAnnotationSet and removeConfirmedHealthyAnnotation are well-implemented:

Proper error handling for node retrieval

Safe annotation checking and removal

Clear method names and functionality

These provide good separation of concerns for annotation management.

456-465: Appropriate cleanup integration.

The updated CleanUp method properly integrates manual confirmation handling by removing the annotation when all CRs are deleted. The boolean parameter clearly indicates when this cleanup should occur.

clobrano

/lgtm

giving the chance to get further reviews
/hold

mshitrit · 2025-05-19T08:56:09Z

giving the chance to get further reviews

//cc @slintes

controllers/nodehealthcheck_controller.go

controllers/resources/manager.go

api/v1alpha1/nodehealthcheck_types.go

controllers/resources/manager.go

slintes · 2025-05-20T15:49:08Z

@coderabbitai review

coderabbitai · 2025-05-20T15:49:30Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

slintes · 2025-05-20T15:53:28Z

@coderabbitai full review

coderabbitai · 2025-05-20T15:53:33Z

✅ Actions performed

Full review triggered.

slintes

nodeUpdateNeedsReconcile() needs to handle annotations

mshitrit · 2025-06-25T14:05:27Z

nodeUpdateNeedsReconcile() needs to handle annotations

Good point.
IIUC, omitting this part currently means that the annotation change does not trigger a reconcile.
I wonder why the unit test passed, though.
Maybe the reconcile was triggered for a different reason, perhaps the node status heartbeat 🤔

controllers/resources/manager.go

clobrano

I left a note

controllers/resources/manager.go

slintes · 2025-06-30T08:15:40Z

/test 4.17-openshift-e2e

slintes · 2025-06-30T12:49:03Z

/test 4.16-openshift-e2e
/test 4.17-openshift-e2e

slintes · 2025-07-01T08:05:20Z

/test 4.16-openshift-e2e
/test 4.17-openshift-e2e

- Requeue a reconcile in case CR deletion is delayed
- Fix CSV descriptions
- Some refactoring for better readability and usage
- Fix status update
- Update Healthy Delay validation to allow negative values
- Use pointers for new API fields
- Update md with info regarding new spec
- Add remediation.medik8s.io/manually-confirmed-healthy annotation support, in order to enable the user to manually terminate the delay for specific nodes
- Trigger reconcile upon change of the RemediationManuallyConfirmedHealthy annotation
- Move annotation removal to cleanup phase

Signed-off-by: Michael Shitrit <[email protected]>

openshift-ci · 2025-07-03T14:44:02Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mshitrit, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [mshitrit,slintes]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mshitrit · 2025-07-08T07:34:34Z

/retest

mshitrit · 2025-07-08T11:52:03Z

/override 4.20-openshift-e2e

openshift-ci · 2025-07-08T11:52:07Z

@mshitrit: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

4.20-openshift-e2e

Only the following failed contexts/checkruns were expected:

CodeRabbit
ci/prow/4.16-ci-bundle-my-bundle
ci/prow/4.16-images
ci/prow/4.16-openshift-e2e
ci/prow/4.16-test
ci/prow/4.17-ci-bundle-my-bundle
ci/prow/4.17-images
ci/prow/4.17-openshift-e2e
ci/prow/4.17-test
ci/prow/4.18-ci-bundle-my-bundle
ci/prow/4.18-images
ci/prow/4.18-openshift-e2e
ci/prow/4.18-test
ci/prow/4.19-ci-bundle-my-bundle
ci/prow/4.19-images
ci/prow/4.19-openshift-e2e
ci/prow/4.19-test
ci/prow/4.20-ci-bundle-my-bundle
ci/prow/4.20-images
ci/prow/4.20-openshift-e2e
ci/prow/4.20-test
pull-ci-medik8s-node-healthcheck-operator-main-4.16-ci-bundle-my-bundle
pull-ci-medik8s-node-healthcheck-operator-main-4.16-images
pull-ci-medik8s-node-healthcheck-operator-main-4.16-openshift-e2e
pull-ci-medik8s-node-healthcheck-operator-main-4.16-test
pull-ci-medik8s-node-healthcheck-operator-main-4.17-ci-bundle-my-bundle
pull-ci-medik8s-node-healthcheck-operator-main-4.17-images
pull-ci-medik8s-node-healthcheck-operator-main-4.17-openshift-e2e
pull-ci-medik8s-node-healthcheck-operator-main-4.17-test
pull-ci-medik8s-node-healthcheck-operator-main-4.18-ci-bundle-my-bundle
pull-ci-medik8s-node-healthcheck-operator-main-4.18-images
pull-ci-medik8s-node-healthcheck-operator-main-4.18-openshift-e2e
pull-ci-medik8s-node-healthcheck-operator-main-4.18-test
pull-ci-medik8s-node-healthcheck-operator-main-4.19-ci-bundle-my-bundle
pull-ci-medik8s-node-healthcheck-operator-main-4.19-images
pull-ci-medik8s-node-healthcheck-operator-main-4.19-openshift-e2e
pull-ci-medik8s-node-healthcheck-operator-main-4.19-test
pull-ci-medik8s-node-healthcheck-operator-main-4.20-ci-bundle-my-bundle
pull-ci-medik8s-node-healthcheck-operator-main-4.20-images
pull-ci-medik8s-node-healthcheck-operator-main-4.20-openshift-e2e
pull-ci-medik8s-node-healthcheck-operator-main-4.20-test
tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override 4.20-openshift-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

mshitrit · 2025-07-08T11:52:39Z

/override ci/prow/4.20-openshift-e2e

openshift-ci · 2025-07-08T11:52:44Z

@mshitrit: Overrode contexts on behalf of mshitrit: ci/prow/4.20-openshift-e2e

In response to this:

/override ci/prow/4.20-openshift-e2e

openshift-ci bot added the do-not-merge/work-in-progress label Apr 15, 2025

openshift-ci bot added the approved label Apr 15, 2025

mshitrit force-pushed the nhc-healthy-delay branch 2 times, most recently from 802200b to 5bb81fc Compare April 21, 2025 07:32

mshitrit changed the title ~~[WIP] NHC healthy delay~~ NHC healthy delay Apr 24, 2025

slintes requested changes May 9, 2025

api/v1alpha1/nodehealthcheck_types.go Outdated

controllers/resources/manager.go Outdated

controllers/resources/manager.go Outdated

openshift-ci bot assigned slintes May 9, 2025

clobrano requested changes May 9, 2025

controllers/resources/manager.go Outdated

openshift-ci bot assigned clobrano May 9, 2025

clobrano requested changes May 12, 2025

mshitrit force-pushed the nhc-healthy-delay branch from 448f583 to abe8a9d Compare May 12, 2025 11:15

slintes reviewed May 12, 2025

controllers/nodehealthcheck_controller.go Outdated

mshitrit force-pushed the nhc-healthy-delay branch from abe8a9d to e2fab75 Compare May 13, 2025 13:49

slintes requested changes May 14, 2025

controllers/nodehealthcheck_controller.go Outdated

controllers/resources/manager.go Outdated

controllers/resources/manager.go

controllers/nodehealthcheck_controller.go Outdated

mshitrit force-pushed the nhc-healthy-delay branch from e2fab75 to 83a26f8 Compare May 15, 2025 07:25

mshitrit force-pushed the nhc-healthy-delay branch from 83a26f8 to 226c757 Compare May 15, 2025 18:39

clobrano approved these changes May 19, 2025

openshift-ci bot added lgtm do-not-merge/hold labels May 19, 2025

slintes requested changes May 20, 2025

openshift-ci bot removed the lgtm label May 20, 2025

slintes requested changes Jun 24, 2025

mshitrit force-pushed the nhc-healthy-delay branch from ed25fb9 to b170c73 Compare June 25, 2025 12:33

mshitrit force-pushed the nhc-healthy-delay branch 2 times, most recently from 64be754 to 1afee04 Compare June 26, 2025 16:16

slintes requested changes Jun 27, 2025

controllers/resources/manager.go Outdated

controllers/resources/manager.go Outdated

clobrano requested changes Jun 27, 2025

controllers/resources/manager.go

mshitrit force-pushed the nhc-healthy-delay branch from 1afee04 to 203a991 Compare June 29, 2025 11:22

mshitrit added 4 commits July 3, 2025 07:51

Adding configuration which enables considering node unhealthy until a…

ebe5d88

… period of time has passed. An annotation on CR is used to manage that, and a status update is added on the CR to note node healthiness is delayed Signed-off-by: Michael Shitrit <[email protected]>

Add a unit test for healthy delay

07c84b8

Signed-off-by: Michael Shitrit <[email protected]>

Fixing merge conflicts

1ec21b4

Signed-off-by: Michael Shitrit <[email protected]>

mshitrit force-pushed the nhc-healthy-delay branch from 203a991 to 1ec21b4 Compare July 3, 2025 05:12

slintes approved these changes Jul 3, 2025

openshift-ci bot added the lgtm label Jul 3, 2025

mshitrit marked this pull request as ready for review July 7, 2025 10:44

openshift-ci bot removed the do-not-merge/work-in-progress label Jul 7, 2025

openshift-ci bot requested review from beekhof and razo7 July 7, 2025 10:44

mshitrit removed the do-not-merge/hold label Jul 7, 2025

openshift-merge-bot bot merged commit c3ce42b into medik8s:main Jul 8, 2025
25 of 26 checks passed
