Fail health check if intents reconcile starts and doesn't finish within 30s #507

Merged: 2 commits, Nov 3, 2024

orishoshan
Collaborator

Prior to this PR, slow reconciles caused by limited CPU or slow I/O (a slow network, slow responses from the control plane) could make the operator run slowly with no visible signal. If reconciles are slow enough, the operator is effectively not functioning: a reconcile that does not finish in a timely manner is, in practice, not a successful reconcile, even if it ultimately succeeds.

The operator now fails its health check if a reconcile starts and does not complete within 30 seconds. The failed check causes the operator to restart, which may self-heal the issue. It also tells cluster operators that something is wrong and needs to be investigated, long before the issue surfaces some other way.
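
The behavior described above can be sketched roughly as follows. This is not the code from this PR, only a minimal illustration using the Go standard library: `ReconcileTracker`, `NewReconcileTracker`, `Start`, `Check`, and `fixedClock` are hypothetical names, and the clock is injectable so the example does not have to wait 30 real seconds.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// reconcileTimeout mirrors the 30-second limit described in this PR.
const reconcileTimeout = 30 * time.Second

// ReconcileTracker is a hypothetical helper (not taken from the operator's
// source) that remembers when each in-flight reconcile started.
type ReconcileTracker struct {
	mu       sync.Mutex
	timeout  time.Duration
	now      func() time.Time // injectable clock, defaults to time.Now
	nextID   uint64
	inFlight map[uint64]time.Time
}

// NewReconcileTracker returns a tracker that reports unhealthy once any
// reconcile has been running for longer than timeout.
func NewReconcileTracker(timeout time.Duration) *ReconcileTracker {
	return &ReconcileTracker{
		timeout:  timeout,
		now:      time.Now,
		inFlight: make(map[uint64]time.Time),
	}
}

// Start records the beginning of a reconcile and returns a function that
// must be called when that reconcile finishes, whether it succeeded or not.
func (t *ReconcileTracker) Start() (done func()) {
	t.mu.Lock()
	defer t.mu.Unlock()
	id := t.nextID
	t.nextID++
	t.inFlight[id] = t.now()
	return func() {
		t.mu.Lock()
		defer t.mu.Unlock()
		delete(t.inFlight, id)
	}
}

// Check returns an error if any in-flight reconcile has exceeded the
// timeout. The request argument is unused; it is there only so the method
// has the shape of an HTTP-based health checker.
func (t *ReconcileTracker) Check(_ *http.Request) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	now := t.now()
	for _, started := range t.inFlight {
		if running := now.Sub(started); running > t.timeout {
			return fmt.Errorf("reconcile has been running for %s, exceeding the %s limit", running, t.timeout)
		}
	}
	return nil
}

// fixedClock returns a clock frozen at the given instant (demo helper).
func fixedClock(at time.Time) func() time.Time {
	return func() time.Time { return at }
}

func main() {
	tracker := NewReconcileTracker(reconcileTimeout)
	base := tracker.now()
	tracker.now = fixedClock(base)

	done := tracker.Start() // a reconcile begins
	fmt.Println("just started:", tracker.Check(nil))

	// Pretend 31 seconds have passed without the reconcile finishing.
	tracker.now = fixedClock(base.Add(reconcileTimeout + time.Second))
	fmt.Println("after 31s:", tracker.Check(nil))

	done() // the reconcile finally completes
	fmt.Println("after finish:", tracker.Check(nil))
}
```

In a real reconciler, one way to use such a tracker would be to call `defer tracker.Start()()` at the top of the reconcile function. Assuming the operator is built on controller-runtime, `Check` matches the signature of `healthz.Checker` and could be registered with the manager's `AddHealthzCheck`. Whether the PR wires it up this way is not stated here.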

@orishoshan orishoshan enabled auto-merge (squash) November 3, 2024 15:35
@orishoshan orishoshan merged commit 208e78c into main Nov 3, 2024
20 checks passed
@orishoshan orishoshan deleted the orisho/successful_reconcile_healthcheck branch November 3, 2024 15:55
@github-actions github-actions bot locked and limited conversation to collaborators Nov 3, 2024