Description
In Flux 2.7 we've added a new feature to kustomize-controller to speed up the cluster reconciliation by canceling ongoing health checks when a new source revision is detected. This functionality can be enabled with the CancelHealthCheckOnNewRevision feature gate.
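As a rough illustration of enabling the gate, the sketch below patches the kustomize-controller Deployment from the `flux-system` kustomization.yaml. It assumes a standard bootstrap layout (`gotk-components.yaml` and `gotk-sync.yaml`) and the usual `--feature-gates` controller argument; adjust the file names and target to match your own setup.

```yaml
# flux-system/kustomization.yaml (assumed bootstrap layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # Append the feature gate flag to the kustomize-controller container args.
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --feature-gates=CancelHealthCheckOnNewRevision=true
    target:
      kind: Deployment
      name: kustomize-controller
```

Setting the value to `CancelHealthCheckOnNewRevision=false` in the same patch would be the corresponding way to opt out once the gate is enabled by default.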
This feature is particularly useful for reducing the mean time to recovery (MTTR) after a failed deployment: when a new commit is pushed to fix the issue, the remaining health checks for the failed revision are skipped. It can also reduce the time to deploy when using GitOps with high-frequency commits.
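For context, the health checks in question are the ones a Kustomization runs when `wait` or `healthChecks` is set. The following is a minimal sketch of such a Kustomization; the name, path, and durations are hypothetical and chosen only for illustration.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps              # hypothetical name
  namespace: flux-system
spec:
  interval: 10m
  timeout: 5m             # upper bound on how long health checks may run
  wait: true              # health check every resource applied by this Kustomization
  prune: true
  path: ./apps            # hypothetical path in the source repository
  sourceRef:
    kind: GitRepository
    name: flux-system
```

Without the feature gate, a failing rollout in a setup like this can keep the reconciliation busy until the timeout expires before a newer revision is applied; with the gate enabled, a new source revision cancels the pending checks.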
The current implementation is limited to Flux Kustomizations and only works when the source revision changes. In Flux 2.8 we plan to extend this functionality with the following improvements:
- React to changes in the Kustomization spec (e.g. path, patches, images, etc.)
- React to changes in referenced ConfigMaps and Secrets (variable substitutions, SOPS decryption keys, kubeconfig)
- React to a reconciliation triggered manually with flux reconcile or via notification-controller receivers
In all these cases, the ongoing health checks will be canceled when a new reconciliation is scheduled. To improve the observability of this feature, a new reason named HealthCheckCanceled will be added to the Ready condition in the Kustomization status.
As for helm-controller, we are still evaluating the best approach to implement this feature. We have proposed a change to the Helm community to support canceling ongoing health checks in the Helm SDK and hopefully this will be available in Helm v4.
Important
Note that we plan to enable this feature by default only after the implementation is completed in both kustomize-controller and helm-controller. Users will still be able to opt out by disabling the feature gate.