Reduce the mean time to recovery (MTTR) in case of a failed deployment #5584

@stefanprodan

In Flux 2.7 we added a feature to kustomize-controller that speeds up cluster reconciliation by canceling ongoing health checks when a new source revision is detected. The feature is opt-in and is enabled with the CancelHealthCheckOnNewRevision feature gate.
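As a minimal sketch of enabling the gate, assuming a cluster bootstrapped with the standard flux-system/kustomization.yaml layout (the file location and the gotk-*.yaml resource names are assumptions about your repository, not something this issue prescribes), a patch can append the feature-gates flag to the kustomize-controller container arguments:

```yaml
# flux-system/kustomization.yaml (assumed bootstrap layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # Append the feature gate to the kustomize-controller args.
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --feature-gates=CancelHealthCheckOnNewRevision=true
    target:
      kind: Deployment
      name: kustomize-controller
```

Removing the patch, or setting the gate to false, should opt out again once the change is reconciled.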

This feature is particularly useful for reducing the mean time to recovery (MTTR) after a failed deployment: when a new commit is pushed to fix the issue, the remaining health checks of the failed revision are skipped and the fix is applied right away. It can also reduce the time to deploy when using GitOps with high-frequency commits.
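For context, the health checks in question are the ones a Kustomization runs when wait or healthChecks is set. The following is an illustrative sketch only: the names apps and flux-system, the path, and the durations are hypothetical values chosen for the example. With a long timeout like this, a broken rollout could otherwise hold up the next revision until the health checks give up:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps                # hypothetical name
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps/production   # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system       # hypothetical source
  # Wait for all applied resources to become ready (health checks).
  wait: true
  # Upper bound for apply and health checks of a single reconciliation.
  timeout: 5m
```

With the feature gate enabled, pushing a fix while this Kustomization is still waiting on the failed revision is expected to cancel the wait and start reconciling the new revision.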

The current implementation is limited to Flux Kustomizations and only works when the source revision changes. In Flux 2.8 we plan to extend this functionality with the following improvements:

  • React to changes in the Kustomization spec (e.g. path, patches, images)
  • React to changes in referenced ConfigMaps and Secrets (variable substitutions, SOPS decryption keys, kubeconfig)
  • React to a reconciliation triggered manually with flux reconcile or via notification-controller receivers

In all these cases, the ongoing health checks will be canceled when a new reconciliation is scheduled. To improve the observability of this feature, a new reason will be added to the Kustomization status Ready condition named HealthCheckCanceled.

As for helm-controller, we are still evaluating the best approach to implement this feature. We have proposed a change to the Helm community to support canceling ongoing health checks in the Helm SDK and hopefully this will be available in Helm v4.

Important

We plan to enable this feature by default only after the implementation is complete in both kustomize-controller and helm-controller. Users will still be able to opt out by disabling the feature gate.

Labels: umbrella-issue (Umbrella issue for tracking progress of a larger effort)