You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
Currently, the repository contains a leader-for-life based election model that ensures that a single leader is elected for life during a HA state.
To briefly describe what leader for life and leader for lease approaches are:
(1) Leader for life: A leader pod is selected for life until its garbage collected. This ensures there is only one leader at any instant of time.
(2) Leader for lease: Leader election happens periodically at a defined interval which can be tweaked. When the current leader is not able to renew the lease, a new leader is elected.
More details on both of these approaches are available here.
Approach (2) is implemented upstream, in client-go and is scaffolded by default for the past 30+ releases of SDK, since it is implemented as a part of setting up the manager in controller-runtime. It guarantees faster election of leaders, less downtime, recovery from disconnected/frozen node failures. However, it does not eliminate the split brain scenario - where more than a single leader is available at an instant of time.
Approach (1) on the other hand, was developed long before we had a leader-for-lease implemented upstream. Though it solves the split brain scenario, it does not guarantee recovery from node failure nor faster recovery. We also have issues with integrating it to controller-runtime (#48). Neither is it being maintained nor used as widely as leader for lease.
Here is a detailed comment explaining the preference of (2) over the other, wherein users would prefer a faster recovery even though there is split brain scenario intermittently, rather than an implementation that does not guarantee faster recovery.
Describe the solution you'd like
Since leader for life approach is not being widely used, neither works with controller-runtime seamlessly, it is better to adopt a well tested upstream library than to depend on what is currently available as an option.
The solution for this is:
To deprecate and remove leader for life in future releases of Operator SDK.
Feature Request
Is your feature request related to a problem? Please describe.
Currently, the repository contains a leader-for-life based election model that ensures that a single leader is elected for life during a HA state.
To briefly describe what leader for life and leader for lease approaches are:
(1) Leader for life: A leader pod is selected for life until its garbage collected. This ensures there is only one leader at any instant of time.
(2) Leader for lease: Leader election happens periodically at a defined interval which can be tweaked. When the current leader is not able to renew the lease, a new leader is elected.
More details on both of these approaches are available here.
Approach (2) is implemented upstream, in client-go and is scaffolded by default for the past 30+ releases of SDK, since it is implemented as a part of setting up the manager in controller-runtime. It guarantees faster election of leaders, less downtime, recovery from disconnected/frozen node failures. However, it does not eliminate the split brain scenario - where more than a single leader is available at an instant of time.
Approach (1) on the other hand, was developed long before we had a leader-for-lease implemented upstream. Though it solves the split brain scenario, it does not guarantee recovery from node failure nor faster recovery. We also have issues with integrating it to controller-runtime (#48). Neither is it being maintained nor used as widely as leader for lease.
Here is a detailed comment explaining the preference of (2) over the other, wherein users would prefer a faster recovery even though there is split brain scenario intermittently, rather than an implementation that does not guarantee faster recovery.
Describe the solution you'd like
Since leader for life approach is not being widely used, neither works with controller-runtime seamlessly, it is better to adopt a well tested upstream library than to depend on what is currently available as an option.
The solution for this is:
The text was updated successfully, but these errors were encountered: