How to handle lost event? #1955

@bsch92

We recently ran into a problem where the OnRevokedEvent was lost while using spring-cloud-kubernetes-fabric8-leader.

Our setup consists of 3 Kubernetes pods, one of which is the leader. Our configuration is certainly not perfect (leadership is sometimes revoked while a pod is still running), but it mostly works.

During the recent incident, the old leader's leadership was revoked because its readiness check failed, but the pod did not shut down. Shortly after, another pod was elected as the new leader and started doing the leader-specific work. Meanwhile, the old leader never received the OnRevokedEvent and kept acting as if it were still the leader.

As a result, we had two leaders running until somebody noticed and manually shut down the "phantom-leader".

Sadly, I can't reproduce the problem, and the logs don't show any anomalies. Is this a known problem, and how can I prevent it from happening again?
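One safeguard that might help, sketched under assumptions: instead of trusting the events alone, the application could periodically compare its local "I am leader" flag with the holder that is currently recorded and drop leadership on a mismatch. Everything below is hypothetical and not part of spring-cloud-kubernetes: `LeadershipGuard`, `verify()` and the `currentHolder` supplier are made-up names, and the supplier would have to be backed by whatever lookup is available in a given setup.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical helper, not part of spring-cloud-kubernetes.
// It mirrors the granted/revoked events in a local flag and can
// re-check that flag against an external source of truth.
class LeadershipGuard {
    private final String selfId;
    // Assumption: returns the id of the current leader, or null if unknown.
    private final Supplier<String> currentHolder;
    private final AtomicBoolean leader = new AtomicBoolean(false);

    LeadershipGuard(String selfId, Supplier<String> currentHolder) {
        this.selfId = selfId;
        this.currentHolder = currentHolder;
    }

    // Call from the handler of the granted event.
    void onGranted() { leader.set(true); }

    // Call from the handler of the revoked event.
    void onRevoked() { leader.set(false); }

    boolean isLeader() { return leader.get(); }

    // Intended to run periodically. Returns true if a stale leader
    // flag was cleared, i.e. a revoke event was apparently missed.
    boolean verify() {
        if (!leader.get()) {
            return false; // not leader locally, nothing to correct
        }
        String holder;
        try {
            holder = currentHolder.get();
        } catch (RuntimeException e) {
            holder = null; // fail safe: an unreadable holder counts as "not us"
        }
        if (selfId.equals(holder)) {
            return false; // local flag matches the recorded holder
        }
        return leader.compareAndSet(true, false);
    }
}
```

The application would call `verify()` on a schedule and stop the leader-specific work whenever it returns `true`. Treating a failed lookup as loss of leadership is a deliberate choice in this sketch: it prefers a short period with no leader over a period with two.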
