Skip to content

RabbitMQ 3.8 cluster restart problem #2416

Answered by michaelklishin
gksnjy asked this question in Other
Discussion options

You must be logged in to vote

3.7.13 does not have feature flags, so they cannot hit what is effectively a timeout exception.

failed_waiting_for_tables means that no peers came online within a given window of time. On Kubernetes specifically, there is a well known scenario where a readinessProbe that assumes a fully booted node — such as the deprecated rabbitmq-diagnostics node_health_check — will deadlock the deployment as other nodes will never come online until the currently deployed one does not finish booting, and the current one waits for peers to be around before it considers a boot to be successful.

Use one of the basic health checks available today for readinessProbe and never rabbitmq-diagnostics node_health…

Replies: 2 comments 2 replies

Comment options

You must be logged in to vote
2 replies
@gksnjy
Comment options

@michaelklishin
Comment options

Answer selected by michaelklishin
Comment options

You must be logged in to vote
0 replies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants
Converted from issue

This discussion was converted from issue #2416 on July 23, 2020 12:32.