The problem/use-case that the feature addresses
The "SENTINEL FAILOVER" command does not offer a way to fail over in a coordinated fashion,
e.g. for maintenance. It simply assumes that the current primary is no longer reachable,
which easily leads to situations with multiple primaries and loss of writes.
Introduce a "coordinated" variant of the "SENTINEL FAILOVER" command that actually
takes the current primary into account during the failover.
Description of the feature
Since version 6.2, Redis supports the "FAILOVER" command to
switch primary and replica roles in a coordinated fashion. However, one cannot use
this command in a sentinel setup (sentinel will usually fail over again).
Additionally, the semantics of "FAILOVER" differ from the semantics defined
in the sentinel client protocol. In the latter, a client is disconnected
when nodes switch roles; "FAILOVER" only disconnects connections with blocking commands.
(Interestingly, there are clients, e.g. redis-py, that disconnect if they find that a
connection has turned read-only.)
We could try to make "FAILOVER" work nevertheless: below is a proposal for a SENTINEL FAILOVER <primary name> COORDINATED command that uses "FAILOVER"
in a modified forced-failover procedure.
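For illustration only, the proposed command would go over the wire like any other Sentinel command; a minimal sketch of the RESP encoding (the COORDINATED argument is the new, proposed part, and the primary name "mymaster" is just an example):

```python
def resp_encode(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)

# The proposed coordinated variant, as a client would send it:
wire = resp_encode("SENTINEL", "FAILOVER", "mymaster", "COORDINATED")
```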
Alternatives you've considered
Adapt the "FAILOVER" command for this use case (e.g. by killing client connections
after the failover; but how do we keep up the connection that is used to control
the failover?)
Additional information
The proposed implementation is at #1292. We can keep the current failover state machine with the following
changes for a coordinated failover:
SENTINEL_FAILOVER_STATE_NONE
No change required
SENTINEL_FAILOVER_STATE_WAIT_START
The coordinated case requires a leader election, in contrast to the standard "SENTINEL FAILOVER"
case. We don't need to change the command used for voting, as it works irrespective of the
subjective up/down state of the primary (all sentinels except the one in charge of the failover
will typically regard the primary as up).
Once the quorum is reached, the state machine proceeds to the next state.
The benefit of this approach is that other sentinels won't start a failover for the failover-timeout period.
Otherwise, since the primary may seem unreachable during the failover procedure (because clients are blocked), other sentinels might initiate a competing failover during the coordinated one.
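As a reminder of what winning the vote means here, a sketch of the winner check (hedged: this mirrors my understanding of sentinel's election rule, where a leader needs a strict majority of all sentinels and never less than the configured quorum; the function name is illustrative):

```python
def wins_election(votes: int, num_sentinels: int, quorum: int) -> bool:
    # A leader needs a strict majority of all known sentinels,
    # and never fewer votes than the configured quorum.
    needed = max(num_sentinels // 2 + 1, quorum)
    return votes >= needed
```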
SENTINEL_FAILOVER_STATE_SELECT_SLAVE
No change required
SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE
Instead of using "SLAVEOF NOONE", send (to the current primary):
MULTI
CLIENT PAUSE <timeout> WRITE
FAILOVER TO <promoted replica host> <promoted replica port> TIMEOUT <timeout>
EXEC
and progress to SENTINEL_FAILOVER_STATE_WAIT_PROMOTION.
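The transaction above could be assembled as follows; this is a sketch, and the helper name is hypothetical (sentinel actually queues these over its existing async connection):

```python
def promotion_commands(replica_host: str, replica_port: int, timeout_ms: int):
    """Commands sent to the current primary instead of SLAVEOF NO ONE.

    The explicit CLIENT PAUSE ... WRITE keeps clients paused even after
    FAILOVER's own internal pause ends, so they can be disconnected in
    the next state.
    """
    return [
        ("MULTI",),
        ("CLIENT", "PAUSE", str(timeout_ms), "WRITE"),
        ("FAILOVER", "TO", replica_host, str(replica_port),
         "TIMEOUT", str(timeout_ms)),
        ("EXEC",),
    ]
```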
Rationale for the CLIENT PAUSE:
As said above, according to the sentinel client protocol, the client shall
be disconnected when the primary role changes to replica.
FAILOVER does not do that; it just pauses writing clients and unpauses them
once it gets the acknowledgement for the PSYNC FAILOVER.
Fortunately, a CLIENT PAUSE takes precedence over the client pause that is part
of FAILOVER. When the FAILOVER succeeds (in time), the clients remain paused and we
can disconnect them in the next state.
NB: The CLIENT PAUSE is tricky, because we must avoid any write commands in
order not to become paused ourselves. INFO and PING are fine, but the PUBLISH for the
sentinel hello blocks. Additionally, other sentinels will begin to regard the primary as
not responding once they issue a PUBLISH. Thus, sentinels must not send PUBLISHes while
clients are blocked (i.e. to a primary in failover state).
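The gating this implies for a sentinel's periodic commands can be sketched like this (names are illustrative, not the actual sentinel source):

```python
def safe_periodic_commands(primary_in_failover: bool):
    """PING and INFO are read-only and pass through a CLIENT PAUSE ... WRITE.
    The hello PUBLISH is a write and would block, so it must be skipped
    while the primary has its clients paused for the failover."""
    commands = ["PING", "INFO"]
    if not primary_in_failover:
        commands.append("PUBLISH")
    return commands
```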
Note: If the primary does not understand or does not accept the "FAILOVER" command (which
should not happen, as the command has existed since Redis 6.2), the failover will simply
time out, because no role switch will happen.
SENTINEL_FAILOVER_STATE_WAIT_PROMOTION
No need to change the waiting part: Wait for the role switch in sentinelRefreshInstanceInfo.
However, when we detect the change, we need to take care of the clients now. Send:
MULTI
CONFIG REWRITE
CLIENT KILL TYPE normal
CLIENT KILL TYPE pubsub
CLIENT UNPAUSE
EXEC
Send these to both the former primary (now already a replica) and the new primary (before we
call the client-reconf script).
If no switch happens until the failover_timeout is reached, the sentinel failover
will be aborted. The FAILOVER command should already have timed out by this time.
If the sentinel crashes before we can issue the commands above, clients will remain
connected and become unpaused at some point. However, once the remaining sentinels
notice that the former primary is no longer a primary, they will initiate a
failover to a replica. This will disconnect the clients from the former primary.
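The role switch waited for above is visible in the instance's INFO output; a minimal parsing sketch (not the actual sentinelRefreshInstanceInfo code):

```python
def role_from_info(info: str) -> str:
    """Extract the role field from an INFO replication reply."""
    for line in info.splitlines():
        if line.startswith("role:"):
            return line.split(":", 1)[1].strip()
    return "unknown"

# After a successful FAILOVER, the former primary reports itself as a replica.
```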
SENTINEL_FAILOVER_STATE_RECONF_SLAVES
No change (we can continue to ignore the former primary, as it became a replica
of the new primary via FAILOVER; in contrast to the current "SENTINEL FAILOVER", we
don't need to reconfigure this node later).
SENTINEL_FAILOVER_STATE_UPDATE_CONFIG
No change
Handling of Valkey instances stuck in Failover
The FAILOVER command may "encounter some scenarios it can not automatically remediate from and may get stuck".
The supervision of the sentinels can be used to handle these situations:
"REPLICAOF" is not accepted during a failover. Thus, send a "FAILOVER ABORT" before sending a "REPLICAOF"
to a node in sentinelSendReplicaOf(). (If there is no ongoing failover, the error is simply ignored.)
Sentinel monitors nodes for deviations from the expected state (wrong role or replication). Add a check
for replicas that have been in a failover state for too long and reconfigure them.
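Such a supervision check could look like the following sketch (the threshold choice and names are assumptions, not the proposed implementation):

```python
def failover_stuck(failover_since: float, now: float,
                   failover_timeout: float) -> bool:
    """True when a replica has reported an in-progress FAILOVER for longer
    than the failover timeout; sentinel should then reconfigure it
    (e.g. send FAILOVER ABORT followed by REPLICAOF)."""
    return now - failover_since > failover_timeout
```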