Summary or problem description
This issue is triggered by the dBFT 2.0 liveness lock problem mentioned in neo-project/neo#2029 (comment) and other discussions in issues and PRs. We've used the TLA+ formal modelling tool to analyze dBFT 2.0 behaviour. In this issue we present the formal algorithm specification with a set of identified problems and propose ways to fix them in the so-called dBFT 2.1.
dBFT 2.0 formal models
We've created two models of the dBFT 2.0 algorithm in TLA+. Please consider reading the brief introduction to our modelling approach in the README and take a look at the base model. Below we present several error traces found by the TLC Model Checker in the four-node network scenario.
Model checking note
Please consider reading the model checking note before exploring the error traces below.
1. Liveness lock with four non-faulty nodes
The TLA+ specification configuration assumes four non-faulty replicas precisely following the dBFT 2.0 algorithm and the maximum reachable view set to 2. Here's the model values configuration used for the TLC Model Checker in the described scenario (a sketch of a possible TLC configuration file is given below the table):
RM | RMFault | RMDead | MaxView |
---|---|---|---|
{0, 1, 2, 3} | {} | {} | 2 |
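For illustration, such a configuration could be passed to TLC via a configuration file along the following lines. This is a hypothetical sketch: the constant names are taken from the table above, while the specification name and exact file layout in the published model may differ.

```
\* TLC model configuration for the four non-faulty nodes scenario.
SPECIFICATION Spec
CONSTANTS
    RM      = {0, 1, 2, 3}  \* all consensus nodes
    RMFault = {}            \* no Byzantine ("faulty") nodes
    RMDead  = {}            \* no "dead" nodes
    MaxView = 2             \* maximum reachable view
```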
The following liveness lock scenario was found by the TLC Model Checker:
Steps to reproduce the liveness lock
- The primary (at view 0) replica 0 sends the `PrepareRequest` message.
- The primary (at view 0) replica 0 decides to change its view (possible on timeout) and sends the `ChangeView` message.
- The backup (at view 0) replica 1 receives the `PrepareRequest` of view 0 and broadcasts its `PrepareResponse`.
- The backup (at view 0) replica 1 decides to change its view (possible on timeout) and sends the `ChangeView` message.
- The backup (at view 0) replica 2 receives the `PrepareRequest` of view 0 and broadcasts its `PrepareResponse`.
- The backup (at view 0) replica 2 collects `M` prepare messages (from itself and replicas 0, 1) and broadcasts the `Commit` message for view 0.
- The backup (at view 0) replica 3 decides to change its view (possible on timeout) and sends the `ChangeView` message.
- The primary (at view 0) replica 0 collects `M` `ChangeView` messages (from itself and replicas 1, 3) and changes its view to 1.
- The backup (at view 0) replica 1 collects `M` `ChangeView` messages (from itself and replicas 0, 3) and changes its view to 1.
- The primary (at view 1) replica 1 sends the `PrepareRequest` message.
- The backup (at view 1) replica 0 receives the `PrepareRequest` of view 1 and sends the `PrepareResponse`.
- The backup (at view 1) replica 0 decides to change its view (possible on timeout) and sends the `ChangeView` message.
- The primary (at view 1) replica 1 decides to change its view (possible on timeout) and sends the `ChangeView` message.
- The backup (at view 0) replica 3 collects `M` `ChangeView` messages (from itself and replicas 0, 1) and changes its view to 1.
- The backup (at view 1) replica 3 receives the `PrepareRequest` of view 1 and broadcasts its `PrepareResponse`.
- The backup (at view 1) replica 3 collects `M` prepare messages and broadcasts the `Commit` message for view 1.
- The rest of the undelivered messages eventually reach their receivers, but this doesn't change the nodes' states.
Here's the TLC error trace attached: base_deadlock1_dl.txt
After the described sequence of steps we end up in the following situation:
Replica | View | State |
---|---|---|
0 | 1 | `ChangeView` sent, in the process of changing view from 1 to 2 |
1 | 1 | `ChangeView` sent, in the process of changing view from 1 to 2 |
2 | 0 | `Commit` sent, waiting for the rest of the nodes to commit at view 0 |
3 | 1 | `Commit` sent, waiting for the rest of the nodes to commit at view 1 |
So we have replica 2 stuck at view 0 with no way to exit the commit stage and no way to collect more `Commit` messages from other replicas. Replica 3 is stuck at view 1 with the same problem. And finally, replicas 0 and 1 have entered the view-changing stage and can neither commit (as only `F` nodes have committed at view 1) nor change view (as replicas 2 and 3, being in the commit stage, can't send their `ChangeView` messages).
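For reference, the `M` and `F` thresholds mentioned above follow the usual dBFT quorum arithmetic. A minimal TLA+ sketch for the four-node scenario (illustrative only; the published models define these thresholds in their own terms):

```
---- MODULE QuorumSketch ----
EXTENDS Naturals

N == 4               \* number of consensus nodes in the scenario
F == (N - 1) \div 3  \* F = 1: maximum number of faulty nodes tolerated
M == N - F           \* M = 3: quorum for prepare/Commit/ChangeView messages
====
```

With `M = 3`, replicas 0 and 1 hold only two `ChangeView` messages for the next view, while replicas 2 and 3 each hold a single `Commit` for their respective views, so no quorum of any kind can ever be assembled.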
This liveness lock happens because the outcome of the consensus round (either commit or change view) completely depends on the message delivery order. Moreover, we've faced exactly this kind of deadlock in a real functioning network; the incident was resolved by restarting the consensus nodes.
2. Liveness lock with one "dead" node and three non-faulty nodes
The TLA+ specification configuration assumes three non-faulty nodes precisely following the dBFT 2.0 algorithm and one node that can "die", i.e. stop sending consensus messages or changing its state at any point of the behaviour. The liveness lock can be reproduced both when the first primary node "dies" and when a non-primary node "dies" in the middle of the consensus process. The maximum reachable view is set to 2. Here are the two model values configurations used for the TLC Model Checker in the described scenario:
RM | RMFault | RMDead | MaxView |
---|---|---|---|
{0, 1, 2, 3} | {} | {0} | 2 |
{0, 1, 2, 3} | {} | {1} | 2 |
The following liveness lock scenario was found by the TLC Model Checker:
Steps to reproduce the liveness lock (the first configuration, with the primary node "dying", is taken as an example)
- The primary (at view 0) replica 0 sends the `PrepareRequest` message.
- The primary (at view 0) replica 0 "dies" and can't send or handle any consensus messages further.
- The backup (at view 0) replica 1 receives the `PrepareRequest` of view 0 and broadcasts its `PrepareResponse`.
- The backup (at view 0) replica 1 decides to change its view (possible on timeout) and sends the `ChangeView` message.
- The backup (at view 0) replica 2 receives the `PrepareRequest` of view 0 and broadcasts its `PrepareResponse`.
- The backup (at view 0) replica 2 collects `M` prepare messages (from itself and replicas 0, 1) and broadcasts the `Commit` message for view 0.
- The backup (at view 0) replica 3 decides to change its view (possible on timeout) and sends the `ChangeView` message.
Here are the TLC error traces attached:
- TLC error trace for the first configuration (primary is "dying"): base_deadlock2_dl.txt
- TLC error trace for the second configuration (non-primary is "dying"): base_deadlock3_dl.txt
After the described sequence of steps we end up in the following situation:
Replica | View | State |
---|---|---|
0 | 0 | Dead (but `PrepareRequest` sent for view 0) |
1 | 0 | `ChangeView` sent, in the process of changing view from 0 to 1 |
2 | 0 | `Commit` sent, waiting for the rest of the nodes to commit at view 0 |
3 | 0 | `ChangeView` sent, in the process of changing view from 0 to 1 |
So we have replica 0 permanently dead at view 0, unable to affect the consensus process. Replica 2 has sent its `Commit` and waits in vain for the other replicas to enter the commit stage. Finally, replicas 1 and 3 have entered the view-changing stage and can neither commit (as only `F` nodes have committed at view 0) nor change view (as replica 2, being in the commit stage, can't send its `ChangeView` message).
It should be noted that the dBFT 2.0 algorithm is expected to guarantee block acceptance with at least `M` non-faulty ("alive" in this case) nodes, which doesn't hold in the described situation.
3. Running the TLC model checker with "faulty" nodes
Both models allow specifying the set of malicious node indexes via the `RMFault` model constant. "Faulty" nodes are allowed to send any valid message at any step of the behaviour. At the same time, weak fairness is required of the next-state action predicate in both models: if it's possible for the model to take a non-stuttering step, such a step must eventually be taken. Thus, running the basic dBFT 2.0 model with a single faulty and three non-faulty nodes doesn't reveal any model deadlock: the malicious node keeps sending messages that allow the others to escape from the liveness lock. It should also be noticed that the presence of faulty nodes slightly increases the state graph size, so it takes more time to evaluate the whole set of possible model behaviours.
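For readers less familiar with TLA+, weak fairness of the next-state action is normally stated in the specification formula itself. A minimal sketch (the operator names `Init`, `Next` and `vars` are illustrative and not necessarily the ones used in the published models):

```
\* Weak fairness of Next: if Next stays continuously enabled,
\* a Next step must eventually be taken, so the model can't
\* stutter forever while some action remains possible.
Spec == Init /\ [][Next]_vars /\ WF_vars(Next)
```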
Nevertheless, it's a thought-provoking experiment to check the model specification behaviour with a non-empty set of faulty nodes. We've checked the basic model with the following configurations and didn't find any liveness lock:
RM | RMFault | RMDead | MaxView |
---|---|---|---|
{0, 1, 2, 3} | {0} | {} | 2 |
{0, 1, 2, 3} | {1} | {} | 2 |
dBFT 2.1 proposed models
Based on the liveness issues found by the TLC Model Checker, we've developed a couple of ways to improve the dBFT 2.0 algorithm and completely avoid the mentioned liveness and safety problems. The improved models will further be referred to as dBFT 2.1 models. The model descriptions and specifications are too large to attach to this issue, thus they are kept in a separate repo. Please consider reading the dBFT 2.1 models description in the README and check the models' TLA+ specifications.
Running the TLC Model Checker on dBFT 2.1 models
We've run the TLC Model Checker to check both proposed dBFT 2.1 models with the same set of configurations as described above for the basic model. Here's the table of configurations and model checking results (they are almost the same for both dBFT 2.1 models):
RM | RMFault | RMDead | MaxView | Model checking result |
---|---|---|---|---|
{0, 1, 2, 3} | {} | {} | 1 | `PASSED`, no liveness lock found |
{0, 1, 2, 3} | {} | {} | 2 | `NOT FINISHED`, TLC failed with OOM (too many states, see the model checking note), no liveness lock found |
{0, 1, 2, 3} | {} | {0} | 1 | `PASSED`, no liveness lock found |
{0, 1, 2, 3} | {} | {0} | 2 | `NOT FINISHED`, TLC failed with OOM (too many states, see the model checking note), no liveness lock found |
{0, 1, 2, 3} | {} | {1} | 1 | `PASSED`, no liveness lock found |
{0, 1, 2, 3} | {} | {1} | 2 | `NOT FINISHED`, TLC failed with OOM (too many states, see the model checking note), no liveness lock found |
{0, 1, 2, 3} | {0} | {} | 1 | `FAILED` for `dbftCV3.tla`, see the note below the table. |
{0, 1, 2, 3} | {0} | {} | 1 | `NOT FINISHED` for `dbftCentralizedCV.tla` in reasonable time; the "faulty" node description probably needs some more care. |
Note for the `FAILED` case: given that malicious nodes are allowed to send literally any message at any step of the behaviour, a non-empty `RMFault` set is able to ruin the whole consensus process. However, the current dBFT 2.0 algorithm faces the liveness lock even with "dead" nodes only (a case the proposed dBFT 2.1 models handle successfully). Moreover, we believe that in a real functioning network it's more likely to face the first type of attack ("dead" nodes) than malicious clients intentionally sending arbitrary messages at any time. Thus, we believe that the dBFT 2.1 models perform much better than dBFT 2.0 and solve the dBFT 2.0 liveness lock problems.
Other than this, the TLC Model Checker didn't find any liveness property violations for the dBFT 2.1 models, both with the `MaxView` constraint set to `1` and with it set to `2` (see the model checking note for clarification).
Conclusion
We believe that the proposed dBFT 2.1 models solve the dBFT 2.0 liveness lock problems. Anyone who has thoughts, ideas, questions, suggestions or doubts is welcome to join the discussion. The proposed specifications may have bugs or inaccuracies, thus we accept all kinds of reasoned comments and related feedback. If you have trouble understanding, editing or checking the models, please don't hesitate to write a comment to this issue.
Further work
The whole idea of TLA+ dBFT modelling was to reveal and avoid the liveness lock problems so that we get a normally operating consensus algorithm in our network. There are several directions for further related work that we have in mind:
- Collect and process the community feedback on the proposed TLA+ dBFT 2.0 and 2.1 specifications and fix the models (in case any bugs are found). The discussion period is two months starting from the issue submission date. Please consider writing your feedback in time! After the discussion we'll start the dBFT 2.1 code-level implementation based on the discussion results.
- Implement the improved dBFT 2.1 algorithm at the code level.
- Create a TLA+ specification and investigate any potential problems of dBFT 3.0 (double speaker model, dBFT 3.0 Specification neo#2029).