[WiP] Simplex consensus implementation observations #1102

Open
@dnkolegov-ar

Description

!! This is a work-in-progress document related to the Simplex protocol implementation in consensus/src/simplex. This issue will be updated.

Simplex Invariants

  • It is impossible to produce both a nullification and finalization certificate for the same view v
  • If a container C for some view v is notarized, then no other container C' for view v can be notarized
  • A replica may issue a finalize vote for v only if it has not already issued a nullify vote for v; this rule is essential for safety
  • A correct replica should not send two votes in the same view, unless one of them is a nullify vote
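
To make the invariants above concrete, here is a minimal sketch of a per-view vote guard. It is not taken from consensus/src/simplex: `VoteGuard`, `ViewVotes`, `Rejected`, and the `u64` view and digest types are hypothetical names used only for illustration. It enforces two of the rules listed above (no conflicting notarize votes in a view, no finalize after nullify). It also refuses a nullify after a finalize, which is an assumption on my side and is not stated in the list. Finalize without a prior notarize is deliberately allowed here.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real view and container digest types.
type View = u64;
type Digest = u64;

/// Votes this replica has already sent in one view.
#[derive(Default)]
struct ViewVotes {
    notarized: Option<Digest>,
    nullified: bool,
    finalized: bool,
}

#[derive(Debug, PartialEq)]
enum Rejected {
    ConflictingNotarize,
    FinalizeAfterNullify,
    // Assumption: the reverse direction is also refused.
    NullifyAfterFinalize,
}

#[derive(Default)]
struct VoteGuard {
    views: HashMap<View, ViewVotes>,
}

impl VoteGuard {
    /// Allow a notarize vote unless we already voted for a different container in this view.
    fn notarize(&mut self, view: View, digest: Digest) -> Result<(), Rejected> {
        let votes = self.views.entry(view).or_default();
        match votes.notarized {
            Some(previous) if previous != digest => Err(Rejected::ConflictingNotarize),
            _ => {
                votes.notarized = Some(digest);
                Ok(())
            }
        }
    }

    /// Allow a nullify vote unless we already sent a finalize vote in this view.
    fn nullify(&mut self, view: View) -> Result<(), Rejected> {
        let votes = self.views.entry(view).or_default();
        if votes.finalized {
            return Err(Rejected::NullifyAfterFinalize);
        }
        votes.nullified = true;
        Ok(())
    }

    /// Allow a finalize vote only if we have not sent a nullify vote in this view.
    fn finalize(&mut self, view: View) -> Result<(), Rejected> {
        let votes = self.views.entry(view).or_default();
        if votes.nullified {
            return Err(Rejected::FinalizeAfterNullify);
        }
        votes.finalized = true;
        Ok(())
    }
}
```

With this guard, calling `nullify(v)` and then `finalize(v)` returns `Err(Rejected::FinalizeAfterNullify)`, so a correct replica never contributes to both a nullification and a finalization certificate for the same view.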

Simplex Observations

  • These lines stress “Ensure we notarize before we finalize” and “Ensure we broadcast notarization before we finalize”, and in doing so prevent a replica from sending a Finalize message unless it has already sent its Notarize and Notarization messages. However, the algorithm itself does not forbid sending a Finalize without first sending a Notarize or Notarization. For example, a replica can be slow but correct and receive Notarize messages from others before sending its own Notarize vote. The protocol descriptions from Ava Labs, “Sing a song of Simplex”, and CommonWare do not mention this requirement:
    • CommonWare’s Simplex: Doesn’t require sending Notarize, but requires sending Notarization
    • Ava Labs’ protocol: “Starting from round i+1, each node that collects a quorum of votes on <vote, i, H(b)> broadcasts a finalization message <finalize, i, H(b)>.”
    • Shoup’s protocol: “if the log contains a block for slot v then if not complained (nullify has not been sent), then broadcast a commit share (finalize) for v”
  • threshold() (e.g., consensus/src/simplex/actors/voter/actor.rs:462) and quorum() (e.g., consensus/src/simplex/actors/voter/actor.rs:1452) are both used. Is it possible to use only one of them?
  • unwrap() is used in several places. What guarantees that these calls do not panic? Since the implementation depends on external I/O and network resources, why can we be sure it is panic-free? For example:
    • consensus/src/simplex/actors/voter/actor.rs:1557
    • consensus/src/simplex/actors/voter/actor.rs:1622
    • consensus/src/simplex/actors/voter/actor.rs:1894
  • The description of the protocol does not explicitly define whether a replica should rebroadcast nullification, notarization, or finalization certificates when it receives them
    • At the same time, in the implementation, when the replica receives a notarization message in consensus/src/simplex/actors/voter/actor.rs:1958, it calls self.notarization(). If the node is in the interesting view and has not yet sent a notarization, it only handles the certificate and does not send it. Is that correct?
    • It is not obvious when such notarizations should be sent, especially depending on whether we have already sent our own notarization (the same applies to finalization). Shoup’s interpretation of the protocol says the following: “whenever a party receives a support, commit, or complaint certificate, and it does not already have a corresponding certificate, it will add it to the pool, and broadcast the certificate to all parties”. These lines are also not clear enough and do not define what “the same” means.
    • So it would be great to define that mechanism and polish the description of the protocol related to certificate rebroadcasts
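
One possible reading of the rule quoted from Shoup above, as a rough sketch: rebroadcast a certificate exactly once, the first time it is seen. `CertPool`, `CertKind`, and `on_certificate` are hypothetical names and do not exist in consensus/src/simplex. The sketch assumes that a "corresponding certificate" means one with the same view and the same kind, which is exactly the point that needs to be pinned down in the protocol description.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the real view type.
type View = u64;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum CertKind {
    Notarization,
    Nullification,
    Finalization,
}

/// Tracks which certificates this replica already holds.
#[derive(Default)]
struct CertPool {
    // Assumption: a certificate is identified by (view, kind).
    seen: HashSet<(View, CertKind)>,
}

impl CertPool {
    /// Records the certificate and returns true if the caller should
    /// rebroadcast it, i.e. if no corresponding certificate was held before.
    fn on_certificate(&mut self, view: View, kind: CertKind) -> bool {
        // HashSet::insert returns true only when the value was not present.
        self.seen.insert((view, kind))
    }
}
```

Under this reading, a received notarization for view v is rebroadcast once regardless of whether the replica has sent its own Notarize vote, and later copies of the same certificate are dropped. Whether the key should also include the container digest is left open here.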
