Skip to content

[actors] Document and Refine Invariants #1110

Open
@patrick-ogrady

Description

@patrick-ogrady

Recent work on p2p (#1108 + #1067) has uncovered a set of implicit (and important) invariants that our existing actors assume. We should take the time (for our own sanity) to properly document said invariants (and or refine them):

  • Throughout the codebase, we call unwrap() in Mailbox (
    impl<E: Spawner + Metrics, C: PublicKey> Mailbox<E, C> {
    /// Create a new mailbox for the tracker.
    pub(super) fn new(sender: mpsc::Sender<Message<E, C>>) -> Self {
    Self { sender }
    }
    /// Send a `Connect` message to the tracker.
    pub async fn connect(&mut self, public_key: C, dialer: bool, peer: peer::Mailbox<C>) {
    self.sender
    .send(Message::Connect {
    public_key,
    dialer,
    peer,
    })
    .await
    .unwrap();
    }
    /// Send a `Construct` message to the tracker.
    pub async fn construct(&mut self, public_key: C, peer: peer::Mailbox<C>) {
    self.sender
    .send(Message::Construct { public_key, peer })
    .await
    .unwrap();
    }
    /// Send a `BitVec` message to the tracker.
    pub async fn bit_vec(&mut self, bit_vec: types::BitVec, peer: peer::Mailbox<C>) {
    self.sender
    .send(Message::BitVec { bit_vec, peer })
    .await
    .unwrap();
    }
    /// Send a `Peers` message to the tracker.
    pub async fn peers(&mut self, peers: Vec<types::PeerInfo<C>>, peer: peer::Mailbox<C>) {
    self.sender
    .send(Message::Peers { peers, peer })
    .await
    .unwrap();
    }
    /// Send a `Block` message to the tracker.
    pub async fn dialable(&mut self) -> Vec<C> {
    let (sender, receiver) = oneshot::channel();
    self.sender
    .send(Message::Dialable { responder: sender })
    .await
    .unwrap();
    receiver.await.unwrap()
    }
    /// Send a `Dial` message to the tracker.
    pub async fn dial(&mut self, public_key: C) -> Option<Reservation<E, C>> {
    let (tx, rx) = oneshot::channel();
    self.sender
    .send(Message::Dial {
    public_key,
    reservation: tx,
    })
    .await
    .unwrap();
    rx.await.unwrap()
    }
    /// Send a `Listen` message to the tracker.
    pub async fn listen(&mut self, public_key: C) -> Option<Reservation<E, C>> {
    let (tx, rx) = oneshot::channel();
    self.sender
    .send(Message::Listen {
    public_key,
    reservation: tx,
    })
    .await
    .unwrap();
    rx.await.unwrap()
    }
    }
    ) and (incorrectly) assume that anytime we can't send a message to another actor we should panic. During graceful shutdown, this can occur and cause an unexpected panic. We should either surface the disconnected error from mailbox/return a boolean indicating the drop/or just ignore the panic (may make it difficult to assess when it is time to shutdown).
  • A recent change in p2p ([p2p] Update P2P discovery PR #921), made message ordering to tracker (from peers) critical for correctness (i.e. not panicking). We should form a strict opinion on whether other actors should ensure they always send messages that arrive at the destination in the right order (also implicitly enforcing the recipient to process incoming messages in a single loop) or make a best effort (dropping messages that don't make sense for the current actor state/returning an error). An example of this pattern is proposal handling in consensus:
    // If we have already moved to another view, drop the response as we will
    // not broadcast it
    if self.view != context.view {
    debug!(view = context.view, our_view = self.view, reason = "no longer in required view", "dropping requested proposal");
    continue;
    }
    . If we aren't careful, this could mask bugs/incorrect logic (in p2p, for example, a bug could prevent us from ever reconnecting to a certain peer if we drop messages unexpectedly).
  • It is possible that someone using actors could deadlock if two actors are messaging back and forth to each other (both hit a bounded channel max and can no longer add). We try to prevent this in p2p with rate limiting and channel prevents (using a biased select where we pull control messages that could cause a deadlock before data messages):
    msg_control = self.control.next() => {
    let msg = match msg_control {
    Some(msg_control) => msg_control,
    None => return Err(Error::PeerDisconnected),
    };
    let (metric, payload) = match msg {
    Message::BitVec(bit_vec) =>
    (metrics::Message::new_bit_vec(&peer), types::Payload::BitVec(bit_vec)),
    Message::Peers(peers) =>
    (metrics::Message::new_peers(&peer), types::Payload::Peers(peers)),
    Message::Kill => {
    return Err(Error::PeerKilled(peer.to_string()))
    }
    };
    Self::send(&mut conn_sender, &self.sent_messages, metric, payload)
    .await?;
    },
    msg_high = self.high.next() => {
    let msg = Self::validate_msg(msg_high, &rate_limits)?;
    Self::send(&mut conn_sender, &self.sent_messages, metrics::Message::new_data(&peer, msg.channel), types::Payload::Data(msg))
    .await?;
    },
    msg_low = self.low.next() => {
    let msg = Self::validate_msg(msg_low, &rate_limits)?;
    Self::send(&mut conn_sender, &self.sent_messages, metrics::Message::new_data(&peer, msg.channel), types::Payload::Data(msg))
    .await?;
    }
    . We may want to create some sort of rule that we must always send a oneshot response (to prevent this sort of deadlocking). In any case, something we should consider documenting best practices for.

I've historically found it "sane" to reason about these actors as independent entities that can handle invalid inputs behind well-defined interfaces (as a way to build large/complex crates without locks + better take advantage of parallelism). However, leaking the internal state machine of some actor to all other actors seems like the worst of all worlds (and very difficult to maintain). Getting on top of this seems prudent.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions