[actors] Document and Refine Invariants

Recent work on `p2p` (https://github.com/commonwarexyz/monorepo/pull/1108 + https://github.com/commonwarexyz/monorepo/pull/1067) has uncovered a set of implicit (and important) invariants that our existing actors assume. We should take the time (for our own sanity) to properly document said invariants (and or refine them):

* Throughout the codebase, we call `unwrap()` in `Mailbox` (https://github.com/commonwarexyz/monorepo/blob/c46312f8af42edb7bbeefc1bd8c0216eaff871f3/p2p/src/authenticated/actors/tracker/ingress.rs#L111-L188) and (incorrectly) assume that anytime we can't send a message to another actor we should panic. During graceful shutdown, this can occur and cause an unexpected panic. We should either surface the disconnected error from mailbox/return a boolean indicating the drop/or just ignore the panic (may make it difficult to assess when it is time to shutdown).
* A recent change in `p2p` (https://github.com/commonwarexyz/monorepo/pull/921), made message ordering to `tracker` (from peers) critical for correctness (i.e. not panicking). We should form a strict opinion on whether other actors should ensure they always send messages that arrive at the destination in the right order (also implicitly enforcing the recipient to process incoming messages in a single loop) or make a best effort (dropping messages that don't make sense for the current actor state/returning an error). An example of this pattern is proposal handling in consensus: https://github.com/commonwarexyz/monorepo/blob/c46312f8af42edb7bbeefc1bd8c0216eaff871f3/consensus/src/threshold_simplex/actors/voter/actor.rs#L1874-L1879. If we aren't careful, this could mask bugs/incorrect logic (in p2p, for example, a bug could prevent us from ever reconnecting to a certain peer if we drop messages unexpectedly).
* It is possible that someone using actors could deadlock if two actors are messaging back and forth to each other (both hit a bounded channel max and can no longer add). We try to prevent this in `p2p` with rate limiting and channel prevents (using a biased select where we pull control messages that could cause a deadlock before data messages): https://github.com/commonwarexyz/monorepo/blob/f6a179e8a7263d6f41000ec283f08aa54a891982/p2p/src/authenticated/actors/peer/actor.rs#L145-L171. We may want to create some sort of rule that we must always send a oneshot response (to prevent this sort of deadlocking). In any case, something we should consider documenting best practices for.

I've historically found it "sane" to reason about these actors as independent entities that can handle invalid inputs behind well-defined interfaces (as a way to build large/complex crates without locks + better take advantage of parallelism). However, leaking the internal state machine of some actor to all other actors seems like the worst of all worlds (and very difficult to maintain). Getting on top of this seems prudent.

	impl<E: Spawner + Metrics, C: PublicKey> Mailbox<E, C> {
	/// Create a new mailbox for the tracker.
	pub(super) fn new(sender: mpsc::Sender<Message<E, C>>) -> Self {
	Self { sender }
	}

	/// Send a `Connect` message to the tracker.
	pub async fn connect(&mut self, public_key: C, dialer: bool, peer: peer::Mailbox<C>) {
	self.sender
	.send(Message::Connect {
	public_key,
	dialer,
	peer,
	})
	.await
	.unwrap();
	}

	/// Send a `Construct` message to the tracker.
	pub async fn construct(&mut self, public_key: C, peer: peer::Mailbox<C>) {
	self.sender
	.send(Message::Construct { public_key, peer })
	.await
	.unwrap();
	}

	/// Send a `BitVec` message to the tracker.
	pub async fn bit_vec(&mut self, bit_vec: types::BitVec, peer: peer::Mailbox<C>) {
	self.sender
	.send(Message::BitVec { bit_vec, peer })
	.await
	.unwrap();
	}

	/// Send a `Peers` message to the tracker.
	pub async fn peers(&mut self, peers: Vec<types::PeerInfo<C>>, peer: peer::Mailbox<C>) {
	self.sender
	.send(Message::Peers { peers, peer })
	.await
	.unwrap();
	}

	/// Send a `Block` message to the tracker.
	pub async fn dialable(&mut self) -> Vec<C> {
	let (sender, receiver) = oneshot::channel();
	self.sender
	.send(Message::Dialable { responder: sender })
	.await
	.unwrap();
	receiver.await.unwrap()
	}

	/// Send a `Dial` message to the tracker.
	pub async fn dial(&mut self, public_key: C) -> Option<Reservation<E, C>> {
	let (tx, rx) = oneshot::channel();
	self.sender
	.send(Message::Dial {
	public_key,
	reservation: tx,
	})
	.await
	.unwrap();
	rx.await.unwrap()
	}

	/// Send a `Listen` message to the tracker.
	pub async fn listen(&mut self, public_key: C) -> Option<Reservation<E, C>> {
	let (tx, rx) = oneshot::channel();
	self.sender
	.send(Message::Listen {
	public_key,
	reservation: tx,
	})
	.await
	.unwrap();
	rx.await.unwrap()
	}
	}

	// If we have already moved to another view, drop the response as we will
	// not broadcast it
	if self.view != context.view {
	debug!(view = context.view, our_view = self.view, reason = "no longer in required view", "dropping requested proposal");
	continue;
	}

	msg_control = self.control.next() => {
	let msg = match msg_control {
	Some(msg_control) => msg_control,
	None => return Err(Error::PeerDisconnected),
	};
	let (metric, payload) = match msg {
	Message::BitVec(bit_vec) =>
	(metrics::Message::new_bit_vec(&peer), types::Payload::BitVec(bit_vec)),
	Message::Peers(peers) =>
	(metrics::Message::new_peers(&peer), types::Payload::Peers(peers)),
	Message::Kill => {
	return Err(Error::PeerKilled(peer.to_string()))
	}
	};
	Self::send(&mut conn_sender, &self.sent_messages, metric, payload)
	.await?;
	},
	msg_high = self.high.next() => {
	let msg = Self::validate_msg(msg_high, &rate_limits)?;
	Self::send(&mut conn_sender, &self.sent_messages, metrics::Message::new_data(&peer, msg.channel), types::Payload::Data(msg))
	.await?;
	},
	msg_low = self.low.next() => {
	let msg = Self::validate_msg(msg_low, &rate_limits)?;
	Self::send(&mut conn_sender, &self.sent_messages, metrics::Message::new_data(&peer, msg.channel), types::Payload::Data(msg))
	.await?;
	}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[actors] Document and Refine Invariants #1110

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[actors] Document and Refine Invariants #1110

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions