Clarify & simplify quic traits #177

Open
@FlorianUekermann

Big picture problem

I believe there are a few issues with the h3::quic::Connection trait, which leave expected behavior under-defined and hide some bugs in h3 that don't surface in tests due to h3-quinn implementation decisions. The topic easily gets out of hand, because both the space of possible interpretations of "correct" and the solution space are large, so I'll break it up a bit and leave some stuff out.

Example

Possibly a bug, subject to interpretation:

h3/h3/src/connection.rs

Lines 244 to 257 in da29aea

loop {
    match self.conn.poll_accept_recv(cx)? {
        Poll::Ready(Some(stream)) => self
            .pending_recv_streams
            .push(AcceptRecvStream::new(stream)),
        Poll::Ready(None) => {
            return Poll::Ready(Err(Code::H3_GENERAL_PROTOCOL_ERROR.with_reason(
                "Connection closed unexpected",
                crate::error::ErrorLevel::ConnectionError,
            )))
        }
        Poll::Pending => break,
    }
}

The Poll::Ready(None) arm implies that a connection close is not expected when this is called. Yet this snippet is almost certainly where a connection close is observed for the first time. However, h3-quinn treats every close as an error, even non-error ones. The path actually taken is therefore the Poll::Ready(Err(_)) propagated by the ? on line 245 (the match line), which is why no tests have caught this.

If you use other quic implementations, which return None on non-error closes, graceful shutdown tests start failing. I think this is a tip-of-the-iceberg situation, because I observe other shutdown-related oddities that depend on similar details, which I don't want to get into here for brevity.
Even if you do know what h3 actually expects, I believe there are race conditions. For example, observing a control stream reset is never expected, but it is implied by a quic connection close.
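
To make the difference concrete, here is a minimal sketch of how the same graceful close lands in different arms depending on how the backend reports it. Everything in it is a hypothetical stand-in: streams are plain ids, the error type is a String, and `classify`, `close_as_error`, and `close_as_none` are illustrative names, not h3 or h3-quinn API.

```rust
use std::task::Poll;

/// Hypothetical stand-in for the return type of `poll_accept_recv`.
/// Streams are plain ids and the error is a String to keep the sketch small.
type Accept = Poll<Result<Option<u64>, String>>;

/// Which path of the h3 loop a given poll result would take.
#[derive(Debug, PartialEq)]
enum Outcome {
    StreamQueued(u64),
    /// The `Poll::Ready(None)` arm, reported as H3_GENERAL_PROTOCOL_ERROR.
    UnexpectedClose,
    /// The `?` path: the backend's own error is propagated unchanged.
    BackendError(String),
    Pending,
}

fn classify(result: Accept) -> Outcome {
    match result {
        Poll::Ready(Ok(Some(id))) => Outcome::StreamQueued(id),
        Poll::Ready(Ok(None)) => Outcome::UnexpectedClose,
        Poll::Ready(Err(e)) => Outcome::BackendError(e),
        Poll::Pending => Outcome::Pending,
    }
}

/// A backend that reports every close as an error, as described for h3-quinn.
fn close_as_error() -> Accept {
    Poll::Ready(Err("application closed".to_string()))
}

/// A backend that follows the trait's doc comment: a non-error close is `None`.
fn close_as_none() -> Accept {
    Poll::Ready(Ok(None))
}
```

With the first backend the close bypasses the None arm entirely; with the second, the same graceful shutdown is turned into a protocol error.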

Underlying problem

These methods are the meat of the quic traits.

h3/h3/src/quic.rs

Lines 45 to 71 in da29aea

/// Accept an incoming unidirectional stream
///
/// Returning `None` implies the connection is closing or closed.
fn poll_accept_recv(
    &mut self,
    cx: &mut task::Context<'_>,
) -> Poll<Result<Option<Self::RecvStream>, Self::Error>>;

/// Accept an incoming bidirectional stream
///
/// Returning `None` implies the connection is closing or closed.
fn poll_accept_bidi(
    &mut self,
    cx: &mut task::Context<'_>,
) -> Poll<Result<Option<Self::BidiStream>, Self::Error>>;

/// Poll the connection to create a new bidirectional stream.
fn poll_open_bidi(
    &mut self,
    cx: &mut task::Context<'_>,
) -> Poll<Result<Self::BidiStream, Self::Error>>;

/// Poll the connection to create a new unidirectional stream.
fn poll_open_send(
    &mut self,
    cx: &mut task::Context<'_>,
) -> Poll<Result<Self::SendStream, Self::Error>>;

There is a good amount of ambiguity about where and how connection closes should be communicated first, especially given the comments explicitly stating that None is a way to indicate a close. From a naive reading, even a first error surfacing via a RecvStream method seems reasonable, which grows the list of affected methods even more.
The current quic traits also don't seem to distinguish clearly between non-error closes and other terminations, even though the two should be handled differently.
I think the example demonstrates that these issues make life hard not only for implementers of the traits, but also for contributors to h3.

Small detour to increase the solution space

Neither Quic nor Http3 provides a way to accept or reject the opening of streams/requests by the remote. In Http3 the stream can be closed and stopped immediately with an appropriate error code to indicate rejection after the fact, but opening a stream/request is a unilateral decision by the remote.
As a result, the differentiated h3::quic::Connection::poll_accept_* methods aren't particularly useful. At least with quinn, calling "accept" has no stream concurrency implications. Other Quic implementations may choose a different approach, but without more explicit ways to apply backpressure on remote stream creation I'm not sure that would even be desirable.

It would therefore be an option to merge the h3::quic::Connection::poll_accept_* methods into something like this:

fn poll(&mut self) -> Poll<Result<RecvOrBidiStream, Error>>

The Result vs Option<Result> and Error type considerations aren't the point here; see the next section for those.
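
A minimal sketch of what such a merged method could look like against a mock connection. `MockConnection`, the String error type, and the use of bare stream ids instead of real stream handles are assumptions for illustration, and the `cx` argument is left out as in the signature above. The only protocol fact it relies on is that QUIC encodes directionality in the second-least-significant bit of the stream id.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical merged stream type; a real one would wrap stream handles, not ids.
#[derive(Debug, PartialEq)]
enum RecvOrBidiStream {
    Recv(u64),
    Bidi(u64),
}

impl RecvOrBidiStream {
    /// QUIC stream ids carry the direction in bit 0x2:
    /// set means unidirectional, clear means bidirectional.
    fn from_id(id: u64) -> Self {
        if id & 0x2 != 0 {
            Self::Recv(id)
        } else {
            Self::Bidi(id)
        }
    }
}

/// Mock backend: ids of streams opened by the remote, plus an optional terminal error.
struct MockConnection {
    incoming: VecDeque<u64>,
    error: Option<String>,
}

impl MockConnection {
    /// Single accept method replacing poll_accept_recv and poll_accept_bidi.
    fn poll(&mut self) -> Poll<Result<RecvOrBidiStream, String>> {
        if let Some(id) = self.incoming.pop_front() {
            return Poll::Ready(Ok(RecvOrBidiStream::from_id(id)));
        }
        match &self.error {
            Some(e) => Poll::Ready(Err(e.clone())),
            None => Poll::Pending,
        }
    }
}
```

The caller then needs a single loop over `poll()` and one match on the stream kind, instead of two accept loops that can each observe a close.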

Solutions

As mentioned above the solution space is large, but there are some key choices:

A. Connection trait complexity

  1. Keep the current or a similar Connection trait and exhaustively document the contract (expectations on how and in which order closes become visible on different methods and promises what is polled first and under which conditions).
    • Cons: Test coverage will always be problematic and the contract is surprisingly complex.
    • Pros: Unclear to me.
  2. Reduce the Connection trait to a single poll (plus poll_open_*), which emits streams, as well as connection errors and close information. This should either:
    1. not include an Option, but require explicit communication of a final close or error and return Poll::Pending instead of None: fn poll(&mut self) -> Poll<Result<RecvOrBidiStream, Close>>
    2. include only an option: fn poll(&mut self) -> Poll<Option<RecvOrBidiStream>>, with None implying connection termination, and an extra fn closed(&mut self) -> Option<Close>.
    • Cons: Giving up hypothetical implicit stream concurrency control, if a Quic implementation chooses to delay increasing remote stream opening budgets via MAX_STREAMS frames until the application layer calls "poll_accept_*". As mentioned earlier, I think adding explicit methods for this would be more appropriate if this is ever desired.
    • Pros: A single place for connection errors and closes to surface.
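
The two variants of option A.2 can be sketched side by side on a mock connection. This is only an illustration under assumptions: `MockConnection`, `poll_v1`, and `poll_v2` are hypothetical names, streams are bare ids, `Close` is an opaque placeholder (section B discusses its shape), and draining queued streams before reporting the close is a choice made here, not something the proposal prescribes.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Opaque placeholder for the Close type discussed in section B.
#[derive(Debug, Clone, PartialEq)]
struct Close(String);

/// Mock backend: queued incoming stream ids and the close, once one happened.
struct MockConnection {
    incoming: VecDeque<u64>,
    close: Option<Close>,
}

impl MockConnection {
    /// Variant A.2.1: no Option. Termination is always an explicit `Close`.
    fn poll_v1(&mut self) -> Poll<Result<u64, Close>> {
        if let Some(id) = self.incoming.pop_front() {
            return Poll::Ready(Ok(id));
        }
        match &self.close {
            Some(close) => Poll::Ready(Err(close.clone())),
            None => Poll::Pending,
        }
    }

    /// Variant A.2.2: `None` only signals termination...
    fn poll_v2(&mut self) -> Poll<Option<u64>> {
        if let Some(id) = self.incoming.pop_front() {
            return Poll::Ready(Some(id));
        }
        if self.close.is_some() {
            Poll::Ready(None)
        } else {
            Poll::Pending
        }
    }

    /// ...and the reason is fetched separately.
    fn closed(&mut self) -> Option<Close> {
        self.close.clone()
    }
}
```

In both variants the close becomes visible in exactly one place, so h3 no longer has to guess which accept method sees it first.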

B. Close type
A Close type is mentioned above. Currently the equivalent is Box<dyn Error> and maybe Option::None. Note: Application closes are an expected way to lose a connection, even if no GOAWAY was sent. Other closes are not, and may indicate a problem that the application needs to report or react to. The two should therefore be clearly distinguishable. Options:

  1. Keep the simple Error trait and maybe expand the required methods a bit more to distinguish application closes.
  2. A Close enum, either custom or just Result<ApplicationClose, ConnectionError>. ConnectionError would be constrained like Error is atm. I'm using this on the quic side and it is very nice. The content of ApplicationClose is fairly well defined by the Quic RFC. Benefit: Meaning is obvious to the user if propagated, and obvious to the implementer of the quic side.
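
A rough sketch of option B.2, assuming the Result-based shape. The field names, the `ConnectionError` variants, and the helper `is_expected` are hypothetical. The `ApplicationClose` contents follow the application variant of the QUIC CONNECTION_CLOSE frame, which carries an error code and a reason phrase, and 0x100 is the H3_NO_ERROR code.

```rust
/// Hypothetical: mirrors the application variant of QUIC's CONNECTION_CLOSE
/// frame, which carries an error code and a reason phrase.
#[derive(Debug, Clone, PartialEq)]
struct ApplicationClose {
    error_code: u64,
    reason: Vec<u8>,
}

/// Hypothetical: everything that is not an application close.
#[derive(Debug, Clone, PartialEq)]
enum ConnectionError {
    TimedOut,
    Transport(String),
}

/// Ok = the peer's application closed the connection, Err = anything else.
type Close = Result<ApplicationClose, ConnectionError>;

/// HTTP/3 error code for "no error".
const H3_NO_ERROR: u64 = 0x100;

/// Application closes are an expected way to lose the connection.
fn is_expected(close: &Close) -> bool {
    close.is_ok()
}

/// Short description an application could log or react to.
fn describe(close: &Close) -> String {
    match close {
        Ok(app) if app.error_code == H3_NO_ERROR => "closed by peer without error".to_string(),
        Ok(app) => format!("closed by peer with application code {:#x}", app.error_code),
        Err(ConnectionError::TimedOut) => "connection timed out".to_string(),
        Err(ConnectionError::Transport(msg)) => format!("transport error: {msg}"),
    }
}
```

Because the distinction lives in the type, neither side needs to downcast a Box<dyn Error> to find out whether a close was a problem.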

C. Stream errors
It should be defined whether connection errors are picked up and processed via stream interactions the same way as they are if they surface via Connection methods. I think there is no point in doing so. It just complicates the code. If a stream error implies that the connection terminated, the connection level method(s) should still be required to emit the respective Close.

Additionally, the RFC does not require any stream error to have connection level effects, but allows implementations to choose to handle them that way. Again, I don't think that would help simplicity.

D. Stream order
The emission order of streams of the same type should be defined as ascending by id. It keeps the h3 side simple and most Quic implementations guarantee this anyway. If they don't, it is trivial to fix in the trait implementation due to underlying guarantees in Quic. There have been examples of streams being emitted in slightly shuffled order by Quic implementations, so this should be explicitly required or handled internally.
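
A minimal sketch of how a trait implementation could restore ascending order for one stream type. `OrderedIds` is a hypothetical helper that works on bare ids; a real implementation would hold the stream handles. It relies on QUIC assigning ids of the same type in steps of 4, and on the rule that opening a stream implicitly opens all lower-numbered streams of that type, so gaps always get filled.

```rust
use std::collections::BTreeSet;

/// Reorders the ids of a single stream type so they are emitted ascending.
struct OrderedIds {
    /// Next id that may be emitted.
    next: u64,
    /// Ids reported by the backend that are still ahead of `next`.
    waiting: BTreeSet<u64>,
}

impl OrderedIds {
    /// `first` is the lowest id of the stream type,
    /// e.g. 2 for client-initiated unidirectional streams.
    fn new(first: u64) -> Self {
        Self { next: first, waiting: BTreeSet::new() }
    }

    /// Record an id reported by the backend and return every id that can
    /// now be emitted, in ascending order.
    fn push(&mut self, id: u64) -> Vec<u64> {
        // Ids below `next` were already emitted; ignore repeats.
        if id >= self.next {
            self.waiting.insert(id);
        }
        let mut ready = Vec::new();
        while self.waiting.remove(&self.next) {
            ready.push(self.next);
            // Ids of the same type are spaced 4 apart.
            self.next += 4;
        }
        ready
    }
}
```

For example, if the backend reports unidirectional stream 6 before stream 2, the first `push` returns nothing and the second returns both ids in order.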

E. Redundancy
There's a bit of redundancy between the OpenStreams and Connection traits, which is unnecessary and has a few awkward side effects. #173 deals with that. It is largely orthogonal to this topic, but would be nice to get out of the way, because it makes (Draft) PRs for this issue a bit easier to read.

Conclusion

I hope the above makes sense. And I hope my concerns aren't based on a series of misunderstandings on my side. I think there's potential for h3 code to become simpler and easier to reason about. A draft PR should be fairly straightforward and I'm happy to give it a shot. But with a change that is this substantial, I wanted to explain my perspective a bit and check if you are open to such changes.


Labels: A-trait (Area: quic trait abstraction), B-rfc (Blocked: request for comments. Needs more discussion.), C-refactor (Category: refactor. This would improve the clarity of internal code.)
