This issue explains, for anyone experiencing deadlocks or hangs related to `re_upload` requests, what causes them and how they can be fixed. Because buck2 uses tonic and h2, and by default allows 1024 concurrent requests to the CAS, it is likely to trigger this bug in the h2 crate.
- Actions that require uploads go through a `re_upload` phase, in which they call `find_missing_blobs` and then upload the blobs it reports as missing.
- Buck2 has a `cas_semaphore` that allows up to 1024 actions to be in the upload phase at a time.
- This creates up to 1024 HTTP/2 streams on a single connection to the CAS gRPC server.
- HTTP/2 servers advertise a limit on the number of concurrent streams (`SETTINGS_MAX_CONCURRENT_STREAMS`). If the client attempts to open more streams than the server allows, the extra streams are created locally only, put into a state called pending_open, and accept locally buffered data.
- In HTTP/2, each stream has its own flow-control window, and the connection as a whole has another; each window limits how much data can be sent before the receiver grants more with a WINDOW_UPDATE frame. In h2, connection-level window capacity is assigned to streams that have data ready to send.
- Because of the h2 bug mentioned above, connection-level window capacity is also assigned to streams that are still in the pending_open state, even though they cannot send anything until the server lets them open.
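- Once all of the connection-level capacity has been given to streams in the pending_open state, the open streams cannot send any more data, so they never finish and never free a slot for a pending_open stream to open. Nothing more gets sent, and the connection stalls forever.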
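The deadlock described in the list can be reproduced in a small toy model. This is a hypothetical, heavily simplified simulation and not h2's actual code: it assumes a FIFO queue of streams waiting for connection-level capacity, ignores per-stream windows, lets the receiver return capacity immediately after each send, and uses made-up sizes. The names `simulate`, `State`, `Outcome` and `assign_to_pending` are illustrative only.

```rust
use std::collections::VecDeque;

#[derive(PartialEq, Debug)]
enum State {
    PendingOpen, // created locally only: the server's stream limit is reached
    Open,
    Closed,
}

struct Stream {
    state: State,
    remaining: u32, // bytes the stream still has to send
    assigned: u32,  // connection-level capacity reserved for it
}

#[derive(PartialEq, Debug)]
enum Outcome {
    Completed,
    Stalled,
}

/// Hypothetical toy model, not h2's code: `n` uploads of `body` bytes over
/// one connection with a `window`-byte connection window and a server limit
/// of `max_streams` concurrent streams. `assign_to_pending` models the bug.
fn simulate(n: usize, body: u32, max_streams: usize, window: u32, assign_to_pending: bool) -> Outcome {
    let mut streams: Vec<Stream> = (0..n)
        .map(|_| Stream { state: State::PendingOpen, remaining: body, assigned: 0 })
        .collect();
    // Streams waiting for connection-level capacity, in FIFO order.
    let mut queue: VecDeque<usize> = (0..n).collect();
    let mut available = window;

    loop {
        // 1. Open pending streams while the server's limit allows it.
        let open = streams.iter().filter(|s| s.state == State::Open).count();
        let mut slots = max_streams.saturating_sub(open);
        for s in streams.iter_mut().filter(|s| s.state == State::PendingOpen) {
            if slots == 0 {
                break;
            }
            s.state = State::Open;
            slots -= 1;
        }

        // 2. Hand out connection capacity in queue order. A served stream
        //    leaves the queue; the others keep their place.
        let mut still_waiting = VecDeque::new();
        while let Some(i) = queue.pop_front() {
            let s = &mut streams[i];
            let eligible = s.state == State::Open
                || (assign_to_pending && s.state == State::PendingOpen);
            if !eligible || available == 0 {
                still_waiting.push_back(i);
                continue;
            }
            let grant = (s.remaining - s.assigned).min(available);
            s.assigned += grant;
            available -= grant;
        }
        queue = still_waiting;

        // 3. Only open streams can send. The receiver then returns the
        //    capacity (WINDOW_UPDATE), and a stream with data left requeues.
        let mut sent_any = false;
        for (i, s) in streams.iter_mut().enumerate() {
            if s.state == State::Open && s.assigned > 0 {
                s.remaining -= s.assigned;
                available += s.assigned;
                s.assigned = 0;
                sent_any = true;
                if s.remaining == 0 {
                    s.state = State::Closed;
                } else {
                    queue.push_back(i);
                }
            }
        }

        if streams.iter().all(|s| s.state == State::Closed) {
            return Outcome::Completed;
        }
        if !sent_any {
            // Nothing can send, so no capacity will ever come back.
            return Outcome::Stalled;
        }
    }
}

fn main() {
    // 4 streams, server allows 1, capacity also given to pending_open streams.
    println!("{:?}", simulate(4, 20, 1, 10, true)); // prints "Stalled"
    // Same workload, capacity only for open streams.
    println!("{:?}", simulate(4, 20, 1, 10, false)); // prints "Completed"
    // Buggy assignment, but no more streams than the server allows.
    println!("{:?}", simulate(4, 20, 4, 10, true)); // prints "Completed"
}
```

In the first call, the open stream sends half its data and goes to the back of the queue; the first pending_open stream in the queue then absorbs the whole 10-byte window, so the open stream can never finish and no slot ever frees up. The same workload completes when capacity only goes to open streams, or when the number of in-flight streams stays within the server's limit, because then no stream is ever pending_open.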
vaibhav-shah