Description
In the PQ, batches are unbroken sequences of events, and are currently constrained to originate from a single page, in part to facilitate the ACK-ing of the batch's range without needing to ACK each individual event.
This limitation can cause undersized batches. In an extreme example, a pipeline configured with the default page size of 64MiB, processing events that are 500KiB in size, will emit one 125-event batch (~61MiB) and one ~6-event batch (~3MiB; the rest of the page).
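The arithmetic in that example can be checked directly. A quick sketch (the 64MiB page size and 125-event batch size come from the example above; the computation itself is illustrative, not PQ code):

```java
public class BatchMath {
    public static void main(String[] args) {
        long pageSize = 64L * 1024 * 1024;   // default page size: 64MiB
        long eventSize = 500L * 1024;        // example event size: 500KiB
        int batchSize = 125;                 // default batch size from the example

        long eventsPerPage = pageSize / eventSize;  // how many events fit in a page
        long remainder = eventsPerPage - batchSize; // events left after one full batch

        System.out.println("events per page: " + eventsPerPage); // 131
        System.out.println("remainder batch: " + remainder);     // 6
    }
}
```

So a full page yields one full batch plus a 6-event tail batch that cannot be topped up from the next page.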
In theory, a batch could be made to safely span multiple pages, while maintaining the unbroken sequential ordering guarantees of the existing implementation.
Implementation Notes
To do this I would extract an interface from the existing `Batch`, rename the current single-page-origin implementation to something like `BatchSegment`, and add a new implementation that composes multiple `BatchSegment`s in-order and delegates to their methods as necessary.
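The shape of that refactor might look roughly like this. All names and signatures below are hypothetical simplifications for illustration; the real PQ `Batch` API differs:

```java
import java.util.List;

// Hypothetical interface extracted from the existing Batch.
interface Batch {
    int eventCount();
    void ack(); // acknowledge this batch's event range
}

// The current single-page-origin implementation, renamed:
// an unbroken run of events read from one page.
class BatchSegment implements Batch {
    private final int events;
    BatchSegment(int events) { this.events = events; }
    public int eventCount() { return events; }
    public void ack() { /* ACK this segment's contiguous range on its page */ }
}

// New implementation: composes multiple in-order segments
// and delegates to their methods as necessary.
class CompositeBatch implements Batch {
    private final List<BatchSegment> segments;
    CompositeBatch(List<BatchSegment> segments) { this.segments = segments; }
    public int eventCount() {
        return segments.stream().mapToInt(BatchSegment::eventCount).sum();
    }
    public void ack() {
        // Each segment still ACKs a single contiguous range on its own page,
        // preserving the cheap range-ACK property.
        segments.forEach(BatchSegment::ack);
    }
}

public class BatchSketch {
    public static void main(String[] args) {
        Batch b = new CompositeBatch(
            List.of(new BatchSegment(125), new BatchSegment(6)));
        System.out.println(b.eventCount()); // 131
    }
}
```

The key property is that ACK-ing stays per-segment, so each page still sees one contiguous range rather than per-event ACKs.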
The reader would then need to be modified to continue across page boundaries, and to return enough information that the resulting composite batch could track its multiple segments. The existing synchronization that ensures at most one worker contends for the read lock will also ensure that the batch, as composed, remains an unbroken sequence.
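The reader-side change could be sketched as a loop that accumulates one segment per page until the batch is full. This is a self-contained simulation, not the real reader; the page array and helper below are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class CrossPageRead {
    // Simulated pages: number of unread events remaining on each page.
    // Page 0 holds a 6-event tail left over from a previous full batch.
    static int[] pages = {6, 131, 131};
    static int pageIndex = 0;

    // Read up to `max` events from the current page; a single read
    // never crosses a page boundary, matching the existing constraint.
    static int readFromCurrentPage(int max) {
        int n = Math.min(max, pages[pageIndex]);
        pages[pageIndex] -= n;
        if (pages[pageIndex] == 0) pageIndex++; // current page exhausted: advance
        return n;
    }

    public static void main(String[] args) {
        int batchSize = 125;
        List<Integer> segmentSizes = new ArrayList<>();
        int remaining = batchSize;
        // Continue across page boundaries until the batch is full,
        // recording each per-page segment for the composite batch to track.
        while (remaining > 0 && pageIndex < pages.length) {
            int seg = readFromCurrentPage(remaining);
            segmentSizes.add(seg);
            remaining -= seg;
        }
        // The 6-event tail of page 0 plus 119 events from page 1
        // form one full-size, still-unbroken batch.
        System.out.println(segmentSizes); // [6, 119]
    }
}
```

Because at most one worker holds the read position at a time, the segments gathered by this loop are guaranteed to be consecutive, preserving the unbroken-sequence property.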
This may dovetail with the refactoring required for #17821.