
PQ could benefit from batches spanning pages #17826

Open

Description

@yaauie

In the PQ, batches are unbroken sequences of events, and are currently constrained to originate from a single page, in part to facilitate the ACK-ing of the batch's range without needing to ACK each individual event.

This limitation can cause undersized batches. In an extreme example, a pipeline configured with the default page size of 64MiB, processing events that are 500KiB in size, will emit one 125-event batch (~61MiB) and one ~6-event batch (~3MiB; the rest of the page).
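
For reference, the arithmetic behind that example, assuming the default `pipeline.batch.size` of 125 that the numbers above imply:

```text
page capacity : 64MiB ≈ 65536KiB → ⌊65536 / 500⌋ ≈ 131 events per page
first batch   : 125 events × 500KiB ≈ 61MiB
second batch  : remaining ~6 events × 500KiB ≈ 3MiB (cannot continue onto the next page)
```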

In theory, a batch could be made to safely span multiple pages, while maintaining the unbroken sequential ordering guarantees of the existing implementation.

Implementation Notes

To do this, I would extract an interface from the existing Batch, rename the current single-page-origin implementation to something like BatchSegment, and add a new implementation that composes multiple BatchSegments in order and delegates to their methods as necessary. A rough sketch of that shape follows.
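
A very rough sketch, using hypothetical names (ReadBatch, BatchSegment, CompositeBatch) and stand-in Page/Queueable types; none of this mirrors the real org.logstash.ackedqueue API:

```java
// Hypothetical sketch only: these names, signatures, and the Page/Queueable
// stand-ins do not match the real org.logstash.ackedqueue classes.
import java.util.ArrayList;
import java.util.List;

interface Queueable {}                      // stand-in for the queue's event type

final class Page {                          // stand-in for a PQ page
    void ackRange(long firstSeqNum, int count) { /* ACK one contiguous range */ }
}

// Interface extracted from the existing Batch.
interface ReadBatch {
    List<Queueable> events();
    int size();
    void close();                           // ACKs the batch back to the queue
}

// The current single-page-origin implementation, renamed.
final class BatchSegment implements ReadBatch {
    private final List<Queueable> events;
    private final Page page;
    private final long firstSeqNum;

    BatchSegment(List<Queueable> events, Page page, long firstSeqNum) {
        this.events = events;
        this.page = page;
        this.firstSeqNum = firstSeqNum;
    }

    @Override public List<Queueable> events() { return events; }
    @Override public int size() { return events.size(); }

    @Override public void close() {
        // Still one contiguous per-page range, so per-event ACKs remain unnecessary.
        page.ackRange(firstSeqNum, events.size());
    }
}

// New implementation that composes multiple segments in order and delegates.
final class CompositeBatch implements ReadBatch {
    private final List<BatchSegment> segments;

    CompositeBatch(List<BatchSegment> segments) { this.segments = segments; }

    @Override public List<Queueable> events() {
        List<Queueable> all = new ArrayList<>();
        for (BatchSegment s : segments) all.addAll(s.events());
        return all;
    }

    @Override public int size() {
        return segments.stream().mapToInt(BatchSegment::size).sum();
    }

    @Override public void close() {
        for (BatchSegment s : segments) s.close(); // ACK each page's range in order
    }
}
```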

The reader would then need to be modified to continue across page boundaries, and to return enough information that the resulting composite batch can track its multiple segments. The existing synchronization that ensures at most one worker contends for the read lock will also ensure that the batch, as composed, remains an unbroken sequence. A sketch of that read loop follows.
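
Building on the types in the previous sketch, the read loop might look something like the following; QueueReader and its methods are invented stand-ins rather than the real Queue API, and the read-lock handling is omitted:

```java
// Purely illustrative: QueueReader and its methods are invented stand-ins, not
// the real org.logstash.ackedqueue.Queue API; read-lock handling is omitted.
import java.util.ArrayList;
import java.util.List;

interface QueueReader {
    // Read up to `limit` events from the current head page as one segment; null if none.
    BatchSegment nextSegment(int limit, long timeoutMillis);
    // True if the page just read is fully consumed and a following page has unread events.
    boolean canContinueOntoNextPage();
}

final class CompositeBatchReader {
    CompositeBatch readBatch(QueueReader reader, int targetSize, long timeoutMillis) {
        List<BatchSegment> segments = new ArrayList<>();
        int collected = 0;
        // The existing at-most-one-worker read synchronization is what keeps the
        // segments gathered here an unbroken, in-order sequence.
        while (collected < targetSize) {
            BatchSegment segment = reader.nextSegment(targetSize - collected, timeoutMillis);
            if (segment == null || segment.size() == 0) break;
            segments.add(segment);
            collected += segment.size();
            if (!reader.canContinueOntoNextPage()) break; // stop at the last readable page
        }
        return new CompositeBatch(segments);
    }
}
```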

This may dovetail with the refactoring required for #17821.
