Description
As write throughput is bound by disk IO, compressing events during serialization could improve throughput at the cost of CPU (see: proof of concept).
If possible, per-event compression should be delivered inside the scope of the existing v2 PQ page format, in which entries contain only `seqnum`+`length`+`N bytes`. To do this, the reader will need to be able to handle compressed or uncompressed bytes without additional context (e.g., by differentiating a zlib header from existing CBOR first-bytes).
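A minimal sketch of how a reader might tell the two payload kinds apart, written in Python purely for illustration. It assumes that an uncompressed event is serialized as a top-level CBOR map (initial byte `0xA0`-`0xBF`, major type 5) and that compressed events use the standard zlib wrapper; neither assumption is confirmed here, and all function names are hypothetical. Under those assumptions the two cannot collide: a valid zlib header's first byte has a low nibble of 8 and a high nibble of at most 7, so it never exceeds `0x78`.

```python
import zlib


def looks_like_cbor_map(payload: bytes) -> bool:
    """True if the first byte has CBOR major type 5 (map), i.e. 0xA0-0xBF.

    Assumption: uncompressed events are serialized as a top-level CBOR map.
    """
    return len(payload) >= 1 and (payload[0] >> 5) == 5


def looks_like_zlib(payload: bytes) -> bool:
    """True if the first two bytes form a valid zlib (RFC 1950) header."""
    if len(payload) < 2:
        return False
    cmf, flg = payload[0], payload[1]
    is_deflate = (cmf & 0x0F) == 8           # CM: compression method 8
    window_ok = (cmf >> 4) <= 7              # CINFO: window size up to 32K
    check_ok = ((cmf << 8) | flg) % 31 == 0  # FCHECK: header checksum
    return is_deflate and window_ok and check_ok


def encode_event(serialized: bytes, compress: bool) -> bytes:
    """Hypothetical writer side: compress only when the feature is opted in."""
    return zlib.compress(serialized) if compress else serialized


def decode_event(payload: bytes) -> bytes:
    """Hypothetical reader side: return CBOR bytes for either payload kind."""
    if looks_like_cbor_map(payload):
        return payload
    if looks_like_zlib(payload):
        return zlib.decompress(payload)
    raise ValueError("payload is neither a CBOR map nor a zlib stream")


if __name__ == "__main__":
    event = b"\xa1\x61\x61\x01"  # CBOR encoding of the map {"a": 1}
    assert decode_event(encode_event(event, compress=False)) == event
    assert decode_event(encode_event(event, compress=True)) == event
```

Because the check uses only the entry's own bytes, the `seqnum`+`length`+`N bytes` entry layout stays unchanged, and a single queue can hold a mix of compressed and uncompressed events.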
Because not all users will want to spend CPU for increased throughput, and because of later-mentioned rollback barriers, this feature should first be delivered as opt-in, preferably at a per-pipeline level.
Compatibility Considerations
Once a queue contains compressed events, it cannot be read by a Logstash instance that does not support event decompression. This presents an undesired rollback barrier: a user who needs to roll back to a last known-working configuration because of an unrelated issue would be unable to do so.
Queue compression should be implemented as opt-in until at least three minor versions have shipped with decompression support.