Add batched streaming aggregations #324
Merged
With the current model, each Next call is expected to return samples for unique steps. This approach is simple, but for high-cardinality queries (100K+ series) it uses a lot of memory because the per-step buffers grow with the number of series.
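To make that concrete, here is a rough sketch of the unbatched contract; the StepVector and Operator names below are illustrative stand-ins for this description, not the engine's actual types:

```go
package sketch

import "context"

// StepVector is an illustrative stand-in for the engine's per-step output.
type StepVector struct {
	T         int64     // step timestamp
	SampleIDs []uint64  // one entry per selected series
	Samples   []float64 // one sample per selected series
}

// Operator is an illustrative stand-in for the engine's operator interface.
// Under the unbatched model, each Next call returns vectors for distinct
// steps, and each vector holds every selected series for that step, so a
// 100K+ series query allocates very large per-step buffers.
type Operator interface {
	Next(ctx context.Context) ([]StepVector, error)
}
```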
This commit resolves that by allowing the aggregate to handle batches for the same step arriving over subsequent Next calls. Selectors gain a batchSize parameter, injected when a streaming aggregate is present in the plan, which puts an upper limit on the size of the output vectors they produce.
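A minimal sketch of how a streaming aggregate could fold partial batches for the same step, assuming the illustrative types above (repeated here so the sketch is self-contained) and a sum aggregation; none of these names are taken from the actual code:

```go
package sketch

import "context"

// Illustrative types, repeated for self-containment; not the engine's API.
type StepVector struct {
	T       int64
	Samples []float64
}

type Operator interface {
	Next(ctx context.Context) ([]StepVector, error)
}

// batchedSum folds partial batches for the same step into a running sum,
// so the upstream selector can cap each output vector at batchSize
// instead of emitting all series for a step in a single call.
type batchedSum struct {
	next Operator
	sums map[int64]float64 // running sum keyed by step timestamp
}

func newBatchedSum(next Operator) *batchedSum {
	return &batchedSum{next: next, sums: map[int64]float64{}}
}

func (a *batchedSum) consume(ctx context.Context) error {
	for {
		batches, err := a.next.Next(ctx)
		if err != nil {
			return err
		}
		if batches == nil { // upstream exhausted (assumed convention)
			return nil
		}
		for _, b := range batches {
			// The same step timestamp may appear across several Next
			// calls; each partial batch is folded into the running sum.
			for _, v := range b.Samples {
				a.sums[b.T] += v
			}
		}
	}
}
```

Because the aggregate only keeps one running value per step, the selector's output vectors can stay capped at batchSize without changing the final result.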
This is a before and after comparison of the total heap size of all queriers for a 1M series query. The green line indicates the total heap size (sum(go_memstats_heap_inuse_bytes)) for all queriers in the query path when executing the query with this change. The yellow line is the total memory used by all queriers in the query path on the main branch of the engine. There is approximately a 20% reduction in heap size because vector batches from the vector selector are capped at 32K instead of being unbounded as they are on main.