Description
The steps batch size can be set to a value larger than the actual number of steps in a query. In that case it is unnecessary to allocate more than the required number of steps, and the overallocation hurts performance. Many operators use the steps batch size directly instead of capping it at the number of steps, for example https://github.com/thanos-io/promql-engine/blob/main/storage/prometheus/scanners.go#L132. A sketch of the capping idea follows.
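As a minimal, hedged sketch of that cap (the helper names here are illustrative, not the engine's real identifiers): compute the number of steps from the query range, and never hand an operator a batch size above it.

```go
package main

import (
	"fmt"
	"time"
)

// cappedBatchSize is a hypothetical helper showing the proposed fix:
// never use a batch size larger than the number of steps the query
// will actually evaluate.
func cappedBatchSize(stepsBatch, numSteps int) int {
	if stepsBatch > numSteps {
		return numSteps
	}
	return stepsBatch
}

// numSteps mirrors the usual floor((end-start)/step)+1 step count
// of a range query.
func numSteps(start, end time.Time, step time.Duration) int {
	return int(end.Sub(start)/step) + 1
}

func main() {
	// The range query from this issue: a 3600s window at a 14s step.
	start := time.Unix(1748296266, 0)
	end := time.Unix(1748299866, 0)
	n := numSteps(start, end, 14*time.Second) // 258

	fmt.Println(cappedBatchSize(10, n))       // 10: the default is already below the cap
	fmt.Println(cappedBatchSize(10000000, n)) // 258: a huge value gets capped to the step count
}
```

For the query used in the measurements below, this would cap 10000000 down to 258.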
Example test result, with the steps batch size set to the default value of 10. See the field samplesExecutionTime, which is the time spent calling Next().
```
curl 'http://localhost:10902/api/v1/query_range?query=sum(up)&start=1748296266.667&end=1748299866.667&step=14&analyze=true&steps_batch_size=10' | jq
```

```
"analysis": {
  "name": "[concurrent(buff=2)]",
  "executionTime": "3.834567372s",
  "seriesExecutionTime": "3.683574833s",
  "samplesExecutionTime": "150.992539ms",
  "peakSamples": 1,
  "totalSamples": 5317858,
  ...
}
```
Now set the steps batch size to an insanely high value. Time spent in Next() grows to 1.3s. Note that this query covers a 3600-second range at a 14-second step, i.e. only about 258 steps, so a batch size of 10000000 massively overallocates. We should optimize the vector pool to cap its size at the number of steps instead of using whatever is specified as the steps batch size; see the sketch after the output below.
```
curl 'http://localhost:10902/api/v1/query_range?query=sum(up)&start=1748296266.667&end=1748299866.667&step=14&analyze=true&steps_batch_size=10000000' | jq
```

```
"analysis": {
  "name": "[concurrent(buff=2)]",
  "executionTime": "5.546953168s",
  "seriesExecutionTime": "4.241078542s",
  "samplesExecutionTime": "1.305874626s",
  "peakSamples": 1,
  "totalSamples": 5317858,
  ...
}
```
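For illustration, here is a sketch of what the pool-side cap could look like. This is not the engine's actual VectorPool API; it is a self-contained stand-in that only demonstrates the constructor choice: hand out slices with capacity min(stepsBatch, numSteps), so an oversized steps_batch_size no longer inflates every allocation.

```go
package main

import (
	"fmt"
	"sync"
)

// vectorPool is an illustrative stand-in for the engine's vector pool,
// not its real implementation. The interesting part is the constructor:
// the slice capacity it hands out is min(stepsBatch, numSteps).
type vectorPool struct {
	pool sync.Pool
}

func newVectorPool(stepsBatch, numSteps int) *vectorPool {
	capacity := stepsBatch
	if numSteps < capacity {
		capacity = numSteps // cap at the actual number of query steps
	}
	return &vectorPool{
		pool: sync.Pool{
			New: func() any {
				batch := make([][]float64, 0, capacity)
				return &batch
			},
		},
	}
}

func (p *vectorPool) get() *[][]float64 { return p.pool.Get().(*[][]float64) }

func (p *vectorPool) put(b *[][]float64) {
	*b = (*b)[:0] // reset length, keep capacity for reuse
	p.pool.Put(b)
}

func main() {
	// steps_batch_size=10000000 but only 258 steps: batches stay small.
	p := newVectorPool(10000000, 258)
	b := p.get()
	fmt.Println(cap(*b)) // 258
	p.put(b)
}
```

The same min() would then apply wherever operators currently size buffers off the configured steps batch size.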