Skip to content

MQE: skip decoding histogram buckets where possible #11587

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented Jun 2, 2025

What this PR does

This PR implements an optimisation inspired by prometheus/prometheus#14097.

I've implemented this as a MQE optimisation pass (rather than reusing the logic from that Prometheus PR) for two reasons:

  • It helps prove out the structure of optimisation passes, and demonstrate they're applicable to more than just common subexpression elimination.
  • We need to consider how this optimisation interacts with common subexpression elimination. Specifically, we need to consider all the different paths to a selector after common subexpression elimination has been performed to know if we can skip decoding histogram buckets. Alternatively, if we were to reuse the logic from Prometheus, the common subexpression elimination pass would need to know how to merge selectors that are identical save for the SkipHistogramBuckets hint.

Peak memory consumption is reduced by ~10% in our benchmarks. Latency improves by up to 25% in some cases, but regresses by a small amount (in both absolute and relative terms) for some instant queries:

goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/streamingpromql/benchmarks
cpu: Apple M1 Pro
                                                                                                            │ original.txt │             final.txt              │
                                                                                                            │    sec/op    │   sec/op     vs base               │
Query/histogram_count(sum(rate(nh_1[2m]))),_instant_query/engine=MimirWithQueryPlanner-10                      244.8µ ± 4%   236.4µ ± 1%   -3.40% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_1[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10         291.7µ ± 4%   261.4µ ± 2%  -10.40% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_1[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10        791.2µ ± 1%   627.4µ ± 2%  -20.70% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_100[2m]))),_instant_query/engine=MimirWithQueryPlanner-10                    7.719m ± 1%   7.948m ± 0%   +2.97% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_100[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10       11.85m ± 1%   10.64m ± 1%  -10.25% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_100[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10      54.10m ± 3%   41.61m ± 2%  -23.08% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_2000[2m]))),_instant_query/engine=MimirWithQueryPlanner-10                   146.2m ± 5%   150.3m ± 0%        ~ (p=0.065 n=6)
Query/histogram_count(sum(rate(nh_2000[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10      223.9m ± 3%   202.3m ± 0%   -9.63% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_2000[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10    1047.5m ± 3%   789.9m ± 1%  -24.60% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_1[20m]))),_instant_query/engine=MimirWithQueryPlanner-10                     264.6µ ± 4%   243.7µ ± 1%   -7.92% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_1[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10        359.7µ ± 4%   320.3µ ± 2%  -10.96% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_1[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10       1.295m ± 3%   1.089m ± 0%  -15.92% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_100[20m]))),_instant_query/engine=MimirWithQueryPlanner-10                   8.048m ± 1%   8.472m ± 1%   +5.26% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_100[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10      16.53m ± 2%   15.48m ± 4%   -6.34% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_100[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10    100.81m ± 1%   84.79m ± 0%  -15.89% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_2000[20m]))),_instant_query/engine=MimirWithQueryPlanner-10                  151.2m ± 0%   160.0m ± 0%   +5.83% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_2000[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10     314.8m ± 0%   295.9m ± 1%   -6.00% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_2000[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10     1.968 ± 1%    1.660 ± 0%  -15.64% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_1[2m]))),_instant_query/engine=MimirWithQueryPlanner-10                        244.1µ ± 2%   230.4µ ± 2%   -5.60% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_1[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10           289.1µ ± 2%   261.0µ ± 1%   -9.70% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_1[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10          783.2µ ± 1%   623.1µ ± 0%  -20.44% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_100[2m]))),_instant_query/engine=MimirWithQueryPlanner-10                      7.820m ± 2%   7.937m ± 1%   +1.49% (p=0.041 n=6)
Query/histogram_sum(sum(rate(nh_100[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10         11.95m ± 1%   10.57m ± 1%  -11.52% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_100[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10        54.67m ± 1%   41.65m ± 0%  -23.83% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_2000[2m]))),_instant_query/engine=MimirWithQueryPlanner-10                     146.8m ± 1%   150.1m ± 0%   +2.27% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_2000[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10        224.9m ± 0%   200.4m ± 1%  -10.90% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_2000[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10      1050.1m ± 1%   790.5m ± 1%  -24.72% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_1[20m]))),_instant_query/engine=MimirWithQueryPlanner-10                       258.7µ ± 2%   243.4µ ± 2%   -5.91% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_1[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10          353.7µ ± 3%   317.4µ ± 2%  -10.27% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_1[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10         1.290m ± 1%   1.084m ± 1%  -15.93% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_100[20m]))),_instant_query/engine=MimirWithQueryPlanner-10                     8.085m ± 0%   8.471m ± 1%   +4.78% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_100[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10        16.58m ± 6%   15.49m ± 1%   -6.59% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_100[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10      101.78m ± 3%   84.98m ± 2%  -16.50% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_2000[20m]))),_instant_query/engine=MimirWithQueryPlanner-10                    152.6m ± 2%   160.2m ± 2%   +4.99% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_2000[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10       317.7m ± 1%   296.4m ± 1%   -6.70% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_2000[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10       1.980 ± 1%    1.654 ± 0%  -16.49% (p=0.002 n=6)
geomean                                                                                                        15.01m        13.55m        -9.76%

                                                                                                            │ original.txt │              final.txt              │
                                                                                                            │      B       │      B        vs base               │
Query/histogram_count(sum(rate(nh_1[2m]))),_instant_query/engine=MimirWithQueryPlanner-10                     76.69Mi ± 5%   68.97Mi ± 5%  -10.07% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_1[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10        74.16Mi ± 2%   66.49Mi ± 1%  -10.34% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_1[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10       73.48Mi ± 1%   64.20Mi ± 2%  -12.63% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_100[2m]))),_instant_query/engine=MimirWithQueryPlanner-10                   71.55Mi ± 1%   62.59Mi ± 3%  -12.52% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_100[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10      69.62Mi ± 1%   62.04Mi ± 2%  -10.89% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_100[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10     71.49Mi ± 1%   63.12Mi ± 1%  -11.71% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_2000[2m]))),_instant_query/engine=MimirWithQueryPlanner-10                  75.71Mi ± 2%   67.80Mi ± 2%  -10.44% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_2000[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10     75.67Mi ± 2%   67.84Mi ± 2%  -10.36% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_2000[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10    79.64Mi ± 1%   72.62Mi ± 1%   -8.81% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_1[20m]))),_instant_query/engine=MimirWithQueryPlanner-10                    75.88Mi ± 2%   67.17Mi ± 2%  -11.48% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_1[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10       73.75Mi ± 2%   66.02Mi ± 4%  -10.49% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_1[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10      73.00Mi ± 1%   64.12Mi ± 2%  -12.16% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_100[20m]))),_instant_query/engine=MimirWithQueryPlanner-10                  70.35Mi ± 2%   63.04Mi ± 2%  -10.39% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_100[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10     69.81Mi ± 1%   61.73Mi ± 1%  -11.57% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_100[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10    71.08Mi ± 1%   62.70Mi ± 1%  -11.78% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_2000[20m]))),_instant_query/engine=MimirWithQueryPlanner-10                 74.77Mi ± 3%   67.80Mi ± 3%   -9.32% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_2000[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10    75.47Mi ± 1%   66.90Mi ± 2%  -11.36% (p=0.002 n=6)
Query/histogram_count(sum(rate(nh_2000[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10   79.35Mi ± 1%   70.52Mi ± 2%  -11.14% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_1[2m]))),_instant_query/engine=MimirWithQueryPlanner-10                       76.41Mi ± 2%   69.45Mi ± 4%   -9.10% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_1[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10          75.35Mi ± 3%   66.16Mi ± 2%  -12.19% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_1[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10         72.92Mi ± 1%   63.72Mi ± 1%  -12.62% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_100[2m]))),_instant_query/engine=MimirWithQueryPlanner-10                     71.88Mi ± 2%   62.25Mi ± 1%  -13.40% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_100[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10        70.02Mi ± 1%   61.74Mi ± 1%  -11.83% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_100[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10       71.70Mi ± 1%   63.26Mi ± 1%  -11.78% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_2000[2m]))),_instant_query/engine=MimirWithQueryPlanner-10                    74.36Mi ± 2%   66.23Mi ± 2%  -10.94% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_2000[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10       75.64Mi ± 1%   67.31Mi ± 1%  -11.01% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_2000[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10      79.44Mi ± 1%   72.38Mi ± 1%   -8.89% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_1[20m]))),_instant_query/engine=MimirWithQueryPlanner-10                      75.46Mi ± 4%   67.14Mi ± 2%  -11.03% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_1[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10         73.41Mi ± 3%   65.93Mi ± 3%  -10.19% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_1[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10        72.95Mi ± 1%   63.78Mi ± 1%  -12.57% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_100[20m]))),_instant_query/engine=MimirWithQueryPlanner-10                    71.73Mi ± 2%   63.17Mi ± 2%  -11.93% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_100[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10       69.86Mi ± 1%   61.66Mi ± 1%  -11.73% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_100[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10      71.34Mi ± 1%   62.35Mi ± 2%  -12.60% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_2000[20m]))),_instant_query/engine=MimirWithQueryPlanner-10                   75.13Mi ± 1%   67.05Mi ± 3%  -10.75% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_2000[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10      75.03Mi ± 1%   67.27Mi ± 2%  -10.35% (p=0.002 n=6)
Query/histogram_sum(sum(rate(nh_2000[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10     78.70Mi ± 2%   70.84Mi ± 1%   -9.98% (p=0.002 n=6)
geomean                                                                                                       73.92Mi        65.69Mi       -11.13%

⚠️ This PR is pending prometheus/prometheus#16686 being merged and vendored into Mimir.

Which issue(s) this PR fixes or relates to

#10067

Checklist

  • Tests updated.
  • [n/a] Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • [n/a] about-versioning.md updated with experimental features.

Copy link
Contributor

github-actions bot commented Jun 2, 2025

@charleskorn
Copy link
Contributor Author

This PR is pending prometheus/prometheus#16686 being merged and vendored into Mimir.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant