MQE: eager loading of selectors #11624

charleskorn · 2025-06-04T05:17:46Z

What this PR does

This PR introduces the ability to eagerly load selectors in MQE.

When combined with LazyQueryable, this means that selectors are evaluated in parallel in the background while the query is evaluated. This is important for using MQE in query-frontends, as this allows each part of a shardable query to be evaluated in parallel, rather than serially.

Which issue(s) this PR fixes or relates to

#10067

Checklist

Tests updated.
[n/a] Documentation added.
[covered by Mimir Query Engine #10067] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
[n/a] about-versioning.md updated with experimental features.

56quarters

This looks good, with a few suggestions. Mostly I'm just struggling to understand the context of the change and when it will or won't be required (required for #11417, but not required once query sharding is incorporated into MQE?)

56quarters · 2025-06-05T15:53:11Z

pkg/streamingpromql/config.go

@@ -18,6 +18,12 @@ type EngineOpts struct {
 	// Should only be used in tests.
 	Pedantic bool `yaml:"-"`

+	// Prometheus' engine evaluates all selectors (ie. calls Querier.Select()) before evaluating any part of the query.
+	// We rely on this behavior in query-frontends when evaluating shardable queries so that all selectors are evaluated in parallel.


I'm struggling to understand what parts of this change with the current architecture (non-MQE query-frontend) are required and how it will change the behavior. Is the purpose of this option to make MQE act like the Prometheus engine within the query-frontend? I'm missing which parts are going to be executed in parallel and where (query-frontend? queriers?).

charleskorn · 2025-06-06

Prometheus' engine makes all of the Querier.Select calls for a query upfront, before evaluating any part of the query.

MQE makes the Querier.Select call for a selector when SeriesMetadata is called on the selector operator. In practice, this usually means all Querier.Select calls are done upfront before any evaluation occurs, but there are some exceptions (e.g. queries that include absent() and absent_over_time()). Crucially, MQE iterates through the returned series set in Selector.SeriesMetadata so that it can report the list of series the selector will produce.

In queriers, Mimir provides the engine an ordinary Queryable, so Select calls are issued serially, one after another, regardless of which engine is used.

In query-frontends, when running a sharded query or a query with subquery spin-off, Mimir provides a LazyQueryable to the engine, which returns a LazyQuerier. LazyQuerier.Select() starts a background goroutine to evaluate the selector and returns a lazy series set immediately. This lazy series set only blocks on receiving the result when the engine starts iterating through it.
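The lazy behavior described above can be illustrated with a minimal sketch. This is not Mimir's actual LazyQuerier: the types `seriesSet`, `selectFunc`, `lazySeriesSet` and `lazyQuerier` are hypothetical simplifications, and a series set is reduced to a list of names. It only shows the shape of the idea: Select returns immediately, and the caller blocks only when it first reads the result.

```go
package main

import "fmt"

// seriesSet is a hypothetical, simplified stand-in for a storage series set.
type seriesSet []string

// selectFunc is a hypothetical stand-in for an underlying Select call.
type selectFunc func(matcher string) seriesSet

// lazySeriesSet is returned immediately by lazyQuerier.Select. Its result is
// produced by a background goroutine and only awaited on first access.
type lazySeriesSet struct {
	done   chan struct{}
	result seriesSet
}

// Series blocks until the background Select has finished, then returns its result.
func (l *lazySeriesSet) Series() seriesSet {
	<-l.done
	return l.result
}

// lazyQuerier wraps a selectFunc so that each Select runs in the background.
type lazyQuerier struct {
	next selectFunc
}

// Select starts the underlying call in a goroutine and returns without waiting for it.
func (q lazyQuerier) Select(matcher string) *lazySeriesSet {
	l := &lazySeriesSet{done: make(chan struct{})}
	go func() {
		// Closing done after the write makes result safe to read in Series.
		defer close(l.done)
		l.result = q.next(matcher)
	}()
	return l
}

func main() {
	q := lazyQuerier{next: func(m string) seriesSet { return seriesSet{m + "_1", m + "_2"} }}

	// Both Select calls return immediately; the work proceeds in the background.
	a := q.Select("foo")
	b := q.Select("bar")

	// Blocking only happens here, when the results are first read.
	fmt.Println(a.Series(), b.Series())
}
```

With a wrapper of this shape, whether selectors run concurrently depends entirely on the caller issuing every Select before reading any result.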

When using Prometheus' engine in query-frontends, this allows the different selectors to be evaluated concurrently, and the query doesn't block waiting for a selector to be evaluated until it's required by the engine. This means that different sharded legs of a query are evaluated in parallel, for example.

However, when using MQE in query-frontends, because Selector.SeriesMetadata iterates through the series set immediately after calling Select, the call blocks immediately and no other Select calls are issued until the current one completes.

So the goal of this PR is to have a way for MQE to make all Select calls for a query before iterating over any series set, so that a query with many sharded legs can still have each leg evaluated in parallel.
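To make the ordering difference concrete, here is a small sketch that records the order of calls with and without an eager step. It is an illustration under assumptions, not the code in this PR: `recordingQuerier`, `selector`, `load`, `seriesMetadata` and `evaluate` are hypothetical names, and the querier is synchronous so that the event order is deterministic.

```go
package main

import "fmt"

// recordingQuerier is a hypothetical querier that logs the order of calls.
type recordingQuerier struct {
	events []string
}

// Select records the call and returns a function that yields the series when iterated.
func (q *recordingQuerier) Select(name string) func() []string {
	q.events = append(q.events, "select "+name)
	return func() []string {
		q.events = append(q.events, "iterate "+name)
		return []string{name}
	}
}

// selector is a simplified stand-in for a selector operator.
type selector struct {
	name    string
	querier *recordingQuerier
	pending func() []string // set once Select has been called
}

// load issues the Select call if it has not been issued already.
func (s *selector) load() {
	if s.pending == nil {
		s.pending = s.querier.Select(s.name)
	}
}

// seriesMetadata issues Select if needed, then iterates the result immediately.
func (s *selector) seriesMetadata() []string {
	s.load()
	return s.pending()
}

// evaluate runs every selector, optionally issuing all Select calls first,
// and returns the recorded event order.
func evaluate(q *recordingQuerier, names []string, eager bool) []string {
	selectors := make([]*selector, 0, len(names))
	for _, n := range names {
		selectors = append(selectors, &selector{name: n, querier: q})
	}
	if eager {
		for _, s := range selectors {
			s.load()
		}
	}
	for _, s := range selectors {
		s.seriesMetadata()
	}
	return q.events
}

func main() {
	fmt.Println(evaluate(&recordingQuerier{}, []string{"a", "b"}, false))
	fmt.Println(evaluate(&recordingQuerier{}, []string{"a", "b"}, true))
}
```

Without the eager step the recorded order is select a, iterate a, select b, iterate b; with it, both selects come before either iteration. Combined with a lazy querier, the second ordering is what lets the selectors run in parallel.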

Does that answer your question?

56quarters · 2025-06-06

Yes, thank you.

56quarters · 2025-06-05T15:56:11Z

pkg/streamingpromql/engine_test.go

+			testutils.RequireEqualResults(t, expr, baselineResult, eagerLoadingResult, false)
+
+			// Each Select call through our mocked slow storage takes at least 2s, so if they aren't run in parallel, then the query will take over 4s to run.
+			require.Less(t, duration, 4*time.Second, "expected selectors to be evaluated in parallel")


I don't have a good suggestion for testing this but I really don't want to add a unit test that relies on timing.

charleskorn · 2025-06-06

I didn't love this either. The other option I considered was adding logic to slowQuerier so that each of the two Select calls waits for the other to be running before continuing, but I worried that would be more difficult to understand.

56quarters · 2025-06-06T17:02:23Z

I think that would be preferable. I'll see if I can come up with something latch-like in a separate branch.

56quarters

Approving to unblock, but I'd like to try using some form of synchronization to avoid relying on timing in tests.

56quarters · 2025-06-06T17:22:38Z

> I think that would be preferable. I'll see if I can come up with something latch-like in a separate branch.

I opened #11660 which I think demonstrates how to do this. I confirmed it is testing the right thing by removing the lazy-loading wrapper from the Queryable - the test hangs in that case.
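For readers who have not seen the pattern, a latch-based check of this kind might look roughly like the sketch below. It is not the code from #11660 or e204ce2: `latchQuerier`, `newLatchQuerier` and `selectConcurrently` are hypothetical names, and a `sync.WaitGroup` is assumed as the latch. Each Select call announces that it has started and then waits for all the others, so the run only completes if every expected call is in flight at the same time.

```go
package main

import (
	"fmt"
	"sync"
)

// latchQuerier is a hypothetical test double: each Select call blocks until
// the expected number of Select calls are in flight at the same time.
type latchQuerier struct {
	latch sync.WaitGroup
}

// newLatchQuerier creates a querier whose latch opens after `expected` Select calls have started.
func newLatchQuerier(expected int) *latchQuerier {
	q := &latchQuerier{}
	q.latch.Add(expected)
	return q
}

// Select must be called exactly `expected` times; an extra call would panic
// because the WaitGroup counter would go negative.
func (q *latchQuerier) Select(name string) []string {
	q.latch.Done() // Signal that this call has started.
	q.latch.Wait() // Block until every expected call has started.
	return []string{name}
}

// selectConcurrently issues one Select per name, each in its own goroutine,
// and collects the results in input order.
func selectConcurrently(q *latchQuerier, names []string) [][]string {
	results := make([][]string, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			results[i] = q.Select(name)
		}(i, name)
	}
	wg.Wait()
	return results
}

func main() {
	q := newLatchQuerier(2)
	fmt.Println(selectConcurrently(q, []string{"a", "b"}))
}
```

If the two Select calls were issued serially instead, the first would wait forever for the second, which matches the hang described above when the lazy-loading wrapper is removed. The trade-off is that a regression shows up as a test timeout rather than a failed assertion.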

charleskorn · 2025-06-10T23:50:27Z

> > I think that would be preferable. I'll see if I can come up with something latch-like in a separate branch.
>
> I opened #11660 which I think demonstrates how to do this. I confirmed it is testing the right thing by removing the lazy-loading wrapper from the Queryable - the test hangs in that case.

Nice, thanks for the suggestion. I've incorporated that into e204ce2.

Introduce eager loading

9fa9d37

charleskorn mentioned this pull request Jun 4, 2025

Mimir Query Engine #10067

Open

charleskorn marked this pull request as ready for review June 4, 2025 05:41

charleskorn requested a review from a team as a code owner June 4, 2025 05:41

56quarters reviewed Jun 5, 2025

Remove TODO

9d6a575

56quarters approved these changes Jun 6, 2025

56quarters mentioned this pull request Jun 6, 2025

WIP: Demo of using a latch to avoid timing in tests #11660

Draft

Address PR feedback: use latches to avoid using timing in test

e204ce2

Merge branch 'main' into charleskorn/mqe-eager-loading

573df70

# Conflicts: # pkg/streamingpromql/operators/selectors/instant_vector_selector.go
