-
Notifications
You must be signed in to change notification settings - Fork 595
MQE: eager loading of selectors #11624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good with a few suggestions. Mostly I'm just struggling to understand the context of the change and when it's going to be required or not required (required for #11417 but not required once querysharding is incorporated into MQE?)
@@ -18,6 +18,12 @@ type EngineOpts struct { | |||
// Should only be used in tests. | |||
Pedantic bool `yaml:"-"` | |||
|
|||
// Prometheus' engine evaluates all selectors (ie. calls Querier.Select()) before evaluating any part of the query. | |||
// We rely on this behavior in query-frontends when evaluating shardable queries so that all selectors are evaluated in parallel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling to understand what parts of this change with the current architecture (non-MQE query-frontend) are required and how it will change the behavior. Is the purpose of this option to make MQE act like the Prometheus engine within the query-frontend? I'm missing which parts are going to be executed in parallel and where (query-frontend? queriers?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prometheus' engine makes all of the Querier.Select
calls for a query upfront, before evaluating any part of the query.
MQE makes the Querier.Select
call for a selector when SeriesMetadata
is called on the selector operator. In practice, this usually means all Querier.Select
calls are done upfront before any evaluation occurs, but there are some exceptions (eg. queries that include absent()
and absent_over_time()
). Crucially, MQE iterates through the returned series set in Selector.SeriesMetadata
so that it can return the list of series the selector will return.
In queriers, Mimir provides the engine an ordinary Queryable
, so this means that queries issue Select
calls in series regardless of which engine is used.
In query-frontends, when running a sharded query or a query with subquery spin-off, Mimir provides a LazyQueryable
to the engine, which returns a LazyQuerier
. LazyQuerier.Select()
starts a background goroutine to evaluate the selector and returns a lazy series set immediately. This lazy series set only blocks on receiving the result when the engine starts iterating through it.
When using Prometheus' engine in query-frontends, this allows the different selectors to be evaluated concurrently, and the query doesn't block waiting for a selector to be evaluated until it's required by the engine. This means that different sharded legs of a query are evaluated in parallel, for example.
However, when using MQE in query-frontends, because Selector.SeriesMetadata
iterates through the series set immediately after calling Select
, the call blocks immediately and no other Select
calls are issued until the current one completes.
So the goal of this PR is to have a way for MQE to make all Select
calls for a query before iterating over any series set, so that a query with many sharded legs can still have each leg evaluated in parallel.
Does that answer your question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thank you.
pkg/streamingpromql/engine_test.go
Outdated
testutils.RequireEqualResults(t, expr, baselineResult, eagerLoadingResult, false) | ||
|
||
// Each Select call through our mocked slow storage takes at least 2s, so if they aren't run in parallel, then the query will take over 4s to run. | ||
require.Less(t, duration, 4*time.Second, "expected selectors to be evaluated in parallel") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a good suggestion for testing this but I really don't want to add a unit test that relies on timing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't love this either - the other option I considered was some logic in slowQuerier
to have the two Select
calls wait for the other to be running before continuing, but I worried that would be more difficult to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be preferable. I'll see if I can come up with something latch-like in a separate branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock but I'd like to try using some form of synchronization to avoid relying on timing in tests.
pkg/streamingpromql/engine_test.go
Outdated
testutils.RequireEqualResults(t, expr, baselineResult, eagerLoadingResult, false) | ||
|
||
// Each Select call through our mocked slow storage takes at least 2s, so if they aren't run in parallel, then the query will take over 4s to run. | ||
require.Less(t, duration, 4*time.Second, "expected selectors to be evaluated in parallel") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be preferable. I'll see if I can come up with something latch-like in a separate branch.
I opened #11660 which I think demonstrates how to do this. I confirmed it is testing the right thing by removing the lazy-loading wrapper from the |
Nice, thanks for the suggestion. I've incorporated that into e204ce2. |
# Conflicts: # pkg/streamingpromql/operators/selectors/instant_vector_selector.go
What this PR does
This PR introduces the ability to eagerly load selectors in MQE.
When combined with
LazyQueryable
, this means that selectors are evaluated in parallel in the background while the query is evaluated. This is important for using MQE in query-frontends, as this allows each part of a shardable query to be evaluated in parallel, rather than serially.Which issue(s) this PR fixes or relates to
#10067
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.