-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[Query Resource Isolation] Additonal Sampling for Broker and Server #16164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #16164 +/- ##
============================================
+ Coverage 62.90% 63.30% +0.40%
+ Complexity 1386 1363 -23
============================================
Files 2867 2976 +109
Lines 163354 172937 +9583
Branches 24952 26497 +1545
============================================
+ Hits 102755 109485 +6730
- Misses 52847 55083 +2236
- Partials 7752 8369 +617
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pchaganl In our deployment we noticed a few other misses (eg: Distinct Queries, PercentileTDigest). Can we make sure to add sampling there too?
@@ -652,6 +654,8 @@ protected BrokerResponse doHandleRequest(long requestId, String query, SqlNodeAn | |||
long routingEndTimeNs = System.nanoTime(); | |||
_brokerMetrics.addPhaseTiming(rawTableName, BrokerQueryPhase.QUERY_ROUTING, | |||
routingEndTimeNs - routingStartTimeNs); | |||
// Account the resource used for routing phase | |||
Tracing.ThreadAccountantOps.sample(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a sample at the end of every phase:
- Request Compilation
- Authorization
- Routing
Also add a comment here as to why sampling for routing is important - with single threaded segment pruners, this can take resources when there are a lot of segments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added for Request Compilation & Routing. I did consider for authorization and looked into our latency metric for authorization it is less than 0.1 ms . So avoided it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) Prefer moving the sampling to the location where we compute phase timing for each phase. That way it's evident that we want to sample each broker phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Routing was closer to the phase calculation moved compilation as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add authorization. The idea is that if there's a bug/deployment issue in this phase, we catch it.
@@ -371,6 +371,8 @@ protected BrokerResponse doHandleRequest(long requestId, String query, SqlNodeAn | |||
CompileResult compileResult = | |||
compileRequest(requestId, query, sqlNodeAndOptions, request, requesterIdentity, requestContext, httpHeaders, | |||
accessControl); | |||
// Accounts for resource usage of the compilation phase | |||
Tracing.ThreadAccountantOps.sample(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also make this change for the other handlers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it is added for SSE and MSE.
GrpcBrokerRequestHandler
, SingleConnectionBrokerRequestHandler
extend this handler for SSE. TimeSeriesRequestHandler
seems to be custom handler.
For MSE it is added in QueryEnvironment
@@ -305,6 +306,8 @@ private void applyQueryOptions(QueryContext queryContext) { | |||
@Override | |||
public PlanNode makeSegmentPlanNode(SegmentContext segmentContext, QueryContext queryContext) { | |||
rewriteQueryContextWithHints(queryContext, segmentContext.getIndexSegment()); | |||
// Sample to track usage of query planning | |||
Tracing.ThreadAccountantOps.sample(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use sampleAndCheckInterruption() in all these places to make sure the thread exists if it's already been interrupted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the usage of sampleAndCheckInterruption() mostly on broker on merge. I taught there was reason for doing that specifically on broker.
Is sampleAndCheckInterruption() a more evolved approach we want to move on all places on server as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should see it used on server as well in some of the operators. The only difference is one blindly samples. While the other checks for interruption and throws exception before sampling.
@@ -273,6 +274,8 @@ public CompiledQuery compile(String sqlQuery, SqlNodeAndOptions sqlNodeAndOption | |||
queryNode = sqlNode; | |||
} | |||
RelRoot relRoot = compileQuery(queryNode, plannerContext); | |||
// Accounts for resource usage of the compilation phase | |||
Tracing.ThreadAccountantOps.sampleMSE(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setupRunner is called much later before dispatch in MultistageBrokerRequestHandler. Two questions here:
- It will not work for multistage handlers.
- If this is for the single stage handler fallback path, is this really necessary if we already are sampling after the compilation phase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- You are right it was setup after compilation, moved it up, take a look if looks right?
- This path is touched by MSE handler so it is not added as a fallback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look right to me. The QueryEnv.compile() happens in an async call with a separate threadpool. We might have to find another way to account for these async calls in MSE.
@@ -305,6 +306,8 @@ private void applyQueryOptions(QueryContext queryContext) { | |||
@Override | |||
public PlanNode makeSegmentPlanNode(SegmentContext segmentContext, QueryContext queryContext) { | |||
rewriteQueryContextWithHints(queryContext, segmentContext.getIndexSegment()); | |||
// Sample to track usage of query planning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to track 2 things on the server right?
- Query Planning
- Segment Pruning
Can you make sure both are tracked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added for Segment pruning
@@ -298,7 +298,6 @@ protected BrokerResponse handleRequestThrowing(long requestId, String query, Sql | |||
|
|||
try (QueryEnvironment.CompiledQuery compiledQuery = | |||
compileQuery(requestId, query, sqlNodeAndOptions, httpHeaders, queryTimer)) { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) remove spurious changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was because of a bad merge with master, fixed it
@@ -305,6 +306,8 @@ private void applyQueryOptions(QueryContext queryContext) { | |||
@Override | |||
public PlanNode makeSegmentPlanNode(SegmentContext segmentContext, QueryContext queryContext) { | |||
rewriteQueryContextWithHints(queryContext, segmentContext.getIndexSegment()); | |||
// Sample to track usage of query planning, since it can be expensive for large segment lists. | |||
Tracing.ThreadAccountantOps.sampleAndCheckInterruption(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep all the sampling in a single place without moving it inside the planning/pruning classes.. That way it's easier to reason about it.
For example: can we add both the pruning and planning sampling in ServerQueryExecutorV1Impl.
- Pruning sampling in
executeInternal()
after the selectedSegmentsInfo is computed. - Planning in
executeQuery
before theplanCombineQuery
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that consolidating the sampling into a single location can improve readability, but in this case it would come at the cost of precision—because you’d only capture metrics once, after the entire operation completes. For example, we have several pruner and ideally need to record a sample after each one to maintain accuracy (and the same applies to planning).
Summary
Query-Resource-Isolation (QRI) relies on periodic CPU / memory samples to detect “run-away” queries and enforce per-workload budgets.
In several incidents we found that no samples are collected during three cost-intensive phases, so large queries sometimes survive far longer than the budget window:
This PR addresses those gaps by introducing lightweight sampling hooks at the end of each phase on the Broker side.
On the Server, sampling is added at the per-segment plan level.
These hooks are enabled by default since the overhead is minimal:
• On the Broker, sampling incurs only a single MXBean call.
• On the Server, sampling is bounded by the number of segments planned. For valid queries, P95th of queries typically involves fewer than 100 segments per server, with a P50 around 45 segments.
Testing Done
Manual quick start and integ test to validate if sampling is happening during this path.