
[Query Resource Isolation] Additional Sampling for Broker and Server #16164


Open · wants to merge 12 commits into master

Conversation

Contributor

@praveenc7 commented Jun 20, 2025

Summary

Query Resource Isolation (QRI) relies on periodic CPU/memory samples to detect runaway queries and enforce per-workload budgets.
In several incidents we found that no samples are collected during three cost-intensive phases, so large queries sometimes survive far longer than the budget window:

| Component | Phase currently unsampled | Consequence |
|-----------|---------------------------|-------------|
| Broker | SQL compilation | Big plans compile for > 1 s with 0 ns billed |
| Broker | Routing table build | Large fan-out hits go un-metered |
| Server | Plan building | Segment-level planning can take seconds on wide tables |

This PR addresses those gaps by introducing lightweight sampling hooks at the end of each phase on the Broker side.

On the Server, sampling is added at the per-segment plan level.

These hooks are enabled by default since the overhead is minimal (see the sketch after this list):
• On the Broker, sampling incurs only a single MXBean call.
• On the Server, sampling is bounded by the number of segments planned. For valid queries, the P95 is fewer than 100 segments per server, with a P50 of around 45 segments.
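
To illustrate why a single sample is cheap, here is a minimal, self-contained sketch of the pattern. The accountant class below is a stand-in for illustration only (not Pinot's `Tracing.ThreadAccountantOps`); it just shows that an end-of-phase sample boils down to one `ThreadMXBean` call on the current thread:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public final class PhaseSamplingSketch {

  // Stand-in for the thread accountant: each sample() is a single MXBean call
  // that reads the current thread's CPU time so the phase cost can be attributed
  // to the query's workload budget.
  static final class ThreadAccountant {
    private static final ThreadMXBean MX_BEAN = ManagementFactory.getThreadMXBean();

    static void sample() {
      long cpuNs = MX_BEAN.getCurrentThreadCpuTime();
      System.out.println("Sampled thread CPU ns: " + cpuNs);
    }
  }

  public static void main(String[] args) {
    simulatePhase();            // stands in for SQL compilation
    ThreadAccountant.sample();  // sample at the end of the phase

    simulatePhase();            // stands in for the routing table build
    ThreadAccountant.sample();  // sample again after the next phase
  }

  private static void simulatePhase() {
    long acc = 0;
    for (int i = 0; i < 5_000_000; i++) {
      acc += i;
    }
    System.out.println("phase finished, checksum " + acc);
  }
}
```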

Testing Done

Manual quick-start run and integration tests to validate that sampling happens along these paths.

@codecov-commenter commented Jun 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.30%. Comparing base (1a476de) to head (76ac953).
Report is 489 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16164      +/-   ##
============================================
+ Coverage     62.90%   63.30%   +0.40%     
+ Complexity     1386     1363      -23     
============================================
  Files          2867     2976     +109     
  Lines        163354   172937    +9583     
  Branches      24952    26497    +1545     
============================================
+ Hits         102755   109485    +6730     
- Misses        52847    55083    +2236     
- Partials       7752     8369     +617     
| Flag | Coverage Δ |
|------|------------|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.28% <100.00%> (+0.41%) ⬆️ |
| java-21 | 63.28% <100.00%> (+0.46%) ⬆️ |
| skip-bytebuffers-false | ? |
| skip-bytebuffers-true | ? |
| temurin | 63.30% <100.00%> (+0.40%) ⬆️ |
| unittests | 63.30% <100.00%> (+0.40%) ⬆️ |
| unittests1 | 56.45% <100.00%> (+0.62%) ⬆️ |
| unittests2 | 33.28% <60.00%> (-0.29%) ⬇️ |

Flags with carried forward coverage won't be shown.


@praveenc7 marked this pull request as ready for review July 14, 2025 17:40
Contributor

@vvivekiyer left a comment

@pchaganl In our deployment we noticed a few other misses (e.g., Distinct queries, PercentileTDigest). Can we make sure to add sampling there too?

@@ -652,6 +654,8 @@ protected BrokerResponse doHandleRequest(long requestId, String query, SqlNodeAn
long routingEndTimeNs = System.nanoTime();
_brokerMetrics.addPhaseTiming(rawTableName, BrokerQueryPhase.QUERY_ROUTING,
routingEndTimeNs - routingStartTimeNs);
// Account for the resources used in the routing phase
Tracing.ThreadAccountantOps.sample();
Contributor

Let's add a sample at the end of every phase:

  1. Request Compilation
  2. Authorization
  3. Routing

Also add a comment here explaining why sampling for routing is important: with single-threaded segment pruners, it can consume significant resources when there are a lot of segments.

Contributor Author

I have added sampling for Request Compilation and Routing. I did consider Authorization, but our latency metric shows authorization takes less than 0.1 ms, so I skipped it.

Contributor

(nit) Prefer moving the sampling to the location where we compute phase timing for each phase. That way it's evident that we want to sample each broker phase.

Contributor Author

Routing was already close to the phase-timing calculation; I moved compilation there as well.

Contributor

Let's add authorization. The idea is that if there's a bug/deployment issue in this phase, we catch it.

@@ -371,6 +371,8 @@ protected BrokerResponse doHandleRequest(long requestId, String query, SqlNodeAn
CompileResult compileResult =
compileRequest(requestId, query, sqlNodeAndOptions, request, requesterIdentity, requestContext, httpHeaders,
accessControl);
// Accounts for resource usage of the compilation phase
Tracing.ThreadAccountantOps.sample();
Contributor

Can we also make this change for the other handlers?

Contributor Author

Currently it is added for SSE and MSE.

GrpcBrokerRequestHandler and SingleConnectionBrokerRequestHandler extend this handler for SSE; TimeSeriesRequestHandler appears to be a custom handler.

For MSE it is added in QueryEnvironment.

@@ -305,6 +306,8 @@ private void applyQueryOptions(QueryContext queryContext) {
@Override
public PlanNode makeSegmentPlanNode(SegmentContext segmentContext, QueryContext queryContext) {
rewriteQueryContextWithHints(queryContext, segmentContext.getIndexSegment());
// Sample to track usage of query planning
Tracing.ThreadAccountantOps.sample();
Contributor

Can we use sampleAndCheckInterruption() in all these places to make sure the thread exits if it's already been interrupted?

Contributor Author

I see sampleAndCheckInterruption() used mostly on the broker during merge. I thought there was a reason for doing that specifically on the broker.

Is sampleAndCheckInterruption() a more evolved approach that we want to adopt everywhere on the server as well?

Contributor

You should see it used on the server as well in some of the operators. The only difference is that one blindly samples, while the other checks for interruption and throws an exception before sampling.
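
A rough sketch of the distinction described here, with stand-in names (the real methods live in `Tracing.ThreadAccountantOps`, and the real interruption signal may come from the accountant rather than the plain Java interrupt flag used below):

```java
// Illustrative stand-ins only; not Pinot's actual implementation.
public final class AccountantSketch {

  static class EarlyTerminationException extends RuntimeException {
    EarlyTerminationException(String message) {
      super(message);
    }
  }

  // Blindly record a usage sample for the current thread.
  static void sample() {
    long cpuNs = java.lang.management.ManagementFactory.getThreadMXBean().getCurrentThreadCpuTime();
    System.out.println("sampled " + cpuNs + " ns");
  }

  // Check whether the query has already been flagged for termination and bail
  // out before doing more work; otherwise fall through to a normal sample.
  static void sampleAndCheckInterruption() {
    if (Thread.currentThread().isInterrupted()) {
      throw new EarlyTerminationException("Query terminated: resource budget exceeded");
    }
    sample();
  }

  public static void main(String[] args) {
    sample();                      // always records
    sampleAndCheckInterruption();  // records only if not interrupted, else throws
  }
}
```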

@@ -273,6 +274,8 @@ public CompiledQuery compile(String sqlQuery, SqlNodeAndOptions sqlNodeAndOption
queryNode = sqlNode;
}
RelRoot relRoot = compileQuery(queryNode, plannerContext);
// Accounts for resource usage of the compilation phase
Tracing.ThreadAccountantOps.sampleMSE();
Contributor

setupRunner is called much later, just before dispatch, in MultistageBrokerRequestHandler. Two concerns here:

  1. It will not work for multistage handlers.
  2. If this is for the single-stage handler fallback path, is this really necessary if we are already sampling after the compilation phase?

Contributor Author

  1. You are right, it was set up after compilation; I moved it up. Take a look and see if it looks right.
  2. This path is touched by the MSE handler, so it is not added as a fallback.

Contributor

It doesn't look right to me. QueryEnv.compile() happens in an async call on a separate thread pool. We might have to find another way to account for these async calls in MSE.
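
One possible direction, sketched here under assumptions rather than taken from this PR: have the async task itself sample on the worker thread that actually runs the compilation, so the cost is recorded where the work happened. The executor, compile() stub, and sample() below are illustrative stand-ins, not Pinot APIs:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class AsyncSamplingSketch {

  // Stand-in for a per-thread usage sample.
  static void sample() {
    System.out.println("sampled on " + Thread.currentThread().getName());
  }

  // Stand-in for the expensive compilation work done by the query environment.
  static String compile(String query) {
    return "plan(" + query + ")";
  }

  public static void main(String[] args) throws Exception {
    ExecutorService compileExecutor = Executors.newFixedThreadPool(2);
    try {
      CompletableFuture<String> plan = CompletableFuture.supplyAsync(() -> {
        String compiled = compile("SELECT COUNT(*) FROM myTable");
        sample();  // sample on the worker thread, right after the compile work
        return compiled;
      }, compileExecutor);
      System.out.println(plan.get());
    } finally {
      compileExecutor.shutdown();
    }
  }
}
```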

@@ -305,6 +306,8 @@ private void applyQueryOptions(QueryContext queryContext) {
@Override
public PlanNode makeSegmentPlanNode(SegmentContext segmentContext, QueryContext queryContext) {
rewriteQueryContextWithHints(queryContext, segmentContext.getIndexSegment());
// Sample to track usage of query planning
Contributor

We need to track two things on the server, right?

  1. Query Planning
  2. Segment Pruning

Can you make sure both are tracked?

Contributor Author

Added for Segment pruning

@@ -298,7 +298,6 @@ protected BrokerResponse handleRequestThrowing(long requestId, String query, Sql

try (QueryEnvironment.CompiledQuery compiledQuery =
compileQuery(requestId, query, sqlNodeAndOptions, httpHeaders, queryTimer)) {

Contributor

(nit) remove spurious changes.

Contributor Author

This was because of a bad merge with master, fixed it

@@ -305,6 +306,8 @@ private void applyQueryOptions(QueryContext queryContext) {
@Override
public PlanNode makeSegmentPlanNode(SegmentContext segmentContext, QueryContext queryContext) {
rewriteQueryContextWithHints(queryContext, segmentContext.getIndexSegment());
// Sample to track usage of query planning, since it can be expensive for large segment lists.
Tracing.ThreadAccountantOps.sampleAndCheckInterruption();
Contributor

Keep all the sampling in a single place without moving it inside the planning/pruning classes. That way it's easier to reason about.

For example, can we add both the pruning and planning sampling in ServerQueryExecutorV1Impl?

  • Pruning sampling in executeInternal() after the selectedSegmentsInfo is computed.
  • Planning sampling in executeQuery before planCombineQuery is called.

Contributor Author

I understand that consolidating the sampling into a single location can improve readability, but in this case it would come at the cost of precision, because you'd only capture metrics once, after the entire operation completes. For example, we have several pruners and ideally need to record a sample after each one to maintain accuracy (and the same applies to planning).
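
To make the precision argument concrete, a small sketch with stand-in types (not Pinot's pruner API), assuming the pruners run sequentially on one thread: a sample after each pruner attributes each step's cost as it happens, whereas a single sample after the whole chain lumps everything into one reading.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch of per-pruner sampling vs. one consolidated sample;
// the pruner type and sample() are stand-ins, not Pinot's actual API.
public final class PrunerSamplingSketch {

  static void sample(String where) {
    System.out.println("sample after " + where + " on " + Thread.currentThread().getName());
  }

  public static void main(String[] args) {
    List<UnaryOperator<List<String>>> pruners = List.of(
        segments -> segments.subList(0, Math.max(1, segments.size() / 2)),   // e.g. a time-range pruner
        segments -> segments.subList(0, Math.max(1, segments.size() / 2)));  // e.g. a partition pruner

    List<String> segments = List.of("seg0", "seg1", "seg2", "seg3");

    // Per-pruner sampling: a reading after each pruner keeps attribution fine-grained.
    int i = 0;
    for (UnaryOperator<List<String>> pruner : pruners) {
      segments = pruner.apply(segments);
      sample("pruner-" + i++);
    }

    // A single consolidated sample here would only capture the combined cost of
    // the whole chain, which is the precision loss described above.
    System.out.println("selected segments: " + segments);
  }
}
```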
