
[Query Resource Isolation] Additional Sampling for Broker and Server #16164


Open · wants to merge 12 commits into master

Conversation

Contributor

@praveenc7 commented Jun 20, 2025

Summary

Query Resource Isolation (QRI) relies on periodic CPU/memory samples to detect runaway queries and enforce per-workload budgets.
In several incidents we found that no samples are collected during three cost-intensive phases, so large queries sometimes survive far longer than the budget window:

| Component | Phase currently unsampled | Consequence |
|-----------|---------------------------|-------------|
| Broker | SQL compilation | Big plans compile for > 1 s with 0 ns billed |
| Broker | Routing table build | Large fan-out hits go un-metered |
| Server | Plan building | Segment-level planning can take seconds on wide tables |

This PR addresses those gaps by introducing lightweight sampling hooks at the end of each phase on the Broker side.

On the Server, sampling is added at the per-segment plan level.

These hooks are enabled by default since the overhead is minimal (see the sketch after this list):
• On the Broker, sampling incurs only a single MXBean call.
• On the Server, sampling is bounded by the number of segments planned. For valid queries, the P95 is fewer than 100 segments per server, with a P50 of around 45 segments.
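
To illustrate why a single sample is cheap, here is a minimal, self-contained sketch of the pattern. The accountant class below is a stand-in for illustration only (not Pinot's `Tracing.ThreadAccountantOps`); it just shows that an end-of-phase sample boils down to one `ThreadMXBean` call on the current thread:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public final class PhaseSamplingSketch {

  // Stand-in for the thread accountant: each sample() is a single MXBean call
  // that reads the current thread's CPU time so the phase cost can be attributed
  // to the query's workload budget.
  static final class ThreadAccountant {
    private static final ThreadMXBean MX_BEAN = ManagementFactory.getThreadMXBean();

    static void sample() {
      long cpuNs = MX_BEAN.getCurrentThreadCpuTime();
      System.out.println("Sampled thread CPU ns: " + cpuNs);
    }
  }

  public static void main(String[] args) {
    simulatePhase();            // stands in for SQL compilation
    ThreadAccountant.sample();  // sample at the end of the phase

    simulatePhase();            // stands in for the routing table build
    ThreadAccountant.sample();  // sample again after the next phase
  }

  private static void simulatePhase() {
    long acc = 0;
    for (int i = 0; i < 5_000_000; i++) {
      acc += i;
    }
    System.out.println("phase finished, checksum " + acc);
  }
}
```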

Testing Done

Manual quick-start run and integration tests to validate that sampling happens along these paths.

@codecov-commenter commented Jun 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.30%. Comparing base (1a476de) to head (76ac953).
Report is 489 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16164      +/-   ##
============================================
+ Coverage     62.90%   63.30%   +0.40%     
+ Complexity     1386     1363      -23     
============================================
  Files          2867     2976     +109     
  Lines        163354   172937    +9583     
  Branches      24952    26497    +1545     
============================================
+ Hits         102755   109485    +6730     
- Misses        52847    55083    +2236     
- Partials       7752     8369     +617     
| Flag | Coverage Δ |
|------|------------|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.28% <100.00%> (+0.41%) ⬆️ |
| java-21 | 63.28% <100.00%> (+0.46%) ⬆️ |
| skip-bytebuffers-false | ? |
| skip-bytebuffers-true | ? |
| temurin | 63.30% <100.00%> (+0.40%) ⬆️ |
| unittests | 63.30% <100.00%> (+0.40%) ⬆️ |
| unittests1 | 56.45% <100.00%> (+0.62%) ⬆️ |
| unittests2 | 33.28% <60.00%> (-0.29%) ⬇️ |

Flags with carried forward coverage won't be shown.


@praveenc7 marked this pull request as ready for review July 14, 2025 17:40
Contributor

@vvivekiyer left a comment

@pchaganl In our deployment we noticed a few other misses (e.g., Distinct queries, PercentileTDigest). Can we make sure to add sampling there too?

@@ -652,6 +654,8 @@ protected BrokerResponse doHandleRequest(long requestId, String query, SqlNodeAn
long routingEndTimeNs = System.nanoTime();
_brokerMetrics.addPhaseTiming(rawTableName, BrokerQueryPhase.QUERY_ROUTING,
routingEndTimeNs - routingStartTimeNs);
// Account for the resources used in the routing phase
Tracing.ThreadAccountantOps.sample();
Contributor

Let's add a sample at the end of every phase:

  1. Request Compilation
  2. Authorization
  3. Routing

Also add a comment here explaining why sampling for routing is important: with single-threaded segment pruners, it can consume significant resources when there are a lot of segments.

Contributor Author

I have added sampling for Request Compilation and Routing. I did consider Authorization, but our latency metric shows authorization takes less than 0.1 ms, so I skipped it.

Contributor

(nit) Prefer moving the sampling to the location where we compute phase timing for each phase. That way it's evident that we want to sample each broker phase.

Contributor Author

Routing was already close to the phase-timing calculation; I moved compilation there as well.

Contributor

Let's add authorization. The idea is that if there's a bug/deployment issue in this phase, we catch it.

@@ -371,6 +371,8 @@ protected BrokerResponse doHandleRequest(long requestId, String query, SqlNodeAn
CompileResult compileResult =
compileRequest(requestId, query, sqlNodeAndOptions, request, requesterIdentity, requestContext, httpHeaders,
accessControl);
// Accounts for resource usage of the compilation phase
Tracing.ThreadAccountantOps.sample();
Contributor

Can we also make this change for the other handlers?

Contributor Author

Currently it is added for SSE and MSE.

GrpcBrokerRequestHandler and SingleConnectionBrokerRequestHandler extend this handler for SSE; TimeSeriesRequestHandler appears to be a custom handler.

For MSE it is added in QueryEnvironment.

@@ -305,6 +306,8 @@ private void applyQueryOptions(QueryContext queryContext) {
@Override
public PlanNode makeSegmentPlanNode(SegmentContext segmentContext, QueryContext queryContext) {
rewriteQueryContextWithHints(queryContext, segmentContext.getIndexSegment());
// Sample to track usage of query planning
Tracing.ThreadAccountantOps.sample();
Contributor

Can we use sampleAndCheckInterruption() in all these places to make sure the thread exits if it's already been interrupted?

Contributor Author

I see sampleAndCheckInterruption() used mostly on the broker during merge. I thought there was a reason for doing that specifically on the broker.

Is sampleAndCheckInterruption() a more evolved approach that we want to adopt everywhere on the server as well?

Contributor

You should see it used on the server as well in some of the operators. The only difference is that one blindly samples, while the other checks for interruption and throws an exception before sampling.
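
A rough sketch of the distinction described here, with stand-in names (the real methods live in `Tracing.ThreadAccountantOps`, and the real interruption signal may come from the accountant rather than the plain Java interrupt flag used below):

```java
// Illustrative stand-ins only; not Pinot's actual implementation.
public final class AccountantSketch {

  static class EarlyTerminationException extends RuntimeException {
    EarlyTerminationException(String message) {
      super(message);
    }
  }

  // Blindly record a usage sample for the current thread.
  static void sample() {
    long cpuNs = java.lang.management.ManagementFactory.getThreadMXBean().getCurrentThreadCpuTime();
    System.out.println("sampled " + cpuNs + " ns");
  }

  // Check whether the query has already been flagged for termination and bail
  // out before doing more work; otherwise fall through to a normal sample.
  static void sampleAndCheckInterruption() {
    if (Thread.currentThread().isInterrupted()) {
      throw new EarlyTerminationException("Query terminated: resource budget exceeded");
    }
    sample();
  }

  public static void main(String[] args) {
    sample();                      // always records
    sampleAndCheckInterruption();  // records only if not interrupted, else throws
  }
}
```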

@@ -273,6 +274,8 @@ public CompiledQuery compile(String sqlQuery, SqlNodeAndOptions sqlNodeAndOption
queryNode = sqlNode;
}
RelRoot relRoot = compileQuery(queryNode, plannerContext);
// Accounts for resource usage of the compilation phase
Tracing.ThreadAccountantOps.sampleMSE();
Contributor

setupRunner is called much later, just before dispatch, in MultistageBrokerRequestHandler. Two concerns here:

  1. It will not work for multistage handlers.
  2. If this is for the single-stage handler fallback path, is this really necessary if we are already sampling after the compilation phase?

Contributor Author

  1. You are right, it was set up after compilation; I moved it up. Take a look and see if it looks right.
  2. This path is touched by the MSE handler, so it is not added as a fallback.

Contributor

It doesn't look right to me. QueryEnv.compile() happens in an async call on a separate thread pool. We might have to find another way to account for these async calls in MSE.
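
One possible direction, sketched here under assumptions rather than taken from this PR: have the async task itself sample on the worker thread that actually runs the compilation, so the cost is recorded where the work happened. The executor, compile() stub, and sample() below are illustrative stand-ins, not Pinot APIs:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class AsyncSamplingSketch {

  // Stand-in for a per-thread usage sample.
  static void sample() {
    System.out.println("sampled on " + Thread.currentThread().getName());
  }

  // Stand-in for the expensive compilation work done by the query environment.
  static String compile(String query) {
    return "plan(" + query + ")";
  }

  public static void main(String[] args) throws Exception {
    ExecutorService compileExecutor = Executors.newFixedThreadPool(2);
    try {
      CompletableFuture<String> plan = CompletableFuture.supplyAsync(() -> {
        String compiled = compile("SELECT COUNT(*) FROM myTable");
        sample();  // sample on the worker thread, right after the compile work
        return compiled;
      }, compileExecutor);
      System.out.println(plan.get());
    } finally {
      compileExecutor.shutdown();
    }
  }
}
```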

@@ -305,6 +306,8 @@ private void applyQueryOptions(QueryContext queryContext) {
@Override
public PlanNode makeSegmentPlanNode(SegmentContext segmentContext, QueryContext queryContext) {
rewriteQueryContextWithHints(queryContext, segmentContext.getIndexSegment());
// Sample to track usage of query planning
Contributor

We need to track two things on the server, right?

  1. Query Planning
  2. Segment Pruning

Can you make sure both are tracked?

Contributor Author

Added for Segment pruning

@@ -298,7 +298,6 @@ protected BrokerResponse handleRequestThrowing(long requestId, String query, Sql

try (QueryEnvironment.CompiledQuery compiledQuery =
compileQuery(requestId, query, sqlNodeAndOptions, httpHeaders, queryTimer)) {

Contributor

(nit) remove spurious changes.

Contributor Author

This was because of a bad merge with master, fixed it

@@ -305,6 +306,8 @@ private void applyQueryOptions(QueryContext queryContext) {
@Override
public PlanNode makeSegmentPlanNode(SegmentContext segmentContext, QueryContext queryContext) {
rewriteQueryContextWithHints(queryContext, segmentContext.getIndexSegment());
// Sample to track usage of query planning, since it can be expensive for large segment lists.
Tracing.ThreadAccountantOps.sampleAndCheckInterruption();
Contributor

Keep all the sampling in a single place without moving it inside the planning/pruning classes. That way it's easier to reason about.

For example, can we add both the pruning and planning sampling in ServerQueryExecutorV1Impl?

  • Pruning sampling in executeInternal() after the selectedSegmentsInfo is computed.
  • Planning sampling in executeQuery before planCombineQuery is called.

Contributor Author

I understand that consolidating the sampling into a single location can improve readability, but in this case it would come at the cost of precision, because you'd only capture metrics once, after the entire operation completes. For example, we have several pruners and ideally need to record a sample after each one to maintain accuracy (and the same applies to planning).
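
To make the precision argument concrete, a small sketch with stand-in types (not Pinot's pruner API), assuming the pruners run sequentially on one thread: a sample after each pruner attributes each step's cost as it happens, whereas a single sample after the whole chain lumps everything into one reading.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch of per-pruner sampling vs. one consolidated sample;
// the pruner type and sample() are stand-ins, not Pinot's actual API.
public final class PrunerSamplingSketch {

  static void sample(String where) {
    System.out.println("sample after " + where + " on " + Thread.currentThread().getName());
  }

  public static void main(String[] args) {
    List<UnaryOperator<List<String>>> pruners = List.of(
        segments -> segments.subList(0, Math.max(1, segments.size() / 2)),   // e.g. a time-range pruner
        segments -> segments.subList(0, Math.max(1, segments.size() / 2)));  // e.g. a partition pruner

    List<String> segments = List.of("seg0", "seg1", "seg2", "seg3");

    // Per-pruner sampling: a reading after each pruner keeps attribution fine-grained.
    int i = 0;
    for (UnaryOperator<List<String>> pruner : pruners) {
      segments = pruner.apply(segments);
      sample("pruner-" + i++);
    }

    // A single consolidated sample here would only capture the combined cost of
    // the whole chain, which is the precision loss described above.
    System.out.println("selected segments: " + segments);
  }
}
```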
