[Multi-stage] Support lookup join #13966

Jackie-Jiang · 2024-09-09T22:24:03Z

Add lookup join strategy as a hint (e.g. /*+ joinOptions(join_strategy='lookup') */)
Right table should be leaf stage projection only
TODO: Improve planner rules to not push down expression/filter into the dimension table side
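For readers unfamiliar with the hint syntax, here is a minimal sketch of a query that requests the lookup strategy. The hint text is the one from the description above; the table aliases, the `groupName` column, and the helper class itself are hypothetical and only serve as illustration. The right side is kept as a plain column projection with no expressions or filters, which is what the restriction above asks for.

```java
// Sketch only: the class, table names, and column names are hypothetical.
final class LookupJoinQueryExample {
  // Hint text taken from the PR description.
  static final String LOOKUP_HINT = "/*+ joinOptions(join_strategy='lookup') */";

  // Builds a multi-stage query that asks the planner for the lookup join strategy.
  // The dimension table side only projects plain columns (no expressions, no filters).
  static String buildQuery(String factTable, String dimTable, String joinKey) {
    return "SELECT " + LOOKUP_HINT + " f." + joinKey + ", d.groupName"
        + " FROM " + factTable + " f"
        + " JOIN " + dimTable + " d ON f." + joinKey + " = d." + joinKey;
  }

  public static void main(String[] args) {
    System.out.println(buildQuery("userAttributes", "userGroups", "userUUID"));
  }
}
```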

codecov-commenter · 2024-09-09T23:37:09Z

Codecov Report

Attention: Patch coverage is 36.98113% with 167 lines in your changes missing coverage. Please review.

Project coverage is 63.81%. Comparing base (59551e4) to head (40d5724).
Report is 1264 commits behind head on master.

Files with missing lines	Patch %	Lines
...not/query/runtime/operator/LookupJoinOperator.java	0.00%	102 Missing ⚠️
.../query/planner/logical/RelToPlanNodeConverter.java	52.63%	14 Missing and 4 partials ⚠️
.../org/apache/pinot/query/routing/WorkerManager.java	57.14%	12 Missing and 3 partials ⚠️
.../runtime/plan/server/ServerPlanRequestVisitor.java	65.00%	3 Missing and 4 partials ⚠️
...ite/rel/rules/PinotJoinExchangeNodeInsertRule.java	75.00%	2 Missing and 1 partial ⚠️
...inot/query/planner/serde/PlanNodeDeserializer.java	50.00%	2 Missing and 1 partial ⚠️
.../pinot/query/planner/serde/PlanNodeSerializer.java	50.00%	2 Missing and 1 partial ⚠️
...e/pinot/query/runtime/InStageStatsTreeBuilder.java	25.00%	2 Missing and 1 partial ⚠️
...e/operator/LeafStageTransferableBlockOperator.java	0.00%	3 Missing ⚠️
...not/query/runtime/operator/MultiStageOperator.java	25.00%	3 Missing ⚠️
... and 3 more

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #13966      +/-   ##
============================================
+ Coverage     61.75%   63.81%   +2.06%     
- Complexity      207     1532    +1325     
============================================
  Files          2436     2622     +186     
  Lines        133233   144260   +11027     
  Branches      20636    22069    +1433     
============================================
+ Hits          82274    92062    +9788     
- Misses        44911    45407     +496     
- Partials       6048     6791     +743

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (+99.99%)`	⬆️
integration	`100.00% <ø> (+99.99%)`	⬆️
integration1	`100.00% <ø> (+99.99%)`	⬆️
integration2	`0.00% <ø> (ø)`
java-11	`63.75% <36.98%> (+2.04%)`	⬆️
java-21	`63.69% <36.98%> (+2.07%)`	⬆️
skip-bytebuffers-false	`63.81% <36.98%> (+2.06%)`	⬆️
skip-bytebuffers-true	`63.63% <36.98%> (+35.90%)`	⬆️
temurin	`63.81% <36.98%> (+2.06%)`	⬆️
unittests	`63.81% <36.98%> (+2.06%)`	⬆️
unittests1	`55.45% <36.98%> (+8.56%)`	⬆️
unittests2	`34.35% <0.00%> (+6.62%)`	⬆️


gortiz · 2024-09-10T09:31:53Z

...lanner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotJoinExchangeNodeInsertRule.java

+    if (PinotHintOptions.JoinHintOptions.LOOKUP_JOIN_STRATEGY.equals(joinStrategy)) {
+      // Lookup join
+      Preconditions.checkArgument(!joinInfo.leftKeys.isEmpty(), "Lookup join requires join keys");
+      newLeftInput = PinotLogicalExchange.create(leftInput, RelDistributions.hash(joinInfo.leftKeys));
+      // Right table should be a dimension table, and the right input should be an identifier only ProjectNode over
+      // TableScanNode.
+      Preconditions.checkState(rightInput instanceof Project, "Right input for lookup join must be a Project, got: %s",
+          rightInput.getClass().getSimpleName());
+      Project project = (Project) rightInput;
+      for (RexNode node : project.getProjects()) {
+        Preconditions.checkState(node instanceof RexInputRef,
+            "Right input for lookup join must be an identifier (RexInputRef) only Project, got: %s in project",
+            node.getClass().getSimpleName());
+      }
+      RelNode projectInput = PinotRuleUtils.unboxRel(project.getInput());
+      Preconditions.checkState(projectInput instanceof TableScan,
+          "Right input for lookup join must be a Project over TableScan, got Project over: %s",
+          projectInput.getClass().getSimpleName());
+      newRightInput = rightInput;


Maybe not needed for this PR, but in the future I think we should start moving this logic into new rules. For example, we could have a rule that only applies when the hint is enabled, the right-hand side is a Project, and (...all conditions) hold. If the rule applies, we change the node to a DimJoin + exchanges.
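To make the "rule that only applies when the conditions hold" idea concrete, here is a rough sketch of the right-input checks from the diff expressed as a side-effect-free predicate, the way a rule's match condition would be. Everything here is a toy stand-in: `Rel`, `Scan`, `Filter`, and `Project` are not Calcite or Pinot classes, and strings like `$0` merely model a `RexInputRef`.

```java
import java.util.Arrays;
import java.util.List;

// Toy stand-ins for the RelNode/RexNode hierarchy; none of these are real Calcite or Pinot classes.
final class LookupJoinRuleSketch {
  interface Rel {}

  static final class Scan implements Rel {}

  static final class Filter implements Rel {
    final Rel input;
    Filter(Rel input) { this.input = input; }
  }

  static final class Project implements Rel {
    final Rel input;
    // "$0"-style strings model RexInputRef; anything else models a computed expression.
    final List<String> exprs;
    Project(Rel input, String... exprs) {
      this.input = input;
      this.exprs = Arrays.asList(exprs);
    }
  }

  // Returns true only when the hint asks for a lookup join and the right input is an
  // identifier-only Project directly over a table scan. Unlike Preconditions.checkState,
  // a mismatch means "rule does not apply" rather than an exception.
  static boolean matches(String joinStrategy, Rel right) {
    if (!"lookup".equals(joinStrategy)) {
      return false;
    }
    if (!(right instanceof Project)) {
      return false;
    }
    Project project = (Project) right;
    for (String expr : project.exprs) {
      if (!expr.matches("\\$\\d+")) {
        return false;
      }
    }
    return project.input instanceof Scan;
  }
}
```

With this shape, a plan that does not qualify simply keeps its original join instead of failing the query, which is one possible design choice and not necessarily what the PR intends.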

gortiz · 2024-09-10T09:34:53Z

...lanner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotJoinToDynamicBroadcastRule.java

-        ? ((HepRelVertex) join.getLeft()).getCurrentRel() : join.getLeft());
-    PinotLogicalExchange right = (PinotLogicalExchange) (join.getRight() instanceof HepRelVertex
-        ? ((HepRelVertex) join.getRight()).getCurrentRel() : join.getRight());
+    PinotLogicalExchange left = (PinotLogicalExchange) PinotRuleUtils.unboxRel(join.getLeft());


Note for future PRs: I think we can change PinotRuleUtils.unboxRel to return T extends PinotLogicalExchange, so we don't need to add the cast every time.
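A minimal sketch of what a generic `unboxRel` could look like. The types below are toy stand-ins (`Vertex` models `HepRelVertex`, `Exchange` models `PinotLogicalExchange`), and the bound is `T extends RelNode` rather than the exchange type, on the assumption that the helper is also used for non-exchange nodes such as the table scan check in the other rule.

```java
// Toy stand-ins; not the real Calcite or Pinot types.
final class UnboxRelSketch {
  interface RelNode {}

  static final class Exchange implements RelNode {}

  // Models HepRelVertex: a planner wrapper holding the current rel.
  static final class Vertex implements RelNode {
    final RelNode current;
    Vertex(RelNode current) { this.current = current; }
  }

  // The cast moves inside the helper. It is unchecked, so a wrong expectation
  // still surfaces as a ClassCastException at the assignment in the caller.
  @SuppressWarnings("unchecked")
  static <T extends RelNode> T unboxRel(RelNode rel) {
    RelNode unboxed = rel instanceof Vertex ? ((Vertex) rel).current : rel;
    return (T) unboxed;
  }
}
```

A call site then reads `Exchange left = UnboxRelSketch.unboxRel(join.getLeft());` with the target type inferred from the assignment (`join.getLeft()` here stands for whatever expression yields the wrapped node).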

gortiz · 2024-09-10T10:12:03Z

pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/PhysicalPlanVisitor.java

+    // Validation
+    JoinRelType joinType = node.getJoinType();
+    int numLeftColumns = leftSchema.size();
+    int numResultColumns = node.getDataSchema().size();
+    if (joinType.projectsRight()) {
+      int numRightColumns = right.getDataSchema().size();
+      Preconditions.checkState(numLeftColumns + numRightColumns == numResultColumns,
+          "Invalid number of columns for join type: %s, left: %s, right: %s, result: %s", joinType, numLeftColumns,
+          numRightColumns, numResultColumns);
+    } else {
+      Preconditions.checkState(numLeftColumns == numResultColumns,
+          "Invalid number of columns for join type: %s, left: %s, result: %s", joinType, numLeftColumns,
+          numResultColumns);
+    }
+
+    PlanNode.NodeHint nodeHint = node.getNodeHint();
+    String joinStrategy = null;
+    Map<String, String> joinHints = nodeHint.getHintOptions().get(PinotHintOptions.JOIN_HINT_OPTIONS);
+    if (joinHints != null) {
+      joinStrategy = joinHints.get(PinotHintOptions.JoinHintOptions.JOIN_STRATEGY);
+    }
+    if (PinotHintOptions.JoinHintOptions.LOOKUP_JOIN_STRATEGY.equals(joinStrategy)) {
+      return new LookupJoinOperator(context, leftOperator, rightOperator, node);
+    } else {
+      return new HashJoinOperator(context, leftOperator, leftSchema, rightOperator, node);
+    }


Not a blocker, but I would prefer to have another Calcite operator for lookup joins.

gortiz · 2024-09-10T10:13:53Z

...ntime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestVisitor.java

+    if (right instanceof MailboxReceiveNode
+        && ((MailboxReceiveNode) right).getExchangeType() == PinotRelExchangeType.PIPELINE_BREAKER) {


Same here. Don't you think we are repeating checks all the way down to the physical plan? Instead, we should decide the type of join at the broker level and then blindly transform the Calcite PinotPipelineBreakerJoin, PinotHashJoin or PinotLookupJoin into the physical operator.

Again, not a blocker.

Are you suggesting using our customized Rel over the calcite LogicalJoin so that we can differentiate different join types? Then we can add different JoinNode accordingly for the ser/de?

Short answer: Yes.

Long answer:

Calcite itself expects that. The root idea in Calcite is that rules should optimize logical operators (e.g. pushing filters into joins), apply distribution, etc., and then some final rules transform logical operators (e.g. LogicalJoin) into specific joins. Calcite, for example, includes Enumerable operators that implement most logical operators: EnumerableMergeJoin implements a sort-merge join, while EnumerableHashJoin implements a hash join. EnumerableJoinRule can be used to decide which one should be used. We wouldn't use EnumerableJoinRule; instead we should have our own rule that decides whether to use a hash join, semi join, lookup join... or even more advanced joins, like ones that merge a join+limit or join+aggregate.

For example imagine a tree like:

Aggregate (count by A.col1)
  Join
    select A
    select B

Right now Join emits (and allocates) a lot of rows just to be aggregated by its parent. It would be more efficient to count at the same time we build the blocks. Obviously we are not going to apply this kind of optimization in the short term, but in the medium/long term it will be very effective. At the same time, it would be very error-prone to repeat the logic that creates these optimizations both in Calcite (to prioritize the plans that can be optimized) and then in ServerPlanRequestVisitor.

Instead, the Calcite way would be to generate the AggregateJoinRel node, and then we should be able to blindly generate the executable Pinot operator whenever an AggregateJoinRel is received, without having to check conditions again (because we assume a friendly broker that doesn't generate incorrect plans).
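As an illustration of the join+aggregate idea, here is a small sketch that counts joined rows per key without materializing any of them. It is deliberately simplified: rows are reduced to their join key, the group-by column is assumed to be the join key itself, and the join is an inner equi-join. The class and method names are hypothetical and unrelated to any Pinot operator.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a fused join + count; not a Pinot operator.
final class JoinCountSketch {
  // COUNT(*) grouped by the left join key over an inner equi-join,
  // computed without building the joined rows.
  static Map<String, Long> countByKey(List<String> leftKeys, List<String> rightKeys) {
    // Build side: number of right rows per join key.
    Map<String, Long> rightMatches = new HashMap<>();
    for (String key : rightKeys) {
      rightMatches.merge(key, 1L, Long::sum);
    }
    // Probe side: each left row contributes as many joined rows as it has matches.
    Map<String, Long> counts = new HashMap<>();
    for (String key : leftKeys) {
      Long matches = rightMatches.get(key);
      if (matches != null) {
        counts.merge(key, matches, Long::sum);
      }
    }
    return counts;
  }
}
```

The result is the same as joining first and counting afterwards, but the memory used is proportional to the number of distinct keys rather than the number of joined rows.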

About serialization:

Due to our own decisions, we added an extra layer of JoinNodes (which I don't think is necessary) and a layer of gRPC (which makes more sense, but we could just remove the JoinNode layer and transform Calcite operators directly into gRPC). We could also use the JSON representation of Calcite, but AFAIR we decided to use gRPC so that we would not be affected if Calcite broke backward compatibility.

Added a JoinStrategy enum to the JoinNode. It is quite hard to change the Calcite LogicalJoin directly without breaking backward compatibility, so I left that as a follow-up.
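A rough sketch of how a strategy enum on the plan node lets the server dispatch without re-reading hints. The thread only says that a JoinStrategy enum was added; the constant names `HASH` and `LOOKUP`, the helper methods, and the fallback of treating every non-lookup value as a hash join are assumptions made for this example.

```java
import java.util.Map;

// Hypothetical sketch; the enum constants and helpers are assumptions, not the PR's code.
final class JoinStrategySketch {
  enum JoinStrategy { HASH, LOOKUP }

  // Broker side: resolve the strategy once from the join hint options
  // (a null map means no hint was given).
  static JoinStrategy fromHint(Map<String, String> joinHints) {
    String value = joinHints == null ? null : joinHints.get("join_strategy");
    return "lookup".equals(value) ? JoinStrategy.LOOKUP : JoinStrategy.HASH;
  }

  // Server side: pick the operator from the enum alone, with no further hint parsing.
  static String operatorFor(JoinStrategy strategy) {
    switch (strategy) {
      case LOOKUP:
        return "LookupJoinOperator";
      case HASH:
        return "HashJoinOperator";
      default:
        throw new IllegalStateException("Unsupported join strategy: " + strategy);
    }
  }
}
```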

gortiz · 2024-09-10T10:15:46Z

.../src/main/resources/examples/batch/colocated/userGroups/userGroups_offline_table_config.json

 {
  "tableName": "userGroups",
  "tableType": "OFFLINE",
+  "isDimTable": true,


Given we also use colocated quickstart to test colocated joins, I think we should have another table like userGroups that is not a dim table.

This is not an actual problem right now, since both colocated and dim joins have to be enabled explicitly with hints, but it will be an issue in the future when both are applied by default.

* A problem in the sense that we will need to specify hints to use one or the other mode

gortiz

As said a couple of times during the review, I would prefer to move the fact that we have a LookupJoin into the Calcite planning phase, instead of having to decide whether we use one or the other both at the Calcite level (generating different exchanges) and then in the server (when the physical plan is generated).

Anyway, that is something we can discuss in the future. For now the current solution is good enough for me.

I think it would also be cool to have a Pinot property and a query option we can set to enable this feature by default. I'm already doing that in #13943.

gortiz · 2024-10-09T11:41:11Z

We need to document this new lookup mode. I already have this open PR in gitbook. It would be great if you could add a paragraph explaining this one and then merge it.

Jackie-Jiang · 2024-10-09T19:56:35Z

@gortiz Good point. Seems I cannot directly modify the PR, so I merged it and I can add a new paragraph separately

gortiz · 2024-10-23T15:20:51Z

Please remember to document this lookup ;)

Jackie-Jiang · 2024-10-23T20:42:01Z

Documentation: https://docs.pinot.apache.org/users/user-guide-query/multi-stage-query/optimizing-joins#lookup-join

Jackie-Jiang added the feature, documentation, and multi-stage labels Sep 9, 2024

Jackie-Jiang requested review from gortiz and xiangfu0 September 9, 2024 22:24

Jackie-Jiang force-pushed the v2_lookup_join branch 3 times, most recently from 3d1beea to 31df207 Compare September 9, 2024 23:00

Jackie-Jiang mentioned this pull request Sep 9, 2024

Improve dimension table handing #13967

Merged

Jackie-Jiang force-pushed the v2_lookup_join branch 2 times, most recently from e272c60 to 04493c3 Compare September 10, 2024 03:48

gortiz reviewed Sep 10, 2024


gortiz approved these changes Sep 10, 2024


Jackie-Jiang force-pushed the v2_lookup_join branch 2 times, most recently from dfc80c2 to b9b7561 Compare September 13, 2024 00:35

Jackie-Jiang force-pushed the v2_lookup_join branch from b9b7561 to c5a4efa Compare October 7, 2024 18:45

Jackie-Jiang marked this pull request as ready for review October 7, 2024 18:47

[Multi-stage] Support lookup join

40d5724

Jackie-Jiang force-pushed the v2_lookup_join branch from c5a4efa to 40d5724 Compare October 7, 2024 21:15

Jackie-Jiang merged commit bebd2b4 into apache:master Oct 8, 2024

Jackie-Jiang deleted the v2_lookup_join branch October 8, 2024 22:49

Jackie-Jiang added the release-notes label Oct 10, 2024

Jackie-Jiang mentioned this pull request Nov 13, 2024

[Multi-stage] Improve planner rule to not push down expression/filter into the dimension table side #14441

Open
