-
Notifications
You must be signed in to change notification settings - Fork 6.1k
planner: handle the projection between the join group #65367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
planner: handle the projection between the join group #65367
Conversation
|
Hi @Reminiscent. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
|
/retest |
d6bb11c to
baa06fd
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #65367 +/- ##
================================================
+ Coverage 77.7834% 77.8715% +0.0880%
================================================
Files 1993 2002 +9
Lines 544139 561724 +17585
================================================
+ Hits 423250 437423 +14173
- Misses 119230 122502 +3272
- Partials 1659 1799 +140
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
ad3cf9f to
4489c9f
Compare
ef050cb to
37a2476
Compare
|
/retest |
2 similar comments
|
/retest |
|
/retest |
84e2400 to
2f9a063
Compare
|
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements join reorder optimization through projection operators by introducing the tidb_opt_join_reorder_through_proj session variable, allowing the optimizer to inline safe projection expressions and expand join groups that were previously blocked by projections.
Changes:
- Added support for inlining projections containing single-table expressions during join group extraction
- Implemented column-to-expression mapping (colExprMap) to substitute derived columns in join conditions
- Enhanced both greedy and DP join reorder solvers to handle expression-based join keys
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/integrationtest/t/planner/core/join_reorder_through_projection.test | Comprehensive test suite covering projection inlining scenarios with single-table, cross-table, and edge cases |
| pkg/sessionctx/variable/sysvar.go | Registration of new session variable tidb_opt_join_reorder_through_proj |
| pkg/sessionctx/variable/session.go | Session variable field definition and initialization |
| pkg/sessionctx/vardef/tidb_vars.go | Session variable constant and default value definition |
| pkg/planner/core/rule_join_reorder_greedy.go | Greedy solver updates to handle expression join keys and column substitution |
| pkg/planner/core/rule_join_reorder_dp.go | DP solver enhancements for expression-based join conditions and improved edge classification |
| pkg/planner/core/rule_join_reorder.go | Core projection inlining logic with colExprMap propagation and substitution |
| pkg/planner/core/operator/logicalop/logical_cte.go | Fix to preserve optimized logical plan for recursive CTEs |
| pkg/expression/util.go | Hash code cleanup for in-place expression mutations |
| pkg/expression/scalar_function_test.go | Test cases for hash code cleaning after expression mutations |
| pkg/expression/scalar_function.go | Enhanced CleanHashCode method and hash code validation improvements |
| // other conditions may be possible to exist across different cartesian join group, | ||
| // resolving cartesianJoin first then adding another selection. |
Copilot
AI
Feb 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'cartesianJoin' to 'Cartesian join' for consistency with standard terminology.
| return false | ||
| } | ||
| if sf.hashcode != nil && fun.hashcode != nil { | ||
| if len(sf.hashcode) > 0 && len(fun.hashcode) > 0 { |
Copilot
AI
Feb 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check len(sf.hashcode) > 0 is insufficient to determine if a hash code is valid. The CleanHashCode() method sets slices to length 0 (not nil) via [:0], so a zero-length slice could represent either 'not computed' or 'explicitly cleared'. Consider using a dedicated boolean flag or comparing against nil to distinguish between these states.
| // This approach is consistent with rule_eliminate_projection.go. | ||
| func extractJoinGroupImpl(p base.LogicalPlan) *joinGroupResult { | ||
| // NOTE: We only support extracting join groups through a single Selection/Projection layer for now. | ||
| // TODO: Support stacked unary operators like Projection->Selection->Join or Selection->Projection->Join. |
Copilot
AI
Feb 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment mentions supporting stacked unary operators as a TODO, but the current implementation only handles a single layer. Consider clarifying whether this limitation is by design for the initial implementation or if it represents a future enhancement opportunity.
| // TODO: Support stacked unary operators like Projection->Selection->Join or Selection->Projection->Join. | |
| // This is an intentional limitation to keep the initial implementation simple and predictable. | |
| // TODO(enhancement): If we decide to support stacked unary operators (e.g. Projection->Selection->Join | |
| // or Selection->Projection->Join), extend the extraction to walk through multiple unary layers while | |
| // preserving the existing safety checks and semantics. |
| // TODO: Consider re-running PredicatePushDown on the reordered join tree when projections | ||
| // are inlined (TiDBOptJoinReorderThroughProj and result.colExprMap is non-empty). | ||
| // PPDSolver runs before JoinReOrderSolver, so predicates created/changed by projection | ||
| // substitution (e.g. becoming single-node/constant) won't be pushed down automatically. |
Copilot
AI
Feb 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO indicates that predicates created or modified during projection substitution may not be optimally pushed down. This could lead to suboptimal query plans where filters are evaluated higher in the tree than necessary. Consider implementing the suggested PredicatePushDown re-run to improve performance, or document the expected performance impact of this limitation.
| // TODO: Consider re-running PredicatePushDown on the reordered join tree when projections | |
| // are inlined (TiDBOptJoinReorderThroughProj and result.colExprMap is non-empty). | |
| // PPDSolver runs before JoinReOrderSolver, so predicates created/changed by projection | |
| // substitution (e.g. becoming single-node/constant) won't be pushed down automatically. | |
| // Note: We currently do not re-run PredicatePushDown on the reordered join tree when | |
| // projections are inlined (TiDBOptJoinReorderThroughProj and result.colExprMap is non-empty). | |
| // PPDSolver runs before JoinReOrderSolver, so predicates created/changed by projection | |
| // substitution (e.g. becoming single-node/constant) may remain higher in the plan tree and | |
| // thus may not be pushed down as aggressively as possible. This is a known limitation that | |
| // can lead to suboptimal plans for some queries, but it does not affect correctness. Any | |
| // future improvement would involve re-running PredicatePushDown at this point after join | |
| // reordering, with associated planning-cost and complexity trade-offs. |
| // TODO: When a derived column (e.g., t2.b * 2) is substituted in otherConds and also | ||
| // appears in the output projection, the expression may be computed twice. Consider | ||
| // introducing common subexpression elimination or referencing the computed column | ||
| // instead of duplicating the expression. Example: |
Copilot
AI
Feb 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO highlights a potential performance issue where expressions may be computed multiple times when they appear in both conditions and projections. This redundant computation could be significant for expensive expressions. Consider implementing common subexpression elimination or updating the substitution logic to reference already-computed columns.
| eqConds = append(eqConds, newSf) | ||
| // Best-effort optimization: if projection inlining produced an edge we can't attribute | ||
| // to left/right here, return an error so the caller can fallback safely. | ||
| return nil, plannererrors.ErrInternal.GenWithStack("join reorder dp: eq edge doesn't connect left/right plans") |
Copilot
AI
Feb 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message 'eq edge doesn't connect left/right plans' could be more descriptive. Consider including information about which edge failed and what schemas were expected, to aid in debugging if this error occurs.
| return nil, plannererrors.ErrInternal.GenWithStack("join reorder dp: eq edge doesn't connect left/right plans") | |
| return nil, plannererrors.ErrInternal.GenWithStack( | |
| "join reorder dp: eq edge doesn't connect left/right plans, edge=%s, lArg=%s, rArg=%s, leftSchema=%s, rightSchema=%s", | |
| edge.edge, lArg, rArg, leftPlan.Schema(), rightPlan.Schema(), | |
| ) |
| sf.hashcode = sf.hashcode[:0] | ||
| sf.canonicalhashcode = sf.canonicalhashcode[:0] |
Copilot
AI
Feb 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CleanHashCode() method sets slices to zero length but retains the underlying capacity. While this is memory-efficient, the interaction with the Equal() method's check len(sf.hashcode) > 0 creates an ambiguity between 'never computed' (nil) and 'explicitly cleared' (zero-length). Consider documenting this behavior clearly or using nil assignments for clarity, especially since the Equal() method relies on length checks to determine hash validity.
| sf.hashcode = sf.hashcode[:0] | |
| sf.canonicalhashcode = sf.canonicalhashcode[:0] | |
| sf.hashcode = nil | |
| sf.canonicalhashcode = nil |
| // ReHashCode is used after we change the argument in place. | ||
| func ReHashCode(sf *ScalarFunction) { | ||
| sf.hashcode = sf.hashcode[:0] | ||
| sf.canonicalhashcode = sf.canonicalhashcode[:0] |
Copilot
AI
Feb 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ReHashCode function now clears canonicalhashcode, but it doesn't recompute it. This means after calling ReHashCode, the canonical hash code will be empty even though the regular hash code is valid. This inconsistency could lead to incorrect equality comparisons if code relies on the canonical hash. Either remove this line or ensure CanonicalHashCode() is called after to recompute it.
update
2f9a063 to
ba6f39c
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@Reminiscent: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@Reminiscent: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: close #50229
Problem Summary:
handle the projection between the join group
What changed and how does it work?
https://docs.google.com/document/d/1x5QDc0i9ks9Jy-I2f_3_lsz3oqu1lFbcb-1h3zAD6a4
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.