@guo-shaoge guo-shaoge commented Jan 21, 2026

What problem does this PR solve?

Issue Number: close #63887

Problem Summary:

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

planner: fix join reorder correctness with conflict detection algorithm

Signed-off-by: guo-shaoge <[email protected]>
Copilot AI review requested due to automatic review settings January 21, 2026 13:41
@ti-chi-bot added the do-not-merge/invalid-title, do-not-merge/needs-linked-issue, do-not-merge/needs-tests-checked, release-note-none, do-not-merge/work-in-progress, size/XXL, and sig/planner labels on Jan 21, 2026
Copilot AI left a comment

Pull request overview

This is a work-in-progress pull request that introduces a new join reorder implementation to address correctness problems with outer join reordering. The PR adds a new joinorder package with conflict detection capabilities based on join reorder theory, while maintaining the existing join reorder logic as a fallback.

Changes:

  • Adds new join order optimizer with conflict detector for handling outer joins
  • Introduces conditional logic to enable the new implementation via session variable
  • Adds schema equality checking method for validation after reorder

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 24 comments.

File | Description
--- | ---
pkg/planner/core/rule_join_reorder.go | Adds conditional branching to use new join order implementation when EnableOuterJoinReorder is enabled
pkg/planner/core/joinorder/join_order.go | Implements new join order optimization with greedy algorithm, leading hint support, and join group extraction
pkg/planner/core/joinorder/conflict_detector.go | Implements conflict detection logic for join reordering with associativity and commutativity rules
pkg/planner/core/base/plan_base.go | Adds documentation comment noting that JoinType enum order must remain unchanged for conflict detector
pkg/expression/util.go | Adds TODO comment about refining the FilterOutInPlace function
pkg/expression/schema.go | Adds Equal method to Schema for comparing schema equality after join reorder

resNodes = append(resNodes, cartesianNodes[i])
break
}
newJoin, err := newCartesianJoin(ctx, cartesianNodes[i].p, cartesianNodes[i+1].p)
Copilot AI Jan 21, 2026

The function newCartesianJoin accepts a joinType parameter but is called without specifying the join type. The call should be newCartesianJoin(ctx, base.InnerJoin, left, right) to specify an inner join for Cartesian products.

Suggested change
- newJoin, err := newCartesianJoin(ctx, cartesianNodes[i].p, cartesianNodes[i+1].p)
+ newJoin, err := newCartesianJoin(ctx, base.InnerJoin, cartesianNodes[i].p, cartesianNodes[i+1].p)

}
}
// gjt todo set children?
if p, err = j.optimizeForJoinGroup(p.SCtx(), joinGroup); err != nil {
Copilot AI Jan 21, 2026

The variable j is used but not defined. The function optimizeRecursive calls j.optimizeForJoinGroup but j is not a receiver and is not declared anywhere in scope.

Comment on lines 380 to 381
selection.Conditions = append(selection.Conditions, e.eqConds...)
selection.Conditions = append(selection.Conditions, e.nonEQConds...)
Copilot AI Jan 21, 2026

These lines append e.eqConds and e.nonEQConds directly to selection.Conditions, which is a []expression.Expression. If either field is a []*expression.ScalarFunction, the append does not compile: Go does not implicitly convert a slice of a concrete type to a slice of an interface type, even though *ScalarFunction implements Expression. Lines 380-381 should convert explicitly, for example with a helper function.

Suggested change
- selection.Conditions = append(selection.Conditions, e.eqConds...)
- selection.Conditions = append(selection.Conditions, e.nonEQConds...)
+ selection.Conditions = append(selection.Conditions, expression.ScalarFuncs2Exprs(e.eqConds)...)
+ selection.Conditions = append(selection.Conditions, expression.ScalarFuncs2Exprs(e.nonEQConds)...)
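
For context, a minimal sketch of why an explicit conversion is needed. The Expression and ScalarFunction types and the scalarFuncsToExprs helper below are simplified, hypothetical stand-ins, not TiDB's actual definitions; they only illustrate the Go rule that a slice of a concrete pointer type cannot be spread into a slice of an interface type.

```go
package main

import "fmt"

// Expression is a simplified, hypothetical stand-in for expression.Expression.
type Expression interface {
	String() string
}

// ScalarFunction is a simplified, hypothetical stand-in for expression.ScalarFunction.
type ScalarFunction struct {
	Name string
}

func (f *ScalarFunction) String() string { return f.Name }

// scalarFuncsToExprs copies each element into a new []Expression.
// Go has no implicit conversion between slice types, so an element-wise copy is required.
func scalarFuncsToExprs(funcs []*ScalarFunction) []Expression {
	exprs := make([]Expression, 0, len(funcs))
	for _, f := range funcs {
		exprs = append(exprs, f)
	}
	return exprs
}

func main() {
	eqConds := []*ScalarFunction{{Name: "eq(t1.a, t2.a)"}, {Name: "eq(t2.b, t3.b)"}}

	var conditions []Expression
	// conditions = append(conditions, eqConds...) // rejected by the compiler
	conditions = append(conditions, scalarFuncsToExprs(eqConds)...)

	for _, c := range conditions {
		fmt.Println(c.String())
	}
}
```

If a field is already declared as a slice of the interface type, no helper is needed for it, so the suggestion only applies to fields whose element type is the concrete pointer.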

Comment on lines 87 to 90
vertexMap[v.ID()] = &Node{
bitSet: newBitSet(int64(i)),
p: v,
}
Copilot AI Jan 21, 2026

The cumCost field is not initialized when creating Nodes for vertexes. This field is used in the greedy join order algorithm to sort and compare nodes (line 276 in join_order.go), but leaf nodes will have cumCost of 0. The cumCost should be initialized based on the vertex plan's statistics or row count.

Suggested change
- vertexMap[v.ID()] = &Node{
-     bitSet: newBitSet(int64(i)),
-     p: v,
- }
+ node := &Node{
+     bitSet: newBitSet(int64(i)),
+     p: v,
+ }
+ if stats := v.StatsInfo(); stats != nil {
+     node.cumCost = stats.RowCount
+ }
+ vertexMap[v.ID()] = node
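
To illustrate why the initial value matters, here is a small sketch of a greedy-style ordering step. It assumes the greedy algorithm sorts nodes by ascending cumulative cost; the node type, newLeaf, and sortByCost are hypothetical simplifications, not the PR's code. With every leaf at cumCost 0 the sort would have nothing to distinguish the leaves by, whereas seeding from the row estimate puts the smallest input first.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a simplified, hypothetical stand-in for the PR's Node type.
type node struct {
	name    string
	cumCost float64
}

// newLeaf seeds cumCost from an estimated row count so that leaves are comparable.
func newLeaf(name string, rowCount float64) *node {
	return &node{name: name, cumCost: rowCount}
}

// sortByCost orders nodes by ascending cumulative cost and keeps ties in input order.
func sortByCost(nodes []*node) {
	sort.SliceStable(nodes, func(i, j int) bool {
		return nodes[i].cumCost < nodes[j].cumCost
	})
}

func main() {
	nodes := []*node{
		newLeaf("t1", 10000),
		newLeaf("t2", 10),
		newLeaf("t3", 500),
	}
	sortByCost(nodes)
	for _, n := range nodes {
		fmt.Println(n.name, n.cumCost)
	}
}
```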

// It could be otherCond, leftCond or rightCond.
nonEQConds expression.CNFExprs

tes BitSet
Member

There's an imported bitset

// 0: rule doesn't apply
// 1: rule applies
// 2: rule applies when null-rejective holds
type ruleTableEntry int
Member

Why isn't it a bool?
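
One possible answer, going by the three values in the code comment above: the entry has a third state that a bool cannot express. A hedged sketch with named constants follows; the constant names and the ruleHolds helper are hypothetical and not taken from the PR.

```go
package main

import "fmt"

// ruleTableEntry mirrors the three documented states of a rule table entry.
type ruleTableEntry int

// The constant names below are hypothetical; only the values 0, 1, 2 come from the PR's comment.
const (
	ruleNotApplicable          ruleTableEntry = iota // 0: rule doesn't apply
	ruleApplies                                      // 1: rule applies
	ruleAppliesIfNullRejecting                       // 2: rule applies when null-rejective holds
)

// ruleHolds resolves an entry to a yes/no answer, given whether the
// relevant predicate is null-rejecting.
func ruleHolds(e ruleTableEntry, nullRejecting bool) bool {
	switch e {
	case ruleApplies:
		return true
	case ruleAppliesIfNullRejecting:
		return nullRejecting
	default:
		return false
	}
}

func main() {
	fmt.Println(ruleHolds(ruleNotApplicable, true))
	fmt.Println(ruleHolds(ruleApplies, false))
	fmt.Println(ruleHolds(ruleAppliesIfNullRejecting, false))
	fmt.Println(ruleHolds(ruleAppliesIfNullRejecting, true))
}
```

Named constants would also let the rule table read as words rather than bare 0/1/2 literals.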

return rule
}

func (d *ConflictDetector) calcTES(conds []base.Expression) BitSet {
Member

add comments
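
A comment here could start from what the returned set represents. As a rough sketch only, and assuming the set begins as the vertexes whose columns the conditions reference (the PR's actual calcTES may add more on top of this), the core could look like the following. The condition type and calcReferencedSet are hypothetical; a uint64 is used as the bit set purely for brevity.

```go
package main

import "fmt"

// condition is a hypothetical stand-in for a join predicate. It lists the
// indexes of the join-group vertexes whose columns the predicate references.
type condition struct {
	vertexIdxs []uint
}

// calcReferencedSet returns a bit set with one bit per vertex referenced by
// any of the conditions. Bit i corresponds to vertex index i.
func calcReferencedSet(conds []condition) uint64 {
	var set uint64
	for _, c := range conds {
		for _, idx := range c.vertexIdxs {
			set |= 1 << idx
		}
	}
	return set
}

func main() {
	conds := []condition{
		{vertexIdxs: []uint{0, 2}}, // e.g. t0.a = t2.a
		{vertexIdxs: []uint{2, 3}}, // e.g. t2.b < t3.b
	}
	fmt.Printf("%04b\n", calcReferencedSet(conds))
}
```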

Comment on lines 375 to 378
// Append selections to existing join
selection := logicalop.LogicalSelection{
Conditions: []expression.Expression{}, // gjt todo reserve space
}
Member

When will we enter this? Why is it not in makeNonInnerJoin?

Signed-off-by: guo-shaoge <[email protected]>
@guo-shaoge changed the title from "[WIP] join reorder correctness problem" to "[WIP] planner: join reorder correctness problem" on Jan 25, 2026
@guo-shaoge added the do-not-merge/hold label on Jan 25, 2026
Signed-off-by: guo-shaoge <[email protected]>
@hawkingrei
Member

/retest

Signed-off-by: guo-shaoge <[email protected]>
@hawkingrei
Member

/retest

Signed-off-by: guo-shaoge <[email protected]>
@hawkingrei
Member

/retest

Signed-off-by: guo-shaoge <[email protected]>
@guo-shaoge changed the title from "[WIP] planner: fix join reorder correctness with CD-C conflict detection | tidb-test=pr/2676" to "planner: fix join reorder correctness with CD-C conflict detection | tidb-test=pr/2676" on Feb 3, 2026
@ti-chi-bot removed the do-not-merge/work-in-progress label on Feb 3, 2026
@guo-shaoge changed the title from "planner: fix join reorder correctness with CD-C conflict detection | tidb-test=pr/2676" to "planner: fix join reorder correctness with conflict detection algorithm | tidb-test=pr/2676" on Feb 3, 2026
@ti-chi-bot added the release-note and needs-cherry-pick-release-7.5 labels and removed the do-not-merge/needs-linked-issue, release-note-none, and do-not-merge/needs-tests-checked labels on Feb 3, 2026
@tiprow

tiprow bot commented Feb 3, 2026

@guo-shaoge: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
fast_test_tiprow | b92650e | link | true | /test fast_test_tiprow

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ti-chi-bot

ti-chi-bot bot commented Feb 3, 2026

@guo-shaoge: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
pull-integration-ddl-test | 874eedc | link | true | /test pull-integration-ddl-test
pull-unit-test-next-gen | b92650e | link | true | /test pull-unit-test-next-gen
idc-jenkins-ci-tidb/unit-test | b92650e | link | true | /test unit-test


@guo-shaoge guo-shaoge requested a review from winoros February 3, 2026 13:39
@terry1purcell
Contributor

@pantheon-bot please review

@pantheon-ai

pantheon-ai bot commented Feb 3, 2026

  1. P2: Hard limit of 64 join vertices causes planning failure

    • When a join group has >64 leaf vertices, the conflict detector returns an error that propagates to query planning, causing the query to fail instead of gracefully falling back.
    • Evidence: pkg/planner/core/joinorder/conflict_detector.go:108 - if len(group.vertexes) > 64 { return nil, errors.Errorf("too many vertexes...") }
    • Impact: Queries with very wide joins (>64 tables in one group) will fail with an error on the default path.
  2. P2: DP join reorder support removed, threshold ignored

    • The new join reorder implementation hardcodes greedy mode, making tidb_opt_join_reorder_threshold ineffective.
    • Evidence: pkg/planner/core/joinorder/join_order.go:261 - useGreedy := true with DP path as panic("not implement yet")
    • Impact: Configuration/feature parity gap for non-default deployments using threshold tuning.
  3. P2: Session variable tidb_opt_join_reorder_through_sel silently ignored

    • When outer join reorder is enabled (default), the code bypasses legacy join group extraction that honors this session variable.
    • Evidence: pkg/planner/core/rule_join_reorder.go:248 - short-circuits to joinorder.Optimize which doesn't check the variable
    • Impact: The session variable becomes a no-op under default configuration.
  4. P2: alignEQConds can create invalid join keys

    • The function has logic that may build equality functions with nil arguments and doesn't propagate injected projections back to join children.
    • Evidence: pkg/planner/core/joinorder/conflict_detector.go:506 - declares var lCol, rCol expression.Expression but may not assign both before using
    • Impact: Latent bug, likely unreachable with current planner invariants but represents faulty porting from legacy code.
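
As an illustration of the first finding, the sketch below shows one way a caller could fall back instead of failing when the vertex limit is exceeded. Everything here is hypothetical: errTooManyVertexes, reorderWithConflictDetection, legacyReorder, and reorder are placeholder names, and the actual reordering is omitted. Only the 64-vertex limit comes from the finding itself.

```go
package main

import (
	"errors"
	"fmt"
)

// maxVertexes is the limit reported in the review finding.
const maxVertexes = 64

// errTooManyVertexes is a hypothetical sentinel error that lets the caller
// tell "cannot handle this group" apart from genuine planning failures.
var errTooManyVertexes = errors.New("join group has too many vertexes for conflict detection")

// reorderWithConflictDetection is a placeholder for the new implementation.
func reorderWithConflictDetection(tables []string) ([]string, error) {
	if len(tables) > maxVertexes {
		return nil, errTooManyVertexes
	}
	return tables, nil // actual reordering omitted
}

// legacyReorder is a placeholder for the existing join reorder path.
func legacyReorder(tables []string) []string {
	return tables // actual reordering omitted
}

// reorder tries the new path first and falls back to the legacy path only
// when the group is too wide. Any other error is still returned.
func reorder(tables []string) (result []string, usedFallback bool, err error) {
	result, err = reorderWithConflictDetection(tables)
	if errors.Is(err, errTooManyVertexes) {
		return legacyReorder(tables), true, nil
	}
	return result, false, err
}

func main() {
	_, fallback, err := reorder(make([]string, 3))
	fmt.Println(fallback, err)

	_, fallback, err = reorder(make([]string, 65))
	fmt.Println(fallback, err)
}
```

Using a sentinel checked with errors.Is keeps the fallback narrow, so unrelated errors from the new path are not silently swallowed.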

// JoinType contains CrossJoin, InnerJoin, LeftOuterJoin, RightOuterJoin, SemiJoin, AntiJoin.
type JoinType int

// NOTE: keep the order and value unchanged, because they are used in conflict_detector.go!!!
Member

Add an intest.Assert instead.
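
A rough sketch of what such a check might look like. The local assert helper is a hypothetical stand-in for intest.Assert so that the example is self-contained, and the three JoinType values and the ruleTable contents are illustrative only, not TiDB's actual definitions.

```go
package main

import "fmt"

// JoinType and its values are illustrative only; the real enum has more members.
type JoinType int

const (
	InnerJoin JoinType = iota
	LeftOuterJoin
	RightOuterJoin
)

// ruleTable is indexed by JoinType, which is why the constant values matter.
var ruleTable = [...]string{"inner", "left outer", "right outer"}

// assert is a hypothetical stand-in for intest.Assert.
func assert(cond bool, msg string) {
	if !cond {
		panic(msg)
	}
}

// checkJoinTypeOrder turns the "keep the order unchanged" comment into an
// executable check that fails loudly if someone reorders the enum.
func checkJoinTypeOrder() {
	assert(InnerJoin == 0, "InnerJoin must be 0")
	assert(LeftOuterJoin == 1, "LeftOuterJoin must be 1")
	assert(RightOuterJoin == 2, "RightOuterJoin must be 2")
	assert(len(ruleTable) == int(RightOuterJoin)+1, "ruleTable must cover every JoinType")
}

func main() {
	checkJoinTypeOrder()
	fmt.Println(ruleTable[LeftOuterJoin])
}
```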

// It could be otherCond, leftCond or rightCond.
nonEQConds expression.CNFExprs

tes intset.FastIntSet
Member

Use the imported bitset instead.

FastIntSet is meant for the sparse case.

Member

go.mod: github.com/bits-and-blooms/bitset v1.14.3

return nodes, nil
}

// TODO add example
Member

remember to modify this comment

where t1.c1 < 10 or t1.c2 < 10 for update`).Check(testkit.Rows(
"Projection 16635.64 root test.t1.c1",
"└─SelectLock 16635.64 root for update 0",
" └─Projection 16635.64 root test.t1.c1, test.t1._tidb_rowid, test.t1._tidb_tid, test.t2._tidb_rowid",
Member

Why did this test change? Is this expected?

@@ -1956,7 +1950,7 @@ HashJoin root CARTESIAN inner join
└─Selection cop[tikv] not(isnull(planner__core__casetest__rule__rule_join_reorder.t8.a))
└─TableFullScan cop[tikv] table:t8 keep order:false, stats:pseudo
Level Code Message
Warning 1815 We can only use one leading hint at most, when multiple leading hints are used, all leading hints will be invalid
Member

Is this change expected?

Labels

do-not-merge/hold, needs-cherry-pick-release-7.5, release-note, sig/planner, size/XXL

Development

Successfully merging this pull request may close these issues.

planner: incorrect join reorder for outer-join

4 participants