
bsrikanth-mariadb
Contributor

@bsrikanth-mariadb bsrikanth-mariadb commented Oct 7, 2025

MDEV-35206: Assertion failure in JOIN::dbug_verify_sj_inner_tables
A nested SELECT query crashes with an assertion failure in
JOIN::dbug_verify_sj_inner_tables() when optimizer_join_limit_pref_ratio=10
and optimizer_search_depth=1.

In choose_plan() in sql_select.cc, there are two back-to-back calls to
greedy_search(). The first builds a join plan that can short-cut
ORDER BY ... LIMIT, while the second builds one that does not consider
the short-cut.

greedy_search() should start with join->cur_sj_inner_tables set to 0.
However, the first greedy_search() call left join->cur_sj_inner_tables
set to 6. This caused the assertion in dbug_verify_sj_inner_tables() to
fail as soon as the second greedy_search() started, since it expects a
value of 0.

A similar problem exists with join->cur_embedding_map for nested joins,
and with the NESTED_JOIN::counter values.

Initialize join->cur_sj_inner_tables and join->cur_embedding_map to 0,
and call reset_nj_counters(), at the start of both greedy_search() and
optimize_straight_join().
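The failure mode can be illustrated with a minimal C++ model of the two back-to-back searches. This is a sketch under stated assumptions, not MariaDB code: SearchState and plan_search() are hypothetical stand-ins for JOIN and greedy_search(), and the entry check plays the role of dbug_verify_sj_inner_tables(), which expects an empty bitmap when a search starts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-search fields of JOIN (not the real struct).
struct SearchState {
  uint64_t cur_sj_inner_tables = 0;  // bitmap of semi-join inner tables in the prefix
};

// Hypothetical model of one search. Like the first greedy_search() call in the
// bug, it leaves its semi-join tables in the bitmap when it returns.
// Returns false if the entry invariant (empty bitmap) did not hold.
bool plan_search(SearchState &st, uint64_t sj_tables, bool reset_at_start) {
  if (reset_at_start)
    st.cur_sj_inner_tables = 0;                   // the fix: start from a clean state
  bool entry_ok = (st.cur_sj_inner_tables == 0);  // what the debug check expects
  st.cur_sj_inner_tables |= sj_tables;            // tables added to the prefix, never cleared
  return entry_ok;
}
```

Passing a bitmap of 6 (bits 1 and 2) mirrors the value reported above. Without the reset, the second call sees the leftover bits and fails the entry check; with a reset at the start of every search, both calls pass.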

@bsrikanth-mariadb bsrikanth-mariadb force-pushed the 10.11-MDEV-35206-assertion-failed-nests_entered-and-cur_sj_inner_tables branch from 921c224 to 60871f9 Compare October 7, 2025 05:22
@bsrikanth-mariadb bsrikanth-mariadb force-pushed the 10.11-MDEV-35206-assertion-failed-nests_entered-and-cur_sj_inner_tables branch from 60871f9 to f47aac8 Compare October 9, 2025 06:55
@spetrunia
Member

Technically the fix is correct but it doesn't fix a class of similar problems.
I saw JOIN::cur_embedding_map defined next to JOIN::cur_sj_inner_tables and asked myself how that one is re-initialized. Well, it isn't:

https://jira.mariadb.org/browse/MDEV-35206?focusedCommentId=315051&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-315051

There is also the reset_nj_counters() call, which we probably need to make as well.

I suggest moving the initialization of JOIN::cur_embedding_map and JOIN::cur_sj_inner_tables into a function that is called at the start of greedy_search() and at the start of optimize_straight_join().

Member

@spetrunia spetrunia left a comment

Need to address the above.

@bsrikanth-mariadb bsrikanth-mariadb force-pushed the 10.11-MDEV-35206-assertion-failed-nests_entered-and-cur_sj_inner_tables branch from f47aac8 to 428e80d Compare October 15, 2025 05:08
@bsrikanth-mariadb
Contributor Author

Sure, I made the suggested changes. However, I am not sure whether reset_nj_counters() is really required: the test cases pass even without calling it.

@spetrunia
Member

However, I am not sure if reset_nj_counters() is really required

When one isn't sure, one should try finding out :-)

reset_nj_counters() resets the values of NESTED_JOIN::counter.
These are updated by check_interleaving_with_nj() when building the join order.
The reverse function is restore_prev_nj_state().
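The counter bookkeeping described above can be modeled in a few lines of C++. This is a simplified, hypothetical sketch, not the server code: NestedJoin stands in for NESTED_JOIN, enter_nest() and leave_nest() stand in for the counter updates in check_interleaving_with_nj() and restore_prev_nj_state(), and reset_counters() mirrors what reset_nj_counters() does by walking the tree of nested joins.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of NESTED_JOIN: 'counter' counts how many of
// the nest's tables are in the current join prefix; 'n_tables' is the total.
struct NestedJoin {
  int counter = 0;
  int n_tables = 0;
  std::vector<NestedJoin *> children;  // nested sub-joins
};

// Models the increment in check_interleaving_with_nj():
// a table of this nest was appended to the partial join order.
void enter_nest(NestedJoin &nj) { nj.counter++; }

// Models the decrement in restore_prev_nj_state():
// the table was removed again while backtracking.
void leave_nest(NestedJoin &nj) {
  assert(nj.counter > 0);  // every leave must match an earlier enter
  nj.counter--;
}

// Models reset_nj_counters(): zero the counters of every nest in the tree.
void reset_counters(NestedJoin &nj) {
  nj.counter = 0;
  for (NestedJoin *child : nj.children)
    reset_counters(*child);
}
```

During exploration each enter_nest() is undone by a leave_nest() on backtrack, so counters return to 0. Tables that a search commits to its final plan are, under this model's assumption, never left again, so their increments survive until something like reset_counters() runs.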

I've made the testcase slightly convoluted and added printouts:

diff --git a/mysql-test/main/join_nested.test b/mysql-test/main/join_nested.test
index cf82bb91ee0..b59e19e3b50 100644
--- a/mysql-test/main/join_nested.test
+++ b/mysql-test/main/join_nested.test
@@ -1514,11 +1514,21 @@ CREATE TABLE t12(a int, b int, index(b));
 INSERT INTO t11 select seq, seq FROM seq_1_to_20;
 INSERT INTO t12 select seq, seq FROM seq_1_to_40;
 
+CREATE TABLE t14(a int, b int);
+CREATE TABLE t15(a int, b int, index(b));
+
+INSERT INTO t14 select seq, seq FROM seq_1_to_20;
+INSERT INTO t15 select seq, seq FROM seq_1_to_40;
+
 ANALYZE TABLE t10, t11, t12;
 
 EXPLAIN SELECT *
 FROM
-  t10 LEFT JOIN (t11 JOIN t12 ON t11.b=t12.b) ON t10.a=t11.a
+  t10 LEFT JOIN 
+  (
+    t11 JOIN t12 ON t11.b=t12.b 
+     left join (t14 join t15 on t14.b=t15.b) on t14.a=t11.a
+  ) ON t10.a=t11.a
 ORDER BY t10.b LIMIT 1;
 
 DROP TABLE t10, t11, t12;
diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 14ef7dec20a..a9f9cbc8094 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -18998,6 +18998,10 @@ static bool check_interleaving_with_nj(JOIN_TAB *next_tab)
     if (!next_emb->sj_on_expr)
     {
       next_emb->nested_join->counter++;
+      fprintf(stderr,
+              "check_interleaving_with_nj: nested_join(%p): counter++=%d n_tables=%d\n", 
+              next_emb->nested_join,
+              next_emb->nested_join->counter, next_emb->nested_join->n_tables);
       if (next_emb->nested_join->counter == 1)
       {
         /* 
@@ -19093,6 +19097,8 @@ static void restore_prev_nj_state(JOIN_TAB *last)
 
       if (--nest->counter == 0)
         join->cur_embedding_map&= ~nest->nj_map;
+      fprintf(stderr,"restore_prev_nj_state: nested_join(%p): counter--=%d\n",
+              nest, nest->counter);
       
       if (!was_fully_covered)
         break;

In the debugger, this looks like this:

  Thread 14 "mysqld" hit Breakpoint 1, initialize_join_maps (join=0x7fff2c01d8f8) at /home/psergey/dev-git2/10.11-look/sql/sql_select.c
(gdb) c
  Continuing.
  check_interleaving_with_nj: nested_join(0x7fff2c01b9c8): counter++=1 n_tables=3
  restore_prev_nj_state: nested_join(0x7fff2c01b9c8): counter--=0
  check_interleaving_with_nj: nested_join(0x7fff2c01b9c8): counter++=1 n_tables=3
  restore_prev_nj_state: nested_join(0x7fff2c01b9c8): counter--=0
  check_interleaving_with_nj: nested_join(0x7fff2c01b9c8): counter++=1 n_tables=3
  restore_prev_nj_state: nested_join(0x7fff2c01b9c8): counter--=0
  check_interleaving_with_nj: nested_join(0x7fff2c01b9c8): counter++=1 n_tables=3
...

... lots of those ... note that the value of counter <= n_tables.

  Thread 14 "mysqld" hit Breakpoint 1, initialize_join_maps (join=0x7fff2c01d8f8) at /home/psergey/dev-git2/10.11-look/sql/sql_select.c
(gdb) c
  Continuing.

OK, now we are in the second greedy_search() call:
...

  check_interleaving_with_nj: nested_join(0x7fff2c01b9c8): counter++=5 n_tables=3
  check_interleaving_with_nj: nested_join(0x7fff2c01ad18): counter++=3 n_tables=2

Note that counter > n_tables. This seems wrong.
It didn't cause a crash for this example. You can try building one that does crash, or add an assert into check_interleaving_with_nj().
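The check suggested above can be sketched as follows. It is a hypothetical model, not the server code: it tracks a single nest with n_tables=3 as in the trace, and returns a bool instead of asserting so the effect is easy to test; in the server this would be a debug assertion in check_interleaving_with_nj(). It also assumes, as the trace suggests, that a greedy pass commits the nest's tables and never decrements their counters afterwards.

```cpp
#include <cassert>

// Hypothetical model of one nest (n_tables=3 in the trace above).
struct NestedJoin {
  int counter = 0;
  int n_tables = 0;
};

// Increment as check_interleaving_with_nj() does, and report whether the
// invariant still holds: a nest cannot have more tables in the prefix
// than it contains.
bool enter_nest_checked(NestedJoin &nj) {
  nj.counter++;
  return nj.counter <= nj.n_tables;
}

// Models a greedy pass that commits every table of the nest to the final
// plan; committed tables are not backtracked, so the increments persist.
// Returns false if the invariant was violated at any step.
bool greedy_pass(NestedJoin &nj, bool reset_first) {
  if (reset_first)
    nj.counter = 0;  // what reset_nj_counters() does for this nest
  bool ok = true;
  for (int i = 0; i < nj.n_tables; i++)
    ok = enter_nest_checked(nj) && ok;  // always increment, then combine
  return ok;
}
```

Without the reset, the second pass pushes counter past n_tables, as in the `counter++=5 n_tables=3` line of the trace; with a reset at the start of each pass, the invariant holds.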

@bsrikanth-mariadb bsrikanth-mariadb force-pushed the 10.11-MDEV-35206-assertion-failed-nests_entered-and-cur_sj_inner_tables branch from 428e80d to b8a5d1f Compare October 15, 2025 15:59
@bsrikanth-mariadb
Contributor Author

Ah! Thanks, Sergei, for the test case.

I added an assert based on the provided patch and removed the print statements.

@bsrikanth-mariadb bsrikanth-mariadb force-pushed the 10.11-MDEV-35206-assertion-failed-nests_entered-and-cur_sj_inner_tables branch from b8a5d1f to f2eec0e Compare October 16, 2025 02:47
@spetrunia
Member

Now, the same pair of calls appears in multiple places:

  initialize_join_maps(join);
  reset_nj_counters(join, join->join_list);

This calls for moving the call to reset_nj_counters() into initialize_join_maps(). Please do that.

Please rename initialize_join_maps() to init_join_plan_search_state().

Also, the call to

  reset_nj_counters(join, join->join_list);

at the start of choose_plan() is now redundant, please remove.

Also, this assignment at start of choose_plan() is now redundant, please remove:

  join->cur_embedding_map= 0;

The same goes for this assignment:

  /*
    Note: constant tables are already in the join prefix. We don't
    put them into the cur_sj_inner_tables, though.
  */
  join->cur_sj_inner_tables= 0;
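Putting these review requests together, the consolidated helper might look like the sketch below. Only the name init_join_plan_search_state() comes from the review; the Join and NestedJoin structs are simplified, hypothetical stand-ins for JOIN and NESTED_JOIN, and reset_counters() models reset_nj_counters().

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins; not the real MariaDB definitions.
struct NestedJoin {
  int counter = 0;                     // tables of this nest in the prefix
  std::vector<NestedJoin *> children;  // nested sub-joins
};

struct Join {
  uint64_t cur_embedding_map = 0;       // roughly: nests entered but not fully covered
  uint64_t cur_sj_inner_tables = 0;     // semi-join inner tables in the prefix
  std::vector<NestedJoin *> join_list;  // top-level nested joins
};

// Models reset_nj_counters(): recursively zero every nest counter.
void reset_counters(std::vector<NestedJoin *> &list) {
  for (NestedJoin *nj : list) {
    nj->counter = 0;
    reset_counters(nj->children);
  }
}

// Sketch of the single place that resets all per-search state.
void init_join_plan_search_state(Join &join) {
  join.cur_embedding_map = 0;
  join.cur_sj_inner_tables = 0;
  reset_counters(join.join_list);
}
```

In this model, greedy_search() and optimize_straight_join() would each call init_join_plan_search_state() first, which is what makes the separate reset_nj_counters() call and the two assignments at the start of choose_plan() redundant.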

@spetrunia spetrunia self-requested a review October 16, 2025 10:12
Member

@spetrunia spetrunia left a comment

Getting close.
Please address the above.

@bsrikanth-mariadb bsrikanth-mariadb force-pushed the 10.11-MDEV-35206-assertion-failed-nests_entered-and-cur_sj_inner_tables branch from f2eec0e to f20c0e5 Compare October 16, 2025 11:26