Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for router INSERT .. SELECT commands #7077

Merged
merged 1 commit into from
Jul 28, 2023

Conversation

onderkalaci
Copy link
Member

@onderkalaci onderkalaci commented Jul 20, 2023

Tradionally our planner works in the following order:
router - > pushdown -> repartition -> pull to coordinator

However, for INSERT .. SELECT commands, we did not support "router".

In practice, that is not a big issue, because pushdown planning can handle router case as well.

However, with PG 16, certain outer joins are converted to JOIN without any conditions (e.g., JOIN .. ON (true)) and the filters are pushed down to the tables.

When the filters are pushed down to the tables, router planner can detect. However, pushdown planner relies on JOIN conditions.

An example query:

INSERT INTO agg_events (user_id)
        SELECT raw_events_first.user_id
        FROM raw_events_first LEFT JOIN raw_events_second
        	ON raw_events_first.user_id = raw_events_second.user_id
        WHERE raw_events_first.user_id = 10;

As a side effect of this change, now we can also relax certain limitation that "pushdown" planner emposes, but not "router". So, with this PR, we also allow those.

Closes #6772
DESCRIPTION: Prevents unnecessarily pulling the data into coordinator for some INSERT .. SELECT queries that target a single-shard group

@tejeswarm
Copy link
Contributor

@onderkalaci "The absence of JOIN restriction information in PG16 results in the Citus planner's inability to recognize the potential of pushdown operations for single shard queries, leading to alternative planning. All multi-shard queries we are covered and Citus does recognize the pushdown opportunity?", is that a fair statement?

@onderkalaci
Copy link
Member Author

@onderkalaci "The absence of JOIN restriction information in PG16 results in the Citus planner's inability to recognize the potential of pushdown operations for single shard queries, leading to alternative planning. All multi-shard queries we are covered and Citus does recognize the pushdown opportunity?", is that a fair statement?

Yes, that's fair. Also note that other than INSERT .. SELECT (and probably merge), we already have the alternative planning (e.g., single shard router planning). So, other code-paths are not impacted by PG16.

I think maybe one more note here is that in Citus' planning logic, the single shard router kicks before the multi-shard pushdown. So, for non-INSERT..SELECT cases, the behavior has not changed anyway. The filters on the tables are already present on PG15-.

@onurctirtir
Copy link
Member

Does that replace #6772?

@onderkalaci
Copy link
Member Author

Does that replace #6772?

yes

@@ -690,15 +727,16 @@ DistributedInsertSelectSupported(Query *queryTree, RangeTblEntry *insertRte,
}

/* first apply toplevel pushdown checks to SELECT query */
DeferredErrorMessage *error = DeferErrorIfUnsupportedSubqueryPushdown(subquery,
DeferredErrorMessage *error = routerSelect ? NULL :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just have an if (!routerSelect) block around these error checks

@@ -1398,6 +1398,7 @@ INSERT INTO nullkey_c1_t1 SELECT * FROM nullkey_c1_t2;
SET client_min_messages TO DEBUG2;
-- between two non-colocated single-shard tables
INSERT INTO nullkey_c1_t1 SELECT * FROM nullkey_c2_t1;
DEBUG: Creating router plan
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@onurctirtir can you please review the test output changes and let me know if there are any unexpected changes.

I already did a pass, but would be nice to get a review from you.

Also please check insert_select_single_shard.out changes as well

@@ -28,9 +28,11 @@ SELECT create_distributed_table('distributed_table','key');
(1 row)

INSERT INTO distributed_table SELECT *,* FROM generate_series(20, 40);
DEBUG: Creating router plan
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have lots of new Creating router plan logs. I was not sure whether we should consider omitting it.

But I decided to keep these are it is really what happens underlying.

@naisila
Copy link
Member

naisila commented Jul 27, 2023

Note: This patch only fixes the multi_insert_select failure from this list #7017

Fix complex joins are only supported when all distributed tables are co-located and joined on their distribution columns, seen in multi_insert_select

@onderkalaci
Copy link
Member Author

Note: This patch only fixes the multi_insert_select failure from this list #7017

Fix complex joins are only supported when all distributed tables are co-located and joined on their distribution columns, seen in multi_insert_select

thanks for verifying, that was my intention as well

Tradionally our planner works in the following order:
   router - > pushdown -> repartition -> pull to coordinator

However, for INSERT .. SELECT commands, we did not support "router".

In practice, that is not a big issue, because pushdown planning
can handle router case as well.

However, with PG 16, certain outer joins are converted to
JOIN without any conditions (e.g., JOIN .. ON (true)) and
the filters are pushed down to the tables.

When the filters are pushed down to the tables, router planner
can detect. However, pushdown planner relies on JOIN conditions.

An example query:
```
INSERT INTO agg_events (user_id)
        SELECT raw_events_first.user_id
        FROM raw_events_first LEFT JOIN raw_events_second
        	ON raw_events_first.user_id = raw_events_second.user_id
        WHERE raw_events_first.user_id = 10;
```

As a side effect of this change, now we can also relax
certain limitation that "pushdown" planner emposes, but not
"router". So, with this PR, we also allow those.
Copy link
Member

@onurctirtir onurctirtir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except the flaky tests.

I think we should add the following to the PR description:

Closes https://github.com/citusdata/citus/pull/6772.

And the following changelog item like we did in #6752.

DESCRIPTION: Prevents unnecessarily pulling the data into coordinator for some INSERT .. SELECT queries that target a single-shard group

(Omitting the effect on PG16 support because no Citus versions had support for PG16 so far)

@@ -61,6 +61,8 @@ static DistributedPlan * CreateInsertSelectPlanInternal(uint64 planId,
static DistributedPlan * CreateDistributedInsertSelectPlan(Query *originalQuery,
PlannerRestrictionContext *
plannerRestrictionContext);
static bool InsertSelectRouter(Query *originalQuery,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should name this as InsertSelectIsSelectRouter or such to emphasize that this doesn't imply that the whole query is router but the select part.

Comment on lines -1926 to -1934
-- We could relax distributed insert .. select checks to allow pushing
-- down more clauses down to the worker nodes when inserting into a single
-- shard by selecting from a colocated one. We might want to do something
-- like https://github.com/citusdata/citus/pull/6772.
--
-- e.g., insert into null_shard_key_1/citus_local/reference
-- select * from null_shard_key_1/citus_local/reference limit 1
--
-- Below "limit / offset clause" test and some others are examples of this.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting rid of those comments made me feel very happy :)

@onderkalaci onderkalaci enabled auto-merge (squash) July 28, 2023 12:06
@onderkalaci onderkalaci merged commit cb5eb73 into main Jul 28, 2023
26 checks passed
@onderkalaci onderkalaci deleted the router_insert_select branch July 28, 2023 12:07
onderkalaci added a commit that referenced this pull request Jul 28, 2023
Similar to #7077.

As PG 16+ has changed the join restriction information
for certain outer joins, MERGE is also impacted given
that is is also underlying an outer join.

See #7077 for the details.
onderkalaci added a commit that referenced this pull request Jul 28, 2023
Similar to #7077.

As PG 16+ has changed the join restriction information
for certain outer joins, MERGE is also impacted given
that is is also underlying an outer join.

See #7077 for the details.
onderkalaci added a commit that referenced this pull request Jul 28, 2023
Similar to #7077.

As PG 16+ has changed the join restriction information
for certain outer joins, MERGE is also impacted given
that is is also underlying an outer join.

See #7077 for the details.
onderkalaci added a commit that referenced this pull request Jul 31, 2023
Similar to #7077.

As PG 16+ has changed the join restriction information
for certain outer joins, MERGE is also impacted given
that is is also underlying an outer join.

See #7077 for the details.
onderkalaci added a commit that referenced this pull request Aug 1, 2023
Similar to #7077.

As PG 16+ has changed the join restriction information
for certain outer joins, MERGE is also impacted given
that is is also underlying an outer join.

See #7077 for the details.
onderkalaci added a commit that referenced this pull request Aug 2, 2023
Similar to #7077.

As PG 16+ has changed the join restriction information
for certain outer joins, MERGE is also impacted given
that is is also underlying an outer join.

See #7077 for the details.
onderkalaci added a commit that referenced this pull request Aug 4, 2023
Similar to #7077.

As PG 16+ has changed the join restriction information
for certain outer joins, MERGE is also impacted given
that is is also underlying an outer join.

See #7077 for the details.
onderkalaci added a commit that referenced this pull request Aug 4, 2023
Similar to #7077.

As PG 16+ has changed the join restriction information for certain outer
joins, MERGE is also impacted given that is is also underlying an outer
join.

See #7077 for the details.
naisila pushed a commit that referenced this pull request Aug 4, 2023
Similar to #7077.

As PG 16+ has changed the join restriction information for certain outer
joins, MERGE is also impacted given that is is also underlying an outer
join.

See #7077 for the details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants