-
Notifications
You must be signed in to change notification settings - Fork 671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for router INSERT .. SELECT commands #7077
Conversation
@onderkalaci "The absence of JOIN restriction information in PG16 results in the Citus planner's inability to recognize the potential of pushdown operations for single shard queries, leading to alternative planning. All multi-shard queries we are covered and Citus does recognize the pushdown opportunity?", is that a fair statement? |
Yes, that's fair. Also note that other than INSERT .. SELECT (and probably merge), we already have the alternative planning (e.g., single shard router planning). So, other code-paths are not impacted by PG16. I think maybe one more note here is that in Citus' planning logic, the single shard router kicks before the multi-shard pushdown. So, for non-INSERT..SELECT cases, the behavior has not changed anyway. The filters on the tables are already present on PG15-. |
Does that replace #6772? |
yes |
@@ -690,15 +727,16 @@ DistributedInsertSelectSupported(Query *queryTree, RangeTblEntry *insertRte, | |||
} | |||
|
|||
/* first apply toplevel pushdown checks to SELECT query */ | |||
DeferredErrorMessage *error = DeferErrorIfUnsupportedSubqueryPushdown(subquery, | |||
DeferredErrorMessage *error = routerSelect ? NULL : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just have an if (!routerSelect)
block around these error checks
ce3eff4
to
9bd9770
Compare
@@ -1398,6 +1398,7 @@ INSERT INTO nullkey_c1_t1 SELECT * FROM nullkey_c1_t2; | |||
SET client_min_messages TO DEBUG2; | |||
-- between two non-colocated single-shard tables | |||
INSERT INTO nullkey_c1_t1 SELECT * FROM nullkey_c2_t1; | |||
DEBUG: Creating router plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@onurctirtir can you please review the test output changes and let me know if there are any unexpected changes.
I already did a pass, but would be nice to get a review from you.
Also please check insert_select_single_shard.out changes as well
@@ -28,9 +28,11 @@ SELECT create_distributed_table('distributed_table','key'); | |||
(1 row) | |||
|
|||
INSERT INTO distributed_table SELECT *,* FROM generate_series(20, 40); | |||
DEBUG: Creating router plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have lots of new Creating router plan
logs. I was not sure whether we should consider omitting it.
But I decided to keep these are it is really what happens underlying.
9bd9770
to
e90557e
Compare
Note: This patch only fixes the Fix |
thanks for verifying, that was my intention as well |
e90557e
to
74ced12
Compare
Tradionally our planner works in the following order: router - > pushdown -> repartition -> pull to coordinator However, for INSERT .. SELECT commands, we did not support "router". In practice, that is not a big issue, because pushdown planning can handle router case as well. However, with PG 16, certain outer joins are converted to JOIN without any conditions (e.g., JOIN .. ON (true)) and the filters are pushed down to the tables. When the filters are pushed down to the tables, router planner can detect. However, pushdown planner relies on JOIN conditions. An example query: ``` INSERT INTO agg_events (user_id) SELECT raw_events_first.user_id FROM raw_events_first LEFT JOIN raw_events_second ON raw_events_first.user_id = raw_events_second.user_id WHERE raw_events_first.user_id = 10; ``` As a side effect of this change, now we can also relax certain limitation that "pushdown" planner emposes, but not "router". So, with this PR, we also allow those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except the flaky tests.
I think we should add the following to the PR description:
Closes https://github.com/citusdata/citus/pull/6772.
And the following changelog item like we did in #6752.
DESCRIPTION: Prevents unnecessarily pulling the data into coordinator for some INSERT .. SELECT queries that target a single-shard group
(Omitting the effect on PG16 support because no Citus versions had support for PG16 so far)
@@ -61,6 +61,8 @@ static DistributedPlan * CreateInsertSelectPlanInternal(uint64 planId, | |||
static DistributedPlan * CreateDistributedInsertSelectPlan(Query *originalQuery, | |||
PlannerRestrictionContext * | |||
plannerRestrictionContext); | |||
static bool InsertSelectRouter(Query *originalQuery, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should name this as InsertSelectIsSelectRouter or such to emphasize that this doesn't imply that the whole query is router but the select part.
-- We could relax distributed insert .. select checks to allow pushing | ||
-- down more clauses down to the worker nodes when inserting into a single | ||
-- shard by selecting from a colocated one. We might want to do something | ||
-- like https://github.com/citusdata/citus/pull/6772. | ||
-- | ||
-- e.g., insert into null_shard_key_1/citus_local/reference | ||
-- select * from null_shard_key_1/citus_local/reference limit 1 | ||
-- | ||
-- Below "limit / offset clause" test and some others are examples of this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting rid of those comments made me feel very happy :)
74ced12
to
05b97fd
Compare
Tradionally our planner works in the following order:
router - > pushdown -> repartition -> pull to coordinator
However, for INSERT .. SELECT commands, we did not support "router".
In practice, that is not a big issue, because pushdown planning can handle router case as well.
However, with PG 16, certain outer joins are converted to JOIN without any conditions (e.g., JOIN .. ON (true)) and the filters are pushed down to the tables.
When the filters are pushed down to the tables, router planner can detect. However, pushdown planner relies on JOIN conditions.
An example query:
As a side effect of this change, now we can also relax certain limitation that "pushdown" planner emposes, but not "router". So, with this PR, we also allow those.
Closes #6772
DESCRIPTION: Prevents unnecessarily pulling the data into coordinator for some INSERT .. SELECT queries that target a single-shard group