-
Notifications
You must be signed in to change notification settings - Fork 173
feat(incremental): copy multiple tables in parallel (#1237) #1413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(incremental): copy multiple tables in parallel (#1237) #1413
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a maintainer but this is a much needed feature, thanks @AxelThevenot 🙌🏻
Large datasets with a lot of partitions currently performs quite poorly.
I looked into doing this directly with one call copy_table
call but it doesn't look like it's possible currently (unless we delete the partitions upfront and use write_append)
@mikealfare @colin-rogers-dbt could we get your thoughts / a review of this one? |
Is there any progress here? |
This would help a lot |
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: axel_thevenot.
|
ece2ad3
to
b5336ae
Compare
looks very helpful for large incremental models, would love to see this merged |
@leohoare |
@andreypanchenko I'm not a maintainer, I'll post it in dbt-bigquery slack to rally some attention. |
Hey folks, I've got eyes on this and taking some extra time to get this merged on the side (we're all hands on deck for other things ahead of 5/28 this month, but I know how important this change is, so I'm taking extra work hours to shepherd it along -- with that context in mind, please bear with me 🙇♀️ ) I need to trace over the code and get a solid idea of what's happening at each step since this has a pretty big surface area and add tests for any behaviors this alters or introduces that aren't already tested. I'll parse over this / do research on the changes tomorrow with a colleague so we can fast track the change as much as possible (two minds better than one for this sort of thing). Also, we're going to need to do this merge over in dbt-adapters. @AxelThevenot either I'll do that later this week or you can get the jump on me by doing it sooner. Either way. |
@@ -18,23 +18,27 @@ | |||
|
|||
{% macro bq_copy_partitions(tmp_relation, target_relation, partitions, partition_by) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal dbt convo: plan to leave the the macro itself for the user interface (no breaking changes) but we figured this control flow can (and should) be pushed down into Python
{% endif %} | ||
{% set tmp_relation_partitioned = api.Relation.create(database=tmp_relation.database, schema=tmp_relation.schema, identifier=tmp_relation.table ~ '$' ~ partition, type=tmp_relation.type) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to self: partition fqns now handled in Python (see copy_bq_table)
@@ -409,7 +409,7 @@ def _agate_to_schema( | |||
return bq_schema | |||
|
|||
@available.parse(lambda *a, **k: "") | |||
def copy_table(self, source, destination, materialization): | |||
def copy_table(self, source, destination, materialization, partition_ids=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dbt internal convo: double check no issues with base adapter (don't think so but we have to check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VersusFacit
Is there any news on it?
any update on this? @AxelThevenot |
resolves dbt-labs/dbt-adapters#559
N/A dbt-labs/docs.getdbt.com/#
Problem
Copy partitions and tables in parallel instead of sequentially which is slow for large partition management
Solution
Run jobs in parallel and waits for the results.
Checklist