Update or supplement rebuild-bigquery command for incremental updates #618

Open
acmiyaguchi opened this issue Feb 6, 2020 · 0 comments
As per a review comment, the rebuild-bigquery command should be able to incrementally update the BigQuery table with rows that exist only in the Postgres table, without copying the entire table over every time.

One potential fix is to pull down the list of primary keys from the BigQuery table as a filtering mechanism.

We're looking at 1,727,065 rows * 16 bytes (per md5 hash), or roughly 28 MB, to do the initial filtering. It would make this function a bit more complicated. I imagine the proper way to do this is something like:

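WITH bigquery_hashes AS (...)
SELECT build.*
FROM build
LEFT JOIN bigquery_hashes
USING (build_hash)
-- Anti-join: test the right-hand column, since the merged USING column is never NULL here
WHERE bigquery_hashes.build_hash IS NULL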
and then iterating over the results of this query to insert them into the BigQuery table. I'd prefer to leave this as future work (i.e. out of scope of this PR), but it would be a good thing to have and to schedule periodically.
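
A minimal sketch of the filtering step described above, not the actual rebuild-bigquery implementation. It assumes an in-memory SQLite database as a stand-in for Postgres and a hard-coded list of hashes as a stand-in for the primary keys pulled down from BigQuery. The `build` table layout, the `payload` column, and the `missing_builds` helper are hypothetical names used only for illustration.

```python
import hashlib
import sqlite3

# Size of the hash list from the estimate above: rows * 16 bytes per raw md5 digest.
estimated_bytes = 1_727_065 * 16


def missing_builds(conn, bigquery_hashes):
    """Return rows of `build` whose build_hash is absent from bigquery_hashes."""
    cur = conn.cursor()
    # Load the hashes fetched from BigQuery into a temporary table.
    cur.execute("CREATE TEMP TABLE bigquery_hashes (build_hash TEXT PRIMARY KEY)")
    cur.executemany(
        "INSERT INTO bigquery_hashes VALUES (?)",
        [(h,) for h in sorted(set(bigquery_hashes))],
    )
    # Anti-join: keep rows with no match on the BigQuery side.
    cur.execute(
        """
        SELECT build.build_hash, build.payload
        FROM build
        LEFT JOIN bigquery_hashes
          ON build.build_hash = bigquery_hashes.build_hash
        WHERE bigquery_hashes.build_hash IS NULL
        ORDER BY build.build_hash
        """
    )
    rows = cur.fetchall()
    cur.execute("DROP TABLE bigquery_hashes")
    return rows


# Stand-in for the Postgres table: three builds keyed by md5 hex digest.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE build (build_hash TEXT PRIMARY KEY, payload TEXT)")
builds = [(hashlib.md5(p.encode()).hexdigest(), p) for p in ["a", "b", "c"]]
conn.executemany("INSERT INTO build VALUES (?, ?)", builds)

# Stand-in for BigQuery: it already holds the builds for "a" and "c".
already_in_bigquery = [builds[0][0], builds[2][0]]
to_insert = missing_builds(conn, already_in_bigquery)
```

Here `to_insert` holds only the rows that still need to be written to BigQuery, so a periodic job would iterate over it instead of copying the whole table.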
