As per a review comment, the `rebuild-bigquery` command should have the ability to incrementally update the BigQuery table with fields that only exist in the Postgres table, without having to copy the entire table over every time.
One potential fix is to pull down the list of primary keys from the BigQuery table as a filtering mechanism.
We're looking at 1,727,065 rows * 16 bytes (per MD5 hash), or about 30 MB, for the initial filtering. It would make this function a bit more complicated. I imagine the proper way to do this is something like:
```sql
WITH bigquery_hashes AS (...)
SELECT * FROM build
LEFT JOIN bigquery_hashes
  USING (build_hash)
WHERE build_hash IS NULL
```
and then iterating over the results of this query to insert into the table. I'd prefer to leave this as future work (i.e., out of scope for this PR), but it would be a good thing to have and to schedule periodically.
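The filtering step above (a `LEFT JOIN ... WHERE build_hash IS NULL` anti-join) can be sketched in Python. This is a minimal sketch using hypothetical in-memory stand-ins for the Postgres `build` rows and the downloaded BigQuery hash list; real code would fetch these via the Postgres and BigQuery client libraries instead.

```python
def rows_missing_from_bigquery(postgres_rows, bigquery_hashes):
    """Emulate the anti-join: keep only Postgres rows whose
    build_hash does not already exist in the BigQuery table."""
    existing = set(bigquery_hashes)  # ~30 MB of hashes fits comfortably in memory
    return [row for row in postgres_rows if row["build_hash"] not in existing]


# Hypothetical sample data standing in for the two tables.
postgres_rows = [
    {"build_hash": "a1", "payload": "already copied"},
    {"build_hash": "b2", "payload": "new since last sync"},
]
bigquery_hashes = ["a1"]

to_insert = rows_missing_from_bigquery(postgres_rows, bigquery_hashes)
print(to_insert)  # only the row with build_hash "b2"
```

The resulting `to_insert` list is what a periodic job would then stream into BigQuery, instead of copying the entire table each run.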