Update row count expectations with clobber after nightly build. #4346

Merged
merged 1 commit into from
Jun 24, 2025

zaneselvans

Overview

After the Google Batch ETL finishes, use dbt_helper to update the row count expectations, so we can save them to builds.catalyst.coop and apply them to the branch without needing to run the ETL locally or worrying about something being stale.

Tried to do this in #4341 but it only looked like it worked because I tested it by removing a line from the row count CSV -- it won't update existing row counts without --clobber.
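
To make the failure mode above concrete, here is a toy model of "add missing rows only" versus "overwrite everything". It is not how dbt_helper is implemented; the `merge_counts` function and the two-column `table,rows` layout are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Toy model only: merge_counts and the "table,rows" CSV layout are
# hypothetical and do not reflect dbt_helper's real code or file format.

# merge_counts EXISTING_CSV FRESH_CSV [--clobber]
# Prints the merged row count expectations to stdout.
merge_counts() {
    local existing="$1" fresh="$2" clobber="${3:-}"
    if [ "$clobber" = "--clobber" ]; then
        # Clobber: the freshly computed counts replace the stored ones.
        cat "$fresh"
    else
        # No clobber: keep every stored row unchanged and append only
        # tables that have no stored row yet.
        awk -F, 'NR == FNR { seen[$1] = 1; print; next } !($1 in seen)' \
            "$existing" "$fresh"
    fi
}
```

In this model, deleting a line from the stored CSV and re-running without `--clobber` restores that line, which looks like a successful update, while a changed count for a table that is still present is silently ignored.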

I ran a branch deployment off of the 88888/99999 PR branch #4291 with this PR #4345, and it updated the row counts for a bunch of different tables/partitions.

@zaneselvans zaneselvans requested a review from e-belfer June 24, 2025 17:35
@zaneselvans zaneselvans self-assigned this Jun 24, 2025
@zaneselvans zaneselvans added testing Writing tests, creating test data, automating testing, etc. nightly-builds Anything having to do with nightly builds or continuous deployment. labels Jun 24, 2025
@zaneselvans zaneselvans added dbt Issues related to the data build tool aka dbt data-validation Issues related to checking whether data meets our quality expectations. labels Jun 24, 2025
@zaneselvans zaneselvans moved this from New to In review in Catalyst Megaproject Jun 24, 2025
-FROM mambaorg/micromamba:2.1.1-ubuntu24.04
+FROM mambaorg/micromamba:2.3.0-ubuntu24.04
Incidental update -- just noticed it was stale when I was poking around in the nightly build script.

Comment on lines +257 to +259
# Generate new row counts for all tables in the PUDL database
dbt_helper update-tables --clobber --row-counts all 2>&1 | tee -a "$LOGFILE"

This has to run AFTER the ETL, not in the midst of it; otherwise the ETL will check against the wrong set of row counts.
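
A minimal sketch of that ordering constraint, assuming the build script runs both steps as shell commands and logs to `$LOGFILE`. The `etl_then_row_counts` function and its arguments are hypothetical and are not taken from the actual nightly build script.

```shell
#!/usr/bin/env bash
# Sketch only: etl_then_row_counts is a hypothetical helper, not part of
# the real nightly build script.
set -o pipefail  # make a pipeline fail when the command before `tee` fails

LOGFILE="${LOGFILE:-nightly-build.log}"

# etl_then_row_counts ETL_COMMAND ROW_COUNT_COMMAND
# Runs the ETL first and regenerates row counts only if the ETL succeeded.
etl_then_row_counts() {
    local etl_cmd="$1" row_count_cmd="$2"

    # The ETL validates against the row counts currently on disk, so they
    # must stay unchanged until it has finished.
    if ! bash -c "$etl_cmd" 2>&1 | tee -a "$LOGFILE"; then
        echo "ETL failed; leaving row counts untouched" | tee -a "$LOGFILE"
        return 1
    fi

    # Only now overwrite the stored expectations with fresh counts.
    bash -c "$row_count_cmd" 2>&1 | tee -a "$LOGFILE"
}

# Example call (the ETL command is left as a placeholder on purpose):
# etl_then_row_counts "<ETL command>" \
#     "dbt_helper update-tables --clobber --row-counts all"
```

Gating the second step on the first step's exit status also means a failed ETL never overwrites the stored expectations with partial results.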

@zaneselvans zaneselvans enabled auto-merge June 24, 2025 17:37
@zaneselvans zaneselvans added this pull request to the merge queue Jun 24, 2025
Merged via the queue into main with commit 065af38 Jun 24, 2025
17 checks passed
@zaneselvans zaneselvans deleted the clobber-row-counts-nightly branch June 24, 2025 19:09
@github-project-automation github-project-automation bot moved this from In review to Done in Catalyst Megaproject Jun 24, 2025