Explain strange nulls #4442

Merged
merged 12 commits into from
Jul 22, 2025

Conversation

Member

@zaneselvans zaneselvans commented Jul 17, 2025

Overview

Closes #4407

However, note that there are a couple of issues in #4407 that I think warrant some discussion.

Testing

  • I re-ran the dbt tests locally on my full ETL output just to make sure not_all_null.sql wasn't being used somewhere.

To-do list

  • Review the PR yourself and call out any questions or issues you have.

@zaneselvans zaneselvans self-assigned this Jul 17, 2025
@zaneselvans zaneselvans added the testing, dbt, and data-validation labels Jul 17, 2025
@zaneselvans zaneselvans moved this from New to In progress in Catalyst Megaproject Jul 17, 2025
Comment on lines 8 to 12
description: >
This table has a lot of row_conditions because it has a few columns that
are found in the EIA-860M generators table (and so appear in the most
recent monthly data) but most of the columns are specific to boilers,
which are not reported in the EIA-860M.
Member Author

This table is included, with a lot of row conditions, mostly as an artifact of timing: the row_conditions were generated before I created the ignore_eia860m_nulls argument. I chose to leave them in (even though I think the table would no longer have any failures in the fast ETL) because I think we'll want to deploy this test more widely, and the row conditions encode real expectations about which column-years should be null.
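
For readers unfamiliar with the idea, here is a minimal Python sketch of what "row conditions that encode which column-years should be null" could look like. It is not the dbt test from this PR: in dbt a row condition is a SQL expression, and the function name `unexpected_all_null`, the column names, the years, and the data below are all hypothetical, chosen only to illustrate the concept.

```python
from collections import defaultdict


def unexpected_all_null(rows, row_conditions):
    """Return sorted (report_year, column) pairs that are entirely null
    even though the column's row condition says data is expected.

    row_conditions maps a column name to a predicate on a row. The predicate
    returns True when that column is expected to hold data for the row.
    (Hypothetical helper, not part of the PR.)
    """
    # (year, column) -> has any non-null value been seen so far?
    seen = defaultdict(bool)
    for row in rows:
        for col, expected in row_conditions.items():
            if expected(row):
                key = (row["report_year"], col)
                seen[key] = seen[key] or row[col] is not None
    return sorted(key for key, has_value in seen.items() if not has_value)


# Made-up rows: the last year only has generator data, as if it came from a
# monthly source that does not report boiler columns.
rows = [
    {"report_year": 2022, "generator_id": "G1", "boiler_id": "B1"},
    {"report_year": 2023, "generator_id": "G1", "boiler_id": "B1"},
    {"report_year": 2024, "generator_id": "G1", "boiler_id": None},
]

# Row conditions that encode the expectation "boiler_id is null after 2023".
with_conditions = unexpected_all_null(
    rows,
    {
        "generator_id": lambda row: True,
        "boiler_id": lambda row: row["report_year"] <= 2023,
    },
)

# Without that condition, the all-null 2024 boiler_id is flagged.
without_conditions = unexpected_all_null(
    rows,
    {
        "generator_id": lambda row: True,
        "boiler_id": lambda row: True,
    },
)
```

In this sketch the conditions document the expectation explicitly, so an all-null column-year is only reported when it is a surprise.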

@zaneselvans zaneselvans requested a review from jdangerx July 17, 2025 18:32
@zaneselvans zaneselvans moved this from In progress to In review in Catalyst Megaproject Jul 17, 2025
@zaneselvans zaneselvans changed the title Strange nulls Explain strange nulls Jul 17, 2025
Member

@jdangerx jdangerx left a comment

Sweet, we'll be happy about these docs in a few months when we have to figure out what's going on :)

Thanks for putting so many signposts for the more trivial changes too - helped streamline the review process.

A single typo fix + a few non-blocking questions remain.

@zaneselvans zaneselvans added this pull request to the merge queue Jul 21, 2025
@zaneselvans zaneselvans removed this pull request from the merge queue due to a manual request Jul 21, 2025
@zaneselvans
Member Author

zaneselvans commented Jul 21, 2025

Whoops, found a bug having to do with the dtype of report_year -- it's a string, not a date or year! wtf! Sorry, paranoia on my part.
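
The comment does not say exactly how the bug showed up. As a hedged illustration of one way a string-typed year column can cause trouble, the plain-Python sketch below uses made-up rows and a hypothetical helper, `years_at_or_after`. It shows an integer comparison silently matching nothing until the value is cast.

```python
# Hypothetical rows in which report_year is stored as a string.
rows = [
    {"report_year": "2009"},
    {"report_year": "2015"},
    {"report_year": "2024"},
]


def years_at_or_after(rows, cutoff):
    """Keep rows whose report_year is >= cutoff, casting the string to int first."""
    return [row for row in rows if int(row["report_year"]) >= cutoff]


# Comparing the string year to an integer never matches, and raises no error.
naive_matches = [row for row in rows if row["report_year"] == 2015]

# Casting before comparing gives the intended rows.
cast_matches = years_at_or_after(rows, 2015)
```

Here `naive_matches` is empty while `cast_matches` holds the 2015 and 2024 rows. In Python 3, an ordering comparison such as `"2015" >= 2015` raises a `TypeError` instead. SQL engines differ in how they handle this, so a condition written against a numeric year may fail or behave unexpectedly when the column is a string.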

@zaneselvans zaneselvans added this pull request to the merge queue Jul 21, 2025
Merged via the queue into main with commit 9806ed6 Jul 22, 2025
18 checks passed
@zaneselvans zaneselvans deleted the strange-nulls branch July 22, 2025 00:15
@github-project-automation github-project-automation bot moved this from In review to Done in Catalyst Megaproject Jul 22, 2025
Labels
data-validation Issues related to checking whether data meets our quality expectations. dbt Issues related to the data build tool aka dbt testing Writing tests, creating test data, automating testing, etc.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Investigate unusual patterns of column nulls
2 participants