Open
Labels: data-validation, dbt, testing
Summary
See the top-level validation migration spreadsheet for reference.
There are four group-mean continuity checks we want to migrate into dbt:
| table | Python source |
|---|---|
| `_core_eia923__cooling_system_information` | `pudl/transform/eia923.py:cooling_system_information_continuity()` |
| `_core_eia923__fgd_operation_maintenance` | `pudl/transform/eia923.py:fgd_continuity_check()` |
| `_core_eia860__fgd_equipment` | `pudl/transform/eia860.py:fgd_equipment_continuity()` |
| `_core_eia860__cooling_equipment` | `pudl/transform/eia860.py:cooling_equipment_continuity()` |
All four ultimately call `validate.py:group_mean_continuity_check()`.
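To make the behavior being migrated concrete, here is a minimal Python sketch of a group-mean continuity check. It assumes the check groups rows by a key column (for example a report year), takes the mean of each checked column per group, compares each group's mean with the previous group's, and fails when any column has more jumps above its threshold than the allowed number of outliers. The function names, the `thresholds` mapping, the strict `>` comparison, and the `report_year` column are illustrative assumptions, not PUDL's actual `group_mean_continuity_check()`.

```python
import pandas as pd


def count_mean_discontinuities(
    df: pd.DataFrame,
    groupby_col: str,
    thresholds: dict[str, float],
) -> pd.Series:
    """Count, per column, the group-to-group jumps in the mean that exceed its threshold.

    Hypothetical sketch; not PUDL's validate.py implementation.
    """
    cols = list(thresholds)
    # Mean of each checked column within each group, ordered by the group key.
    means = df[[groupby_col, *cols]].groupby(groupby_col, sort=True).mean()
    # Absolute relative change versus the previous group; drop the first group,
    # which has no predecessor to compare against.
    rel_change = (means.diff() / means.shift(1)).abs().iloc[1:]
    # DataFrame.gt aligns the Series on column names, so each column is
    # compared against its own threshold. NaN comparisons evaluate to False.
    exceeds = rel_change.gt(pd.Series(thresholds))
    return exceeds.sum()


def group_mean_continuity_passes(
    df: pd.DataFrame,
    groupby_col: str,
    thresholds: dict[str, float],
    n_outliers_allowed: int = 0,
) -> bool:
    """Return True unless some column has more outliers than allowed."""
    counts = count_mean_discontinuities(df, groupby_col, thresholds)
    return bool((counts <= n_outliers_allowed).all())


if __name__ == "__main__":
    # Toy data: the mean of "x" jumps from 1.1 to 5.0 between 2011 and 2012.
    toy = pd.DataFrame(
        {
            "report_year": [2010, 2010, 2011, 2011, 2012, 2012, 2013, 2013],
            "x": [1.0, 1.0, 1.1, 1.1, 5.0, 5.0, 5.2, 5.2],
        }
    )
    print(count_mean_discontinuities(toy, "report_year", {"x": 0.5}))
    print(group_mean_continuity_passes(toy, "report_year", {"x": 0.5}, n_outliers_allowed=0))
```

With a 0.5 threshold the toy data has exactly one jump (1.1 to 5.0), so it fails with `n_outliers_allowed=0` and would pass with `n_outliers_allowed=1`. Because the pass/fail decision hinges on a count compared against a small integer, a difference of even one or two flagged groups between implementations is enough to flip the result.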
Current status
There is a draft implementation in #4092, which includes a demo instance of the test for `_core_eia923__cooling_system_information`. The maximum acceptable number of outliers for this instance of the test is 2.
- In dbt, the test fails, finding 4 outliers.
- In Dagster, the test passes, finding only 2 outliers.
- In a Python notebook run against the output Parquet files, the test fails, finding the same 4 outliers that dbt finds.
Next steps
Figure out why the Dagster check finds only 2 outliers when both dbt and a Python notebook run against the output Parquet files find 4.
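One way to start narrowing this down is to check whether Dagster and the notebook are even looking at the same data. The sketch below compares two versions of a table group by group (row count plus the mean of each checked column) and returns only the groups that differ. It assumes you can capture the DataFrame the Dagster check receives (for example, by saving it from inside the check) and load the Parquet output separately with `pd.read_parquet`; the function names and the `rtol` default are hypothetical, and the sketch does not presume any particular cause.

```python
import numpy as np
import pandas as pd


def group_summary(df: pd.DataFrame, groupby_col: str, cols: list[str]) -> pd.DataFrame:
    """Per-group row count and column means, sorted by the group key."""
    grouped = df[[groupby_col, *cols]].groupby(groupby_col, sort=True)
    summary = grouped[cols].mean()
    summary["n_rows"] = grouped.size()
    return summary


def diff_group_summaries(
    left: pd.DataFrame,
    right: pd.DataFrame,
    groupby_col: str,
    cols: list[str],
    rtol: float = 1e-9,
) -> pd.DataFrame:
    """Return only the groups whose row count or means differ between two inputs."""
    # Outer join so a group present in only one input still shows up (as NaN on the other side).
    joined = group_summary(left, groupby_col, cols).join(
        group_summary(right, groupby_col, cols),
        how="outer",
        lsuffix="_left",
        rsuffix="_right",
    )
    mismatched = np.zeros(len(joined), dtype=bool)
    for col in [*cols, "n_rows"]:
        # equal_nan=True: a value that is NaN on both sides is not flagged,
        # but NaN on one side versus a number on the other is.
        same = np.isclose(
            joined[f"{col}_left"], joined[f"{col}_right"], rtol=rtol, equal_nan=True
        )
        mismatched |= ~same
    return joined[mismatched]
```

If the summaries match, the two runs see the same input and the discrepancy more likely lies in the check logic or its parameters (for example, the thresholds or grouping column); if they differ, the returned rows show which groups diverge.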
Project status: Icebox