Migrate group_mean_continuity_check validation tests to dbt #4095

@aesharpe

Summary

See the top-level validation migration spreadsheet for reference.

There are four group mean continuity checks we want to migrate into dbt:

| table | python source |
| --- | --- |
| `_core_eia923__cooling_system_information` | `pudl/transform/eia923.py:cooling_system_information_continuity()` |
| `_core_eia923__fgd_operation_maintenance` | `pudl/transform/eia923.py:fgd_continuity_check()` |
| `_core_eia860__fgd_equipment` | `pudl/transform/eia860.py:fgd_equipment_continuity()` |
| `_core_eia860__cooling_equipment` | `pudl/transform/eia860.py:cooling_equipment_continuity()` |

All four ultimately call validate.py:group_mean_continuity_check().
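
For readers unfamiliar with this kind of check, here is a minimal sketch of what a group mean continuity check might look like. It is an illustration only, not the actual body of `validate.py:group_mean_continuity_check()`. The function names, the `thresholds` mapping, the toy data, and the rule that outliers are counted per column are all assumptions made for this example.

```python
import pandas as pd


def group_mean_outliers(
    df: pd.DataFrame, thresholds: dict[str, float], groupby_col: str
) -> pd.DataFrame:
    """Flag groups whose mean jumped by at least the column's threshold.

    Hypothetical helper: averages each checked column within each group,
    then compares every group mean to the previous group's mean.
    """
    cols = list(thresholds)
    pct_change = (
        df[[groupby_col, *cols]]
        .groupby(groupby_col, sort=True)
        .mean()
        .pct_change()
        .abs()
        .dropna(how="all")  # the first group has no predecessor
    )
    # Compare each column against its own threshold.
    return pct_change.ge(pd.Series(thresholds), axis="columns")


def passes_continuity_check(
    df: pd.DataFrame,
    thresholds: dict[str, float],
    groupby_col: str,
    n_outliers_allowed: int = 0,
) -> bool:
    """Assumed pass rule: no column exceeds the allowed outlier count."""
    outliers = group_mean_outliers(df, thresholds, groupby_col)
    return bool((outliers.sum() <= n_outliers_allowed).all())


# Toy data (made up): yearly means are 100, 110, 300, 310.
toy = pd.DataFrame(
    {
        "report_year": [2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022],
        "water": [90, 110, 100, 120, 290, 310, 300, 320],
    }
)
toy_outliers = group_mean_outliers(toy, {"water": 0.5}, "report_year")
```

In this toy frame only the 2020 to 2021 jump is flagged, so the check would pass with `n_outliers_allowed=1` and fail with `n_outliers_allowed=0`.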

Current status

There is a draft implementation in #4092, which includes a demo instance of the test for _core_eia923__cooling_system_information. The maximum acceptable number of outliers for this instance of the test is 2.

  • In dbt, the test fails with 4 outliers.
  • In Dagster, the test succeeds, finding only 2 outliers.
  • In a Python notebook pointed at the output Parquet files, the test fails, finding the same 4 outliers that dbt finds.

Next steps

Figure out why Dagster doesn't find the other two outliers when a Python notebook reading the output Parquet files does.
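
One possible way to approach this is to list the outliers from each environment as rows and diff them, so the two missing ones can be identified by group and column. The sketch below assumes the check boils down to an absolute percent change of group means. `outlier_table`, `diff_outliers`, and the two toy frames are hypothetical names and made-up data, standing in for whatever dataframe each environment actually sees.

```python
import pandas as pd


def outlier_table(
    df: pd.DataFrame, thresholds: dict[str, float], groupby_col: str
) -> pd.DataFrame:
    """List each (group, column) whose mean jumped by at least its threshold."""
    cols = list(thresholds)
    pct_change = (
        df[[groupby_col, *cols]]
        .groupby(groupby_col, sort=True)
        .mean()
        .pct_change()
        .abs()
    )
    long = pct_change.reset_index().melt(
        id_vars=groupby_col, var_name="column", value_name="abs_pct_change"
    )
    long["threshold"] = long["column"].map(thresholds)
    # NaN compares as False, so the first group's undefined change drops out.
    return long[long["abs_pct_change"] >= long["threshold"]].reset_index(drop=True)


def diff_outliers(
    left: pd.DataFrame, right: pd.DataFrame, groupby_col: str
) -> pd.DataFrame:
    """Return outliers that appear in only one of the two tables."""
    merged = left.merge(
        right,
        on=[groupby_col, "column"],
        how="outer",
        suffixes=("_left", "_right"),
        indicator=True,
    )
    return merged[merged["_merge"] != "both"].reset_index(drop=True)


# Made-up stand-ins for the two inputs being compared.
in_memory = pd.DataFrame(
    {"report_year": [2019, 2020, 2021, 2022], "water": [100, 110, 300, 310]}
)
from_parquet = pd.DataFrame(
    {"report_year": [2019, 2020, 2021, 2022], "water": [100, 110, 300, 900]}
)
thresholds = {"water": 0.5}
only_in_one = diff_outliers(
    outlier_table(in_memory, thresholds, "report_year"),
    outlier_table(from_parquet, thresholds, "report_year"),
    "report_year",
)
```

Rows marked `left_only` or `right_only` in the `_merge` column are the outliers seen by one input but not the other, which narrows the search to specific groups and columns.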

Metadata

Labels

data-validation: Issues related to checking whether data meets our quality expectations.
dbt: Issues related to the data build tool aka dbt
testing: Writing tests, creating test data, automating testing, etc.

Projects

Status

Icebox