Add NRELATB data source page #4396

Merged
merged 27 commits into from
Jul 21, 2025

Conversation

cmgosnell
Member

Overview

Closes #4373.

What problem does this address?

no page!

What did you change?

added page! plus added 2024 html definitions

Documentation

Make sure to update relevant aspects of the documentation:

  • Update relevant Data Source jinja templates (see docs/data_sources/templates).

Testing

How did you make sure this worked? How can a reviewer verify this?
make docs-build -> eyeball the built docs page

To-do list

  • If updating analyses or data processing functions: make sure to update row count expectations in dbt tests.
  • Run make pytest-coverage locally to ensure that the merge queue will accept your PR.
  • Review the PR yourself and call out any questions or issues you have.
  • For minor ETL changes or data additions, once make pytest-coverage passes, make sure you have a fresh full PUDL DB downloaded locally, materialize new/changed assets and all their downstream assets and run relevant data validation tests using pytest and --live-dbs.
  • For bigger ETL or data changes run the full ETL locally and then run the data validations using make pytest-validate.
  • Alternatively, run the build-deploy-pudl GitHub Action manually.

Member

I tried loading some of these saved HTML documents and unlike the newer blank FERC forms (which come as HTML) they don't seem to work very well as stand-alone documents -- lots of links, but they're broken. The tabular formatting doesn't seem to have carried across, etc. I wonder if we might want to save ("print") these definitions pages as PDFs instead?

@krivard krivard self-requested a review July 11, 2025 13:54
Contributor

@krivard krivard left a comment

Mostly optional stuff, one major confusion area

Comment on lines 71 to 72
primary keys. Subsets of the ``core_metric_parameter`` have unique values across the
data given specific primary keys.
Contributor

Blocking: This is probably too compressed, and needs more words in it to make it clear what all the parts mean. Or maybe an example? or both.

  • Subsets how? a subset of the rows sharing a metric parameter? a subset of the available metric parameters? a subset of the columns (in the pudl sense) within the rows sharing a metric parameter?
  • Values of the parameters or of the primary keys?
  • Does specific primary keys mean a set of primary keys from the previous sentence, so you have "given primary keys [collection of columns], ..."? or does it mean specific values in the primary key columns, so you have "given primary keys [row selection criteria], ..."

Member Author

yea i was worried about this not being clear.. i will try to flesh it out and check to see if it helps.

Member Author

okay i reworked this a fair amount to just explain how the original skinny table gets split into multiple wider tables. the thing i was tryyying to get at before was why we had to break the skinny table into multiple wider tables, which is to say that specific values of core_metric_parameter pertain to different sets of primary keys. it was a little tedious to figure out which core_metric_parameter should be pulled into which of the final pudl tables bc it was a guessing game of "are you unique based on this set of PKs? if yes great, if no try another set, etc etc". I copied this directly from the transform docs and tbh i think that weirdness makes more sense over there than actually explained to a user.
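
A minimal sketch of that guessing game, for readers who want to see it concretely. It uses only the Python standard library and is not the actual PUDL transform code. Every column name other than core_metric_parameter, every parameter name, every value, and the candidate key sets are hypothetical and chosen purely for illustration. The idea: for each parameter, try candidate primary-key sets from narrowest to widest, keep the first one under which that parameter's rows are unique, then gather the parameters that share a key set into one wider table.

```python
from collections import defaultdict

# Hypothetical skinny table: one row per parameter value. Key columns that do
# not apply to a given parameter are left as None.
COLUMNS = ("report_year", "technology", "scenario", "core_metric_parameter", "value")
SKINNY = [dict(zip(COLUMNS, r)) for r in [
    (2024, "wind", "moderate", "capex", 1500.0),
    (2024, "wind", "advanced", "capex", 1300.0),
    (2024, "solar", "moderate", "capex", 1100.0),
    (2024, "solar", "advanced", "capex", 900.0),
    (2024, "wind", None, "technical_life", 30.0),
    (2024, "solar", None, "technical_life", 25.0),
    (2024, None, None, "inflation_rate", 0.025),
    (2025, None, None, "inflation_rate", 0.025),
    (2024, None, None, "interest_rate", 0.04),
    (2025, None, None, "interest_rate", 0.05),
]]

# Candidate primary-key sets, ordered from narrowest to widest (assumed).
CANDIDATE_KEYS = [
    ("report_year",),
    ("report_year", "technology"),
    ("report_year", "technology", "scenario"),
]


def is_unique(rows, keys):
    """Return True if no two rows share the same values in the key columns."""
    seen = {tuple(row[k] for k in keys) for row in rows}
    return len(seen) == len(rows)


def narrowest_unique_keys(rows, candidates):
    """Return the first candidate key set that uniquely identifies the rows."""
    for keys in candidates:
        if is_unique(rows, keys):
            return keys
    return None  # no candidate works; this parameter needs a closer look


# Group the skinny rows by parameter, then test each parameter's rows.
rows_by_param = defaultdict(list)
for row in SKINNY:
    rows_by_param[row["core_metric_parameter"]].append(row)

keys_by_param = {
    param: narrowest_unique_keys(rows, CANDIDATE_KEYS)
    for param, rows in rows_by_param.items()
}

# One wide table per key set: {key_set: {key_values: {parameter: value}}}.
wide_tables = defaultdict(lambda: defaultdict(dict))
for param, rows in rows_by_param.items():
    keys = keys_by_param[param]
    if keys is None:
        continue  # skip parameters that matched no candidate key set
    for row in rows:
        key_values = tuple(row[k] for k in keys)
        wide_tables[keys][key_values][param] = row["value"]
```

In this toy data the two rate parameters land together in a table keyed by report_year alone, while capex needs all three key columns. One caveat of the narrowest-first approach: a parameter with very few rows can look unique under a key set that is too narrow, so the result still deserves a manual check.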

@github-project-automation github-project-automation bot moved this from New to In progress in Catalyst Megaproject Jul 11, 2025
Kathryn Mazaitis and others added 20 commits July 11, 2025 15:35
Add EPA CAMD - EIA crosswalk data source page
* feat: add dagster-dbt translation functionality

* docs: update typos, add a little more explanation

remove extraneous project prefix in dbt selectors.
* Update conda-lock.yml and re-render conda environment files. Autoupdate pre-commit hooks.

* Bump gdal pin to 3.11.3

---------

Co-authored-by: zaneselvans <[email protected]>
Co-authored-by: Zane Selvans <[email protected]>
* Add new expect_columns_not_all_null table-level dbt test.

* Add new expect_columns_not_all_null test to all dbt models.

* Exclude some actually all null columns.

* Add more information to the failure outputs.

* Remove remaining no-null-cols validation tests.

* Rename id_dc_coupled_tightly to is_dc_coupled_tightly

* Fix conditional_columns checks to pass when zero records are selected.

* Add no-null-rows fast ETL conditions for _core_eia860__cooling_equipment

* Add no-null-rows fast ETL conditions for _core_eia860__fgd_equipment

* move so2 equipment into excluded columns

* Add no-null-rows fast ETL conditions for _core_eia923__cooling_system_information

* Update dbt/tests/data_tests/generic_tests/expect_columns_not_all_null.sql

Co-authored-by: Kathryn Mazaitis <[email protected]>

* Clean up / simplify not all null test a little.

* Fix enum for NREL tax case (#4384)

* fix enum for nrelatb tax case

* Merge alembic migration heads.

---------

Co-authored-by: Zane Selvans <[email protected]>

* Add exceptions to no-null-cols test for _core_eia923__fgd_operation_maintenance

* Add exceptions to expect_columns_not_all_null test for core_eia860__assn_boiler_generator table

* Add exceptions to expect_columns_not_all_null test for core_eia860__assn_boiler_stack_flue table

* Add exceptions to expect_columns_not_all_null test for core_eia860__scd_generators_solar table

* Add exceptions to expect_columns_not_all_null test for core_eia860__scd_generators_wind table

* Add exceptions to expect_columns_not_all_null test for out_eia923__fuel_receipts_costs table

* Add exceptions to expect_columns_not_all_null for remaining tables failing fast ETL

* Remove pinned dbt version now that regression has been fixed.

* Remove dagster asset checks superseded by expect_columns_not_all_null

* Docstring cleanup.

* Add comment, shorten pudl.validate to pv

* Add missing boolean columns to EIA-860 multifuel transform.

* Add exhaustive null blocks into SCD table expect_columns_not_all_null tests

* Change pandera import to be pandas specific.

* consolidate duplicate operating_switch and can_switch_when_operating columns

* Add special cases to expect_columns_not_all_null for core_eia860__scd_generators_multifuel table.

* Add special cases for expect_columns_not_all_null test in coalmine and boiler entity tables

* Relock conda dependencies

* Exclude static forward/backward filled boiler manufacturer fields from no-null-cols test.

* Add description explaining why certain columns are excluded in yearly boiler table.

* Relock dependencies

* Rename conditional_columns to row_conditions

* Fix name of columns_are_close test in docs file.

* relock dependencies

* Add descriptions to explain special cases in no-null-cols tests.

* Fix rolling average description wording.

* Add release notes about no-null-cols checks.

* Dynamically exclude null column checks related to lack of EIA-860M coverage

* Dynamically exclude null column checks related to lack of EIA-860M coverage

* Remove final reference to pv.no_null_cols() and the function itself.

* Deal with possibility of 1 or 2 EIA-860M only years.

* Simplify logic and document report_date requirement

* Add a real script for identifying null column row conditions

* Add explicit help options.

* Revert dependency updates to whatever is on main

* Update stale docstring and dependencies b/c pandera imports

* Remove log-level option

* Remove log-level option

* Re-lock dependencies based on main

* Add unit tests for pudl_null_cols script.

* Update dbt/tests/data_tests/generic_tests/schema.yml

Co-authored-by: Kathryn Mazaitis <[email protected]>

* Update docs/release_notes.rst

Co-authored-by: Kathryn Mazaitis <[email protected]>

* Update test/unit/scripts/test_pudl_null_cols.py

Co-authored-by: Kathryn Mazaitis <[email protected]>

* Update test/unit/scripts/test_pudl_null_cols.py

Co-authored-by: Kathryn Mazaitis <[email protected]>

* Update test/unit/scripts/test_pudl_null_cols.py

Co-authored-by: Kathryn Mazaitis <[email protected]>

* Update test/unit/scripts/test_pudl_null_cols.py

Co-authored-by: Kathryn Mazaitis <[email protected]>

* Update test/unit/scripts/test_pudl_null_cols.py

Co-authored-by: Kathryn Mazaitis <[email protected]>

* Update test/unit/scripts/test_pudl_null_cols.py

Co-authored-by: Kathryn Mazaitis <[email protected]>

* Update test/unit/scripts/test_pudl_null_cols.py

Co-authored-by: Kathryn Mazaitis <[email protected]>

* Address comments from code review.

---------

Co-authored-by: Kathryn Mazaitis <[email protected]>
Co-authored-by: Christina Gosnell <[email protected]>
* wip very first draft of table name descriptor

* slight clean up of wip table name metadata

* Incorporate metadata checker and table name data extractor.
Lightly-revise table name data stubs for use in templating.
Includes minimal seed data for testing.
Includes new 'label' attribute for DataSource metadata.

* Resource metadata: cosmetic adjustments for preview wizard

* Adjust templates for wizard

* split shared text from prompts

* Remove outdated 861 variance

* Add table description rendering step

* Add usage warnings

* Add command line access to default table description templates; fix corner cases

* Add table description build phase; support structured table description metadata in type checks.

* Fully-decomposed metadata build

* Update core_eia923__monthly_boiler_fuel with new decomposed table metadata

* Adapt new metadata build for previews

* Add support for quarterly; make descriptions render acceptably for unmigrated tables

* Revert sample migrations

* Handle usage warnings defaults at build time; skip usage warnings display if empty

* Permit the sphinx build to override default table description template location

* Try moving the table description template into the primary source tree

* Drop docs_dir args that are no longer needed

* Add docstrings to description rendering code

* Refactor with more sensible layout, naming, and documentation

* Add missing files

* replace lockfiles with copies from main.
(we're not adding any new libraries so this should be fine?)

* Fix typos :D

* Fix docs build; fix missed escapes in description template; use more consistent vocabulary

* Fix API doc formatting and improve test coverage.
- Remove extra indentation causing spurious blockquotes
- Migrate one source so we can exercise the rest of the tests
- Add test for resource_description console script, and fix some confusing naming
- Fix bad field names in metadata test

* Add flag to turn metadata checks for primary key descriptions on/off

* Revisions from review.

* Switch to more consistent and informative key naming scheme
* Drop vestigial Datasource.label
* Clarify string aggregations
* Fix spacing more

Co-authored-by: E. Belfer <[email protected]>

* Fix rst in docstrings

* Permit nonstandard table types if you have additional_summary_text set

* typo

* Fix docs

* Docs and test improvements from review.

Co-authored-by: Dazhong Xia <[email protected]>

---------

Co-authored-by: Kathryn Mazaitis <[email protected]>
Co-authored-by: Kathryn Mazaitis <[email protected]>
Co-authored-by: E. Belfer <[email protected]>
Co-authored-by: Dazhong Xia <[email protected]>
* Update DOIs

* Update release notes
* fix: make yaml output indent lists properly for prettier

* chore: make diff output a little nicer to read

* fix: using __ within a class does not work with module level function

* docs: update docstring with info about prettier jankiness

* docs: update release notes

* chore: switch to click.secho instead of logger

only stop propagation of logger.parent in this script.
jdangerx and others added 2 commits July 18, 2025 10:56
* wip: only run integration tests for non-doc changes

for testing purposes, not restricting CI to merge-group only.

* chore: re-add the merge_group condition, add comment

* chore: realize that skipped jobs count as passing for branch protection

* revert-later: remove merge_group filter for testing

* Revert "revert-later: remove merge_group filter for testing"

This reverts commit 50d3c33.

* docs: update release notes

* fix: add some other docs files; don't skip unit tests

* Update docs/release_notes.rst

Co-authored-by: Zane Selvans <[email protected]>

---------

Co-authored-by: Zane Selvans <[email protected]>
@cmgosnell cmgosnell requested a review from krivard July 18, 2025 19:41
Contributor

@krivard krivard left a comment

Lil typos and then good to go!

Also 😟 so sorry I apparently merged the epacamd source page into this branch instead of main?? Made a mess. A temporary mess but still. Sorry.

@cmgosnell cmgosnell enabled auto-merge July 21, 2025 15:33
@cmgosnell cmgosnell added this pull request to the merge queue Jul 21, 2025
Merged via the queue into main with commit d0a694f Jul 21, 2025
18 checks passed
@cmgosnell cmgosnell deleted the nrelatb-data-source-page branch July 21, 2025 16:51
@github-project-automation github-project-automation bot moved this from In progress to Done in Catalyst Megaproject Jul 21, 2025
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Add data source documentation for NREL ATB Electricity
6 participants