
Metadata migration for EIA923 #4422



Open: wants to merge 49 commits into main


cmgosnell
Member

Overview

Closes #4400

What problem does this address?

What did you change?

Documentation

Make sure to update relevant aspects of the documentation:

  • Update the release notes: reference the PR and related issues.
  • Update relevant Data Source jinja templates (see docs/data_sources/templates).
  • Update relevant table or source description metadata (see src/metadata).
  • Review and update any other aspects of the documentation that might be affected by this PR.

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

  • If updating analyses or data processing functions: make sure to update row count expectations in dbt tests.
  • Run make pytest-coverage locally to ensure that the merge queue will accept your PR.
  • Review the PR yourself and call out any questions or issues you have.
  • For minor ETL changes or data additions, once make pytest-coverage passes, make sure you have a fresh full PUDL DB downloaded locally, materialize new/changed assets and all their downstream assets and run relevant data validation tests using pytest and --live-dbs.
  • For bigger ETL or data changes run the full ETL locally and then run the data validations using make pytest-validate.
  • Alternatively, run the build-deploy-pudl GitHub Action manually.

cmgosnell and others added 30 commits April 1, 2025 16:36
Lightly-revise table name data stubs for use in templating.
Includes minimal seed data for testing.
Includes new 'label' attribute for DataSource metadata.
(we're not adding any new libraries so this should be fine?)
cmgosnell self-assigned this Jul 15, 2025
cmgosnell moved this from New to In progress in Catalyst Megaproject Jul 15, 2025
cmgosnell added the eia923 and docs labels Jul 15, 2025
Comment on lines +24 to +26
Documentation
^^^^^^^^^^^^^
* Migrated table description metadata into new format for EIA 923. See :issue:`4400`
Member Author

We should probably add a note here, before the dataset-specific migrations, about the migration as a whole.

Member

Agreed! Also might be a good idea to highlight the table name changes specifically.

Contributor

krivard commented Jul 15, 2025

Hmmm, Sphinx can't find your references for _core_eia923__monthly_cooling_system_information; are the docs still using the old name?

Base automatically changed from metadata-table-name-based to main July 15, 2025 20:10
cmgosnell
Member Author

Hmmm, Sphinx can't find your references for _core_eia923__monthly_cooling_system_information; are the docs still using the old name?

I've learned that to :ref: a table name that starts with an underscore, you need to put an i in front of it?!?
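The workaround can be captured in a small helper when building cross-references. This is a hypothetical sketch: `sphinx_ref_label` and `table_ref` are illustrative names, not functions in the PUDL codebase, and the rule is inferred only from the examples in this thread (a table whose name starts with an underscore gets a label with an extra leading `i`; other tables are assumed to use their plain name).

```python
def sphinx_ref_label(table_name: str) -> str:
    """Return the docs label for a table (hypothetical helper).

    Assumption from this thread: names starting with "_" are labeled with an
    extra leading "i" (e.g. "_core_..." -> "i_core_..."); all other tables are
    assumed to be labeled with their plain name.
    """
    if table_name.startswith("_"):
        return f"i{table_name}"
    return table_name


def table_ref(table_name: str) -> str:
    """Build an RST :ref: role pointing at a table's documentation label."""
    return f":ref:`{sphinx_ref_label(table_name)}`"


print(table_ref("_core_eia923__fgd_operation_maintenance"))
# -> :ref:`i_core_eia923__fgd_operation_maintenance`
```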

Comment on lines +49 to +53

``make docs-build`` will build the docs and then delete all of the rst files via
``cleanup_rsts`` and ``cleanup_csv_dir`` in ``docs/conf.py``. If you want to
preserve them for a one-off build, you can comment out that step in
``docs/conf.py``.
Member Author

I added this note to hopefully help debug warnings like this one.

Member

We might also want to point folks at the autoapi_keep_files variable in conf.py for debugging problems in any of the docstrings, since that's separate from our custom cleanup functions.

I think the note above "If you create a new module, the corresponding documentation file will also need to be checked in to version control." is incorrect, and dates from a time when we had all these generated files checked into git.
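For the docstring side, sphinx-autoapi has its own switch. A minimal `docs/conf.py` fragment, assuming the docs are built with sphinx-autoapi as the comment above implies; with `autoapi_keep_files` enabled, the generated API source files stay on disk (under the `autoapi_root` directory) after the build:

```python
# docs/conf.py (fragment)
# sphinx-autoapi option; it defaults to False, which removes the generated API
# source files once the build finishes. Setting it to True keeps them, so a
# warning about a broken docstring can be traced back to the generated .rst.
autoapi_keep_files = True
```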

Comment on lines -768 to +778
:ref:`i_core_eia923__fgd_operation_maintenance` and
:ref:`i_core_eia923__yearly_fgd_operation_maintenance` and
Member Author

This is how I learned it needs to start with i_core...

Member

I have never heard of this? But I trust?

cmgosnell moved this from In progress to In review in Catalyst Megaproject Jul 17, 2025
Member

aesharpe left a comment

Not done! Just publishing this bit because I need to switch computers.



f"{KNOWN_DRAWBACKS_DESCRIPTION}"
),
"description": {
"additional_summary_text": "of Downscaled Net Generation and Fuel Consumption.",
Member

I don't think these need to be in title case, since it's a description, not a title (same goes for all the rest below).

Member

Also, I'm not sure what "downscaled" means in this context. It might make more sense to just use the first sentence of the old description: "estimated net generation and fuel consumption associated with each combination of generator, energy source, and prime mover."

Member

We might also want to have "estimated" in here somewhere.

),
"description": {
"additional_summary_text": "of Downscaled Net Generation and Fuel Consumption.",
"additional_source_text": "(Schedule 3)",
Member

I don't know that this needs to be in parentheses here. We should decide on a standard.

Contributor

With the parentheses, this displays as

EIA Form 923 -- Power Plant Operations Report (Schedule 3)

Which I do like best out of the alternatives,

EIA Form 923 -- Power Plant Operations Report Schedule 3

EIA Form 923 -- Power Plant Operations Report , Schedule 3

(with the space between Report and the comma, unfortunately)

Member

This is a good point. There are some cases where there is more than just a schedule to add to the source, but I think we can probably put it all in parentheses.
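The spacing issue krivard describes follows from how the pieces are joined. A sketch, assuming the template places `additional_source_text` after the source title with a single space (consistent with the rendered examples in this thread, but not confirmed from the template itself); `render_source_line` is a hypothetical name:

```python
def render_source_line(source_title: str, additional_source_text: str = "") -> str:
    """Join a source title and optional extra source text with one space (hypothetical)."""
    if not additional_source_text:
        return source_title
    return f"{source_title} {additional_source_text}"


TITLE = "EIA Form 923 -- Power Plant Operations Report"

# The three formats discussed in this thread.
for extra in ["(Schedule 3)", "Schedule 3", ", Schedule 3"]:
    print(render_source_line(TITLE, extra))
```

Under that assumption, parentheses are the only one of the three formats that reads cleanly without changing the join logic, since a leading comma always inherits the separating space.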

"reported in the EIA-923 generation fuel are allocated to individual "
"generators. Then, these allocations are aggregated to unique generator, "
f"prime mover, and energy source code combinations. "
f"{KNOWN_DRAWBACKS_DESCRIPTION}"
Member

aesharpe commented Jul 17, 2025

Should any of these known drawbacks be usage warnings?

Comment on lines +80 to +83
"additional_details_text": (
f"{freq.title()} estimated net generation and fuel consumption by generator. "
"Based on allocating net electricity generation and fuel consumption reported "
"in the EIA-923 generation and generation_fuel tables to individual generators."
Member

We should probably have some way of indicating whether a table has been altered significantly by us. For instance, the estimated net generation is calculated by us, not EIA. We could mention this in additional_source_text or in additional_details_text maybe? Or maybe elaborate on the estimated_values usage warning detail text?
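One way to act on this, sketched under assumptions: the metadata already has an `estimated_values` usage warning (mentioned above), so a table produced by PUDL's allocation could list it and state PUDL's role in its details text. `allocated_generation_description` is a hypothetical helper, and the closing sentence about PUDL is illustrative wording, not text from the PR.

```python
def allocated_generation_description(freq: str) -> dict:
    """Build description metadata for an allocated generation table (hypothetical).

    freq is a frequency word such as "monthly" or "yearly"; it is title-cased at
    the start of the details text, mirroring the f-string in the PR snippet.
    """
    details = (
        f"{freq.title()} estimated net generation and fuel consumption by generator. "
        "Based on allocating net electricity generation and fuel consumption reported "
        "in the EIA-923 generation and generation_fuel tables to individual generators. "
        # Illustrative sentence making PUDL's role explicit (not from the PR).
        "These values are estimated by PUDL, not reported directly by EIA."
    )
    return {
        # Flags the table as containing estimates rather than reported values.
        "usage_warnings": ["estimated_values"],
        "additional_details_text": details,
    }


print(allocated_generation_description("monthly")["additional_details_text"])
```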

Member

aesharpe left a comment

Okay, part 2: I didn't go into too much detail, since this is a migration rather than an editing pass, but one high-level thought I have is this:

In my opinion, the additional_summary_text should read more like a topic sentence and less like a title. Some of the ones you have in here feel like they need just a little bit more information in them. I think one thing you could do is take the first sentence of the additional_details_text and use that instead, or something similar. What do you think?
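Pulling the first sentence out of each details text could be automated as a starting point for hand editing. A naive sketch; `first_sentence` is a hypothetical helper, and because it splits at the first `.`, `!`, or `?` followed by whitespace, abbreviations such as "e.g." would cut a sentence short.

```python
import re


def first_sentence(text: str) -> str:
    """Return the first sentence of a block of description text (naive sketch)."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    match = re.search(r"[.!?](\s|$)", text)
    # Keep the terminating punctuation; fall back to the whole text if none.
    return text[: match.start() + 1] if match else text


details = (
    "Monthly estimated net generation and fuel consumption by generator. "
    "Based on allocating net electricity generation and fuel consumption reported "
    "in the EIA-923 generation and generation_fuel tables to individual generators."
)
print(first_sentence(details))
# -> Monthly estimated net generation and fuel consumption by generator.
```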




Reports annual information about flue gas desulfurization systems at generation facilities,
"_core_eia923__yearly_fgd_operation_maintenance": {
"additional_summary_text": "FGD Operation & Maintenance.",
Member

I feel like we should mostly avoid acronyms when possible. I can't even remember what this means... 😅

"additional_source_text": "(Schedule 8C)",
"usage_warnings": ["irregular_years"],
"additional_details_text": (
"""Reports annual information about flue gas desulfurization (FGD) systems at generation facilities,
Member

aesharpe commented Jul 18, 2025

This additional_details_text bit feels redundant with the additional_summary_text section.

Comment on lines 143 to 144
Note that a small number of respondents only report annual fuel consumption and net
generation, and all of it is reported in December."""
Member

This feels redundant with the usage warning. If we aren't going to provide any more insight as to why this is happening, I think we can remove this from this section and just keep it in the usage warnings for now. That goes for all the tables where this comes up too.


This table is produced during the transformation of fuel delivery data, in order to
"description": {
"additional_summary_text": "coal mines reporting deliveries in the Fuel Receipts and Costs.",
Member

"In the fuel receipts and costs" feels weird; did you mean to add the word "table" after this?

Comment on lines +550 to +552
# + "\n\nThis table exists for naming consistency. While it is technically "
# "aggregated by month, it ends up being identical to the "
# "``out_eia923__generation`` table from which it is derived.",
Member

aesharpe commented Jul 18, 2025

Why did you comment this out vs. remove it?

Member Author

I flagged this in the spreadsheet. I think we should delete these non-aggregated tables. It's harder now to append text to portions of the shared dictionary, so I commented it out instead of trying to incorporate it, since IMO we should not be publishing this table. But I will convert this to a TODO instead of just commenting it out.
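If text from a shared description dictionary does need to be extended for a single table later, copying before appending keeps every other table that shares the dictionary unchanged. A minimal sketch; `SHARED_DESCRIPTION` and `with_appended_details` are hypothetical names holding placeholder text, not the actual metadata structures in this PR.

```python
import copy

# Hypothetical shared description reused by several tables (placeholder text).
SHARED_DESCRIPTION = {
    "additional_summary_text": "placeholder summary.",
    "usage_warnings": ["irregular_years"],
    "additional_details_text": "Placeholder details text.",
}


def with_appended_details(base: dict, extra: str) -> dict:
    """Return a copy of a shared description with text appended to its details.

    A deep copy is used so nested values such as the usage_warnings list are
    not shared with (or mutated for) the other tables using the base dict.
    """
    desc = copy.deepcopy(base)
    desc["additional_details_text"] = desc.get("additional_details_text", "") + extra
    return desc


monthly = with_appended_details(
    SHARED_DESCRIPTION, "\n\nThis table exists for naming consistency."
)
print(monthly["additional_details_text"])
```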

Labels
docs: Documentation for users and contributors.
eia923: Anything having to do with EIA Form 923
Projects
Status: In review
Development

Successfully merging this pull request may close these issues.

Migrate resource description metadata for eia923 (26 tables, 3011 words)
4 participants