-
-
Notifications
You must be signed in to change notification settings - Fork 128
Metadata migration for EIA923 #4422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Lightly-revise table name data stubs for use in templating. Includes minimal seed data for testing. Includes new 'label' attribute for DataSource metadata.
…on metadata in type checks.
(we're not adding any new libraries so this should be fine?)
Documentation | ||
^^^^^^^^^^^^^ | ||
* Migrated table description metadata into new format for EIA 923. See :issue:`4400` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably add a note in here before the dataset specific migrations about the whole migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! Also might be a good idea to highlight the table name changes specifically.
Hmmm sphinx can't find your references for |
I've learned that in order to |
|
||
`make docs-build` will build and the delete all of the rst files via | ||
``cleanup_rsts`` and ``cleanup_csv_dir`` in ``docs/conf.py``. If you want to | ||
preserve them for a one-off build, you can comment out that step in | ||
``docs/conf.py``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added this guy to hopefully help debug warnings like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also want to point folks at the autoapi_keep_files
variable in conf.py
for debugging problems in any of the docstrings, since that's separate from our custom cleanup functions.
I think the note above "If you create a new module, the corresponding documentation file will also need to be checked in to version control." is incorrect, and dates from a time when we had all these generated files checked into git.
:ref:`i_core_eia923__fgd_operation_maintenance` and | ||
:ref:`i_core_eia923__yearly_fgd_operation_maintenance` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is how i learned it needs to start with an i_core..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have never heard of this? But I trust?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not done! Just publishing this bit because I need to switch computers.
Documentation | ||
^^^^^^^^^^^^^ | ||
* Migrated table description metadata into new format for EIA 923. See :issue:`4400` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! Also might be a good idea to highlight the table name changes specifically.
:ref:`i_core_eia923__fgd_operation_maintenance` and | ||
:ref:`i_core_eia923__yearly_fgd_operation_maintenance` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have never heard of this? But I trust?
f"{KNOWN_DRAWBACKS_DESCRIPTION}" | ||
), | ||
"description": { | ||
"additional_summary_text": "of Downscaled Net Generation and Fuel Consumption.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need these to be all uppercased bc it's a description, not a title (same goes for all the rest below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm not sure what "downscaled" means in this context. It might make more sense to just use the first sentence of the old description: "estimated net generation and fuel consumption associated with each combination of generator, energy source, and prime mover."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also want to have "estimated" in here somewhere
), | ||
"description": { | ||
"additional_summary_text": "of Downscaled Net Generation and Fuel Consumption.", | ||
"additional_source_text": "(Schedule 3)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know that this needs to be in parenthesis here. We should decide on a standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the parenthesis, this displays as
EIA Form 923 -- Power Plant Operations Report (Schedule 3)
Which I do like best out of the alternatives,
EIA Form 923 -- Power Plant Operations Report Schedule 3
EIA Form 923 -- Power Plant Operations Report , Schedule 3
(with the space between Report and the comma, unfortunately)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. There are some cases where there is more than just a schedule to add to the source, but I think we can probably put it all in parenthesis.
"reported in the EIA-923 generation fuel are allocated to individual " | ||
"generators. Then, these allocations are aggregated to unique generator, " | ||
f"prime mover, and energy source code combinations. " | ||
f"{KNOWN_DRAWBACKS_DESCRIPTION}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should any of these known drawbacks be usage warnings?
"additional_details_text": ( | ||
f"{freq.title()} estimated net generation and fuel consumption by generator. " | ||
"Based on allocating net electricity generation and fuel consumption reported " | ||
"in the EIA-923 generation and generation_fuel tables to individual generators." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably have some way of indicating whether a table has been altered significantly by us. For instance, the estimated net generation is calculated by us, not EIA. We could mention this in additional_source_text
or in additional_details_text
maybe? Or maybe elaborate on the estimated_values
usage warning detail text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay Part 2: I didn't go into too much detail for the sake of this being a migration vs. editing pass, but one high level thought I have is this:
In my opinion, the additionally_summary_text
should read more like a topic sentence and less like a title. Some of the ones you have in here feel like they need just a little bit more information in them--I think one thing you could do is take the first sentence of the additional_details_text
and use that instead, or something similar. What do you think?
), | ||
"description": { | ||
"additional_summary_text": "of Downscaled Net Generation and Fuel Consumption.", | ||
"additional_source_text": "(Schedule 3)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. There are some cases where there is more than just a schedule to add to the source, but I think we can probably put it all in parenthesis.
f"{KNOWN_DRAWBACKS_DESCRIPTION}" | ||
), | ||
"description": { | ||
"additional_summary_text": "of Downscaled Net Generation and Fuel Consumption.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also want to have "estimated" in here somewhere
|
||
Reports annual information about flue gas desulfurization systems at generation facilities, | ||
"_core_eia923__yearly_fgd_operation_maintenance": { | ||
"additional_summary_text": "FGD Operation & Maintenance.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we should mostly avoid acronyms when possible. I can't even remember what this means.... 😅
"additional_source_text": "(Schedule 8C)", | ||
"usage_warnings": ["irregular_years"], | ||
"additional_details_text": ( | ||
"""Reports annual information about flue gas desulfurization (FGD) systems at generation facilities, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This additional_details_text
bit feels redundant with the additional_summary_text
section.
Note that a small number of respondents only report annual fuel consumption and net | ||
generation, and all of it is reported in December.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels redundant with the usage warning. If we aren't going to provide any more insight as to why this is happening, I think we can remove this from this section and just keep it in the usage warnings for now. That goes for all the tables where this comes up too.
|
||
This table is produced during the transformation of fuel delivery data, in order to | ||
"description": { | ||
"additional_summary_text": "coal mines reporting deliveries in the Fuel Receipts and Costs.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"In the fuel receipts and costs" feels weird, did you mean to add the word table after this?
# + "\n\nThis table exists for naming consistency. While it is technically " | ||
# "aggregated by month, it ends up being identical to the " | ||
# "``out_eia923__generation`` table from which it is derived.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you comment this out vs. remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i flagged this in the spreadsheet. i think we should delete these non-aggregated tables. its harder now to append text to portions of the shared dictionary so i commented it out instead of trying to incorporate it when imo we should no be publishing this table. But i will convert this to a TODO instead of just comment it out.
Overview
Closes #4400
What problem does this address?
What did you change?
Documentation
Make sure to update relevant aspects of the documentation:
docs/data_sources/templates
).src/metadata
).Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list
dbt
tests.make pytest-coverage
locally to ensure that the merge queue will accept your PR.make pytest-coverage
passes, make sure you have a fresh full PUDL DB downloaded locally, materialize new/changed assets and all their downstream assets and run relevant data validation tests usingpytest
and--live-dbs
.make pytest-validate
.build-deploy-pudl
GitHub Action manually.