Standardize 1000 tons to tons in fgd_sorbent_consumption column #4426
You're headed in the right direction! I think the main thing you still need to do is update the alembic schema.
Then you can materialize the asset in dagster and then check if the output data is actually in tons instead of kilotons!
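One way to do that last check, sketched under assumptions: take a few values from the old output (thousands of tons) and the same records from the freshly materialized asset, and confirm the new ones are 1000 times larger. The function name and the sample values are hypothetical; only the factor of 1000 comes from this PR.

```python
import math


def is_converted_to_tons(old_kilotons, new_tons, rel_tol=1e-6):
    """Return True if every new value is 1000 times the matching old value.

    Both arguments are equal-length sequences of floats for the same records;
    rel_tol absorbs floating point noise in the stored values.
    """
    if len(old_kilotons) != len(new_tons):
        raise ValueError("old and new samples must have the same length")
    return all(
        math.isclose(new, old * 1000, rel_tol=rel_tol)
        for old, new in zip(old_kilotons, new_tons)
    )


# Hypothetical sample values, not taken from the real table.
old_sample = [108.6, 0.5, 12.3]
print(is_converted_to_tons(old_sample, [108600.0, 500.0, 12300.0]))
print(is_converted_to_tons(old_sample, old_sample))
```

The first call should report a successful conversion and the second should not, since passing the unconverted values back in leaves them a factor of 1000 too small.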
-    "unit": "1000_tons",
-    "description": "Quantity of flue gas desulfurization sorbent used, to the nearest 0.1 thousand tons.",
+    "unit": "tons",
+    "description": "Quantity of flue gas desulfurization sorbent used, to the nearest ton.",
I suppose this is actually the nearest 100 tons? Looking at the PUDL viewer it does seem like the original values are to the nearest 0.1 thousand (disregarding the floating point error...):

I see what you mean. In tons it would be 108,599.99 for the first value and we want it rounded to 108600 (re: comment below), correct?
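For concreteness, a small sketch of that rounding, using the 108,599.99 figure from this comment as the only input. Python's built-in `round` with a negative `ndigits` rounds to the left of the decimal point, so `-2` means the nearest 100. The variable names are illustrative.

```python
# Value in tons as quoted in the comment above (after multiplying by 1000).
value_tons = 108_599.99

# ndigits=-2 rounds to the nearest hundred, which matches data reported
# to the nearest 0.1 thousand tons.
rounded_tons = round(value_tons, -2)

print(rounded_tons)  # 108600.0
```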
@@ -1350,6 +1350,10 @@ def _core_eia923__fgd_operation_maintenance(
 fgd_df.loc[:, fgd_df.columns.str.endswith("_1000_dollars")] *= 1000
 fgd_df.columns = fgd_df.columns.str.replace("_1000_dollars", "")  # Rename columns

+# Convert thousands of tons to tons
+fgd_df.loc[:, fgd_df.columns.str.endswith("_1000_tons")] *= 1000
And if this is to the nearest 100 tons we should probably round this appropriately.
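A minimal sketch of what that could look like, assuming pandas and a toy frame standing in for `fgd_df`. The `_1000_tons` suffix and the `* 1000` step mirror the diff; the `.round(-2)` call, the full column name, the sample values, and the renaming step are assumptions for illustration, not the code in this PR.

```python
import pandas as pd

# Toy stand-in for fgd_df; the column name and values are hypothetical and
# include the kind of float noise discussed above.
fgd_df = pd.DataFrame(
    {
        "plant_id_eia": [1, 2],
        "fgd_sorbent_consumption_1000_tons": [108.59999, 0.3],
    }
)

# Select the columns reported in thousands of tons.
ton_cols = fgd_df.columns[fgd_df.columns.str.endswith("_1000_tons")]

# Convert to tons, then round to the nearest 100 tons, which is the precision
# of a value reported to the nearest 0.1 thousand tons.
fgd_df[ton_cols] = (fgd_df[ton_cols] * 1000).round(-2)

# Drop the unit suffix so the column name matches the new unit.
fgd_df.columns = fgd_df.columns.str.replace("_1000_tons", "", regex=False)

print(fgd_df["fgd_sorbent_consumption"].tolist())
```

Rounding after the multiplication, rather than before, keeps the stated precision in the final unit and removes float noise introduced by the conversion itself.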
I was able to load `_core_eia923__fgd_operation_maintenance` into my notebook and it seemed to be rounded already (I don't know why that differs from what you showed in the viewer). Do you recommend another line of code to ensure it's always rounded?

Also, I am not sure if this is the best place to ask, but I was curious: does `etl.defs` bypass the asset checks? Is that why I can see the updated column there, whereas in Dagster I cannot?
Overview
Closes #XXXX.
What problem does this address?
What did you change?
Documentation
Make sure to update relevant aspects of the documentation:
- [...] `docs/data_sources/templates`).
- [...] `src/metadata`).
Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list
- [...] `dbt` tests.
- [...] `make pytest-coverage` locally to ensure that the merge queue will accept your PR.
- [...] `make pytest-coverage` passes, make sure you have a fresh full PUDL DB downloaded locally, materialize new/changed assets and all their downstream assets, and run relevant data validation tests using `pytest` and `--live-dbs`.
- [...] `make pytest-validate`.
- [...] `build-deploy-pudl` GitHub Action manually.