Skip to content

Standardize 1000 tons to tons in fgd_sorbent_consumption column #4426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

msa856
Copy link

@msa856 msa856 commented Jul 15, 2025

Overview

Closes #XXXX.

What problem does this address?

What did you change?

Documentation

Make sure to update relevant aspects of the documentation:

  • Update the release notes: reference the PR and related issues.
  • Update relevant Data Source jinja templates (see docs/data_sources/templates).
  • Update relevant table or source description metadata (see src/metadata).
  • Review and update any other aspects of the documentation that might be affected by this PR.

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

  • If updating analyses or data processing functions: make sure to update row count expectations in dbt tests.
  • Run make pytest-coverage locally to ensure that the merge queue will accept your PR.
  • Review the PR yourself and call out any questions or issues you have.
  • For minor ETL changes or data additions, once make pytest-coverage passes, make sure you have a fresh full PUDL DB downloaded locally, materialize new/changed assets and all their downstream assets and run relevant data validation tests using pytest and --live-dbs.
  • For bigger ETL or data changes run the full ETL locally and then run the data validations using make pytest-validate.
  • Alternatively, run the build-deploy-pudl GitHub Action manually.

@msa856 msa856 changed the title standardized 1000s to tons in transform func. Standardize 1000 tons to tons in fdg-sorbent-consumption column Jul 15, 2025
Copy link
Member

@jdangerx jdangerx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're headed in the right direction! I think the main thing you still need to do is update the alembic schema.

Then you can materialize the asset in dagster and then check if the output data is actually in tons instead of kilotons!

"unit": "1000_tons",
"description": "Quantity of flue gas desulfurization sorbent used, to the nearest 0.1 thousand tons.",
"unit": "tons",
"description": "Quantity of flue gas desulfurization sorbent used, to the nearest ton.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is actually the nearest 100 tons? Looking at the PUDL viewer it does seem like the original values are to the nearest 0.1 thousand (disregarding the floating point error...):

image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. In tons it would be 108,599.99 for the first value and we want it rounded to 108600 (re: comment below), correct?

Copy link
Author

@msa856 msa856 Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the alembic schema, ran dagster dev, but materializing _core_eia923__fdg_operation_maintenance failed because of the asset check (continuity check). Should I change that too prior to materialization (src/pudl/transform/eia923.py)?

image

@@ -1350,6 +1350,10 @@ def _core_eia923__fgd_operation_maintenance(
fgd_df.loc[:, fgd_df.columns.str.endswith("_1000_dollars")] *= 1000
fgd_df.columns = fgd_df.columns.str.replace("_1000_dollars", "") # Rename columns

# Convert thousands of tons to tons
fgd_df.loc[:, fgd_df.columns.str.endswith("_1000_tons")] *= 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if this is to the nearest 100 tons we should probably round this appropriately.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to load _core_eia923__fgd_operation_maintenance into my notebook and it seemed to be rounded already (I don't know why that differs from what you showed in the viewer). Do you recommend another line of code to ensure it's always rounded?

image

Also, I am not sure if this is the best place to ask but was curious: does etl.defs bypass the asset checks? Is that why I can see the updated column whereas in Dagster I cannot?

@zaneselvans zaneselvans changed the title Standardize 1000 tons to tons in fdg-sorbent-consumption column Standardize 1000 tons to tons in fgd_sorbent_consumption column Jul 16, 2025
@cmgosnell cmgosnell added the community Issues that contributors have volunteered to take on or fostering more community label Jul 21, 2025
@cmgosnell cmgosnell moved this from New to In progress in Catalyst Megaproject Jul 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Issues that contributors have volunteered to take on or fostering more community
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

3 participants