Standardize remaining handful of "thousands of" units in PUDL #4301

@e-belfer

Overview

Elsewhere we convert units from thousands of dollars to dollars and from thousands of lbs to lbs. However, we haven't done this consistently for all fields. Notably, these fields are still in "thousands of" units:

  • fgd_sorbent_consumption_1000_tons (to be transformed in the _core_eia923__fgd_operation_maintenance function in the pudl.transform.eia923 module)
  • max_steam_flow_1000_lbs_per_hour (to be transformed in the _core_eia860__boilers function in the pudl.transform.eia860 module)
  • steam_load_1000_lbs (should be converted in the pudl.extract.epacems module)
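A minimal pandas sketch of the conversion each of these transforms would apply, assuming the field arrives as a plain numeric column. The helper name convert_thousands_column is hypothetical and is not an existing PUDL function; in PUDL the change would live inside the functions and modules named above.

```python
import pandas as pd


def convert_thousands_column(df: pd.DataFrame, old_name: str, new_name: str) -> pd.DataFrame:
    """Rename old_name to new_name and rescale it from thousands of units to units.

    Hypothetical helper for illustration; not an existing PUDL function.
    """
    # rename returns a new DataFrame, so the input is left untouched.
    out = df.rename(columns={old_name: new_name})
    # A value reported in thousands (e.g. 1.5 thousand lbs/hr) becomes 1,500 lbs/hr.
    # Missing values stay missing.
    out[new_name] = out[new_name] * 1000
    return out


# Toy boiler records using the EIA 860 field name from this issue.
boilers = pd.DataFrame(
    {
        "boiler_id": ["1", "2"],
        "max_steam_flow_1000_lbs_per_hour": [1.5, None],
    }
)
converted = convert_thousands_column(
    boilers, "max_steam_flow_1000_lbs_per_hour", "max_steam_flow_lbs_per_hour"
)
print(converted)
```

Renaming in place (rather than adding a new column and dropping the old one) keeps the field in its original position, so the transformed table's column order stays stable.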

Success Criteria

  • fgd_sorbent_consumption_1000_tons -> fgd_sorbent_consumption_tons
  • max_steam_flow_1000_lbs_per_hour -> max_steam_flow_lbs_per_hour
  • steam_load_1000_lbs -> steam_load_lbs
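One rough way to check these criteria mechanically is sketched below. The RENAMES mapping restates the three renames above, and the hypothetical helper leftover_thousands_columns flags any column name that still carries a "_1000" segment. It assumes you have the column names of the transformed tables at hand (e.g., from DataFrame.columns); as a heuristic, it may also flag unrelated columns that legitimately contain "_1000".

```python
import re

# The three renames listed in the success criteria.
RENAMES = {
    "fgd_sorbent_consumption_1000_tons": "fgd_sorbent_consumption_tons",
    "max_steam_flow_1000_lbs_per_hour": "max_steam_flow_lbs_per_hour",
    "steam_load_1000_lbs": "steam_load_lbs",
}

# Heuristic: a "_1000" segment followed by "_" or the end of the name
# suggests a column is still expressed in thousands of units.
THOUSANDS_PATTERN = re.compile(r"_1000(?:_|$)")


def leftover_thousands_columns(columns):
    """Return, sorted, the column names that still look like thousands-of units."""
    return sorted(name for name in columns if THOUSANDS_PATTERN.search(name))


# Iterating a dict yields its keys: every old name should be flagged.
print(leftover_thousands_columns(RENAMES))
# None of the new names should be flagged.
print(leftover_thousands_columns(RENAMES.values()))
```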

Get set up

  • Fork the PUDL repository and follow the steps to set up the PUDL development environment
  • Activate the pudl-dev environment: mamba activate pudl-dev
  • Grab the latest raw data for the relevant datasets, in this case EIA 860, EIA 923, and EPA CEMS. To do so, run pudl_datastore --dataset eia923, pudl_datastore --dataset eia860, and pudl_datastore --dataset epacems (the last one downloads a large amount of data and may take a while).
  • For the first two variables, first generate the raw assets upstream of the transform: follow the instructions to open Dagster, find the relevant asset group in the left-hand menu (e.g., "raw_eia923"), and click "Materialize all".

Next steps

  • For each field, identify the first _core or core table in which it is transformed (see pudl.metadata.resources).
  • In the corresponding transform function, multiply the field by 1,000 to convert it from thousands of units to units (e.g., 2.5 thousand tons becomes 2,500 tons).
  • Update the field name, description, and units.
  • Update the database schema with an Alembic migration.
  • For the last variable, CEMS processing happens all in one step: to test your solution, find the "core_epacems__hourly_emissions" asset in the Dagster "Assets" tab and materialize it. For the other two variables, you can generate the affected assets (e.g., _core_eia923__fgd_operation_maintenance) and inspect them in devtools/inspect_assets.ipynb.
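When inspecting a regenerated asset, one quick sanity check is that each converted value is exactly 1,000 times the value from a run made before the change. The sketch below assumes you have saved a copy of the pre-change table as before and loaded the new one as after, both sharing some key columns; the function check_scaled_by_1000, the key columns, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def check_scaled_by_1000(before, after, keys, old_col, new_col):
    """Return True if after[new_col] equals 1000 * before[old_col] on every matching row.

    Rows are aligned on the key columns (duplicate keys raise a MergeError).
    A value missing in both tables counts as a match; rows that fail to match
    on the keys make the check fail.
    """
    merged = before[keys + [old_col]].merge(
        after[keys + [new_col]], on=keys, how="inner", validate="one_to_one"
    )
    expected = merged[old_col] * 1000
    actual = merged[new_col]
    # NaN never compares equal, so accept positions where both values are missing.
    both_missing = (expected.isna() & actual.isna()).to_numpy()
    values_match = np.isclose(expected, actual)
    rows_align = len(merged) == len(before) == len(after)
    return rows_align and bool((values_match | both_missing).all())


# Hypothetical key columns and toy values, for illustration only.
keys = ["plant_id_eia", "report_date"]
before = pd.DataFrame(
    {
        "plant_id_eia": [1, 2],
        "report_date": ["2022-01-01", "2022-01-01"],
        "fgd_sorbent_consumption_1000_tons": [0.25, np.nan],
    }
)
after = pd.DataFrame(
    {
        "plant_id_eia": [1, 2],
        "report_date": ["2022-01-01", "2022-01-01"],
        "fgd_sorbent_consumption_tons": [250.0, np.nan],
    }
)
print(
    check_scaled_by_1000(
        before, after, keys,
        "fgd_sorbent_consumption_1000_tons", "fgd_sorbent_consumption_tons",
    )
)
```

np.isclose is used instead of exact equality so that ordinary floating-point rounding in the multiplication does not produce false failures.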

Metadata

Labels: community, data-types, good first issue

Status: In progress