Transform PHMSA company data #4005

Open
wants to merge 70 commits into
base: main
Commits
d41717d
Update phmsagas DOI and start transformation
Sep 21, 2024
ba27b7e
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
Sep 23, 2024
b40071e
Starting data transformation
Sep 23, 2024
90fd277
Update notebook and change etl_fast phmsagas years
Sep 26, 2024
c5406b1
Add 2023 package data columns for new phmsagas run
Sep 27, 2024
9fc3f45
Added documentation
Oct 1, 2024
c3e2c67
Add troubleshooting to index
Oct 1, 2024
1bfb71d
Update troubleshooting
Oct 2, 2024
7760e9a
Add helpers
Oct 6, 2024
e23bd61
Temp add change
Oct 8, 2024
b5c7acd
Update column mappings
Oct 11, 2024
035712e
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
Oct 11, 2024
5d7d00f
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
Oct 18, 2024
9518e83
Update notebook and add draft transform script
Oct 24, 2024
418cd55
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
Oct 24, 2024
66b63b2
Remove old files and cleanup helpers
Oct 24, 2024
cfc62d4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 24, 2024
3225680
Resolved merge conflicts
Nov 1, 2024
f276746
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 1, 2024
ac67b39
Resolve merge conflicts
Nov 1, 2024
57209e9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 1, 2024
6b2747e
Remove list of columns
Nov 1, 2024
d6bb6ea
Remove '.0' logic
Nov 1, 2024
1769487
Updates in response to comments
Nov 2, 2024
405cc1c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2024
78cf151
Clean up documentation and logic
Nov 2, 2024
0c69a84
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2024
044dcd5
Use bulk series str ops
Nov 3, 2024
d8be474
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 3, 2024
c5aa116
Reorder transformations
Nov 3, 2024
aa3ac9c
Remove .0 substring from phone numbers
Nov 3, 2024
e3ec14e
Remove temp dev logic
Nov 3, 2024
e84f348
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
Nov 3, 2024
7f94d13
Cleanup notebook
Nov 3, 2024
f1ba3dc
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
Nov 16, 2024
cb6c767
Make updates per PR feedback
Nov 24, 2024
05cc8ce
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 24, 2024
1d3db43
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
seeess1 Nov 24, 2024
2979526
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
seeess1 Dec 6, 2024
9f57d77
Cleanup method description
seeess1 Dec 6, 2024
94d4d5d
Merge branch 'issue-3770-transform-phmsagas-data' of https://github.c…
seeess1 Dec 6, 2024
6f504e5
Update inits, classes, and fields
seeess1 Dec 7, 2024
759f1e6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 7, 2024
f1fa7a3
Deduplication and test updates
seeess1 Dec 12, 2024
2ae5dfb
Merge branch 'issue-3770-transform-phmsagas-data' of https://github.c…
seeess1 Dec 12, 2024
d5f5ffe
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 12, 2024
62a821c
Merge branch 'main' into phmsa-company-transform
e-belfer Jan 7, 2025
875da71
Extract new PHMSA data, fix state enum, add alembic migration
e-belfer Jan 7, 2025
06ae880
Merge branch 'phmsa-company-transform' of https://github.com/catalyst…
e-belfer Jan 7, 2025
9ac7858
Address ruff failures and unit test failure, move analyzing code to n…
e-belfer Jan 7, 2025
520c287
Get asset checks to run
e-belfer Jan 7, 2025
95e9a5e
Merge branch 'main' into phmsa-company-transform
e-belfer Jan 7, 2025
50ca79a
Merge branch 'main' into phmsa-company-transform
e-belfer Jan 7, 2025
3cfd6fc
Update release notes
e-belfer Jan 7, 2025
6bda226
Merge branch 'main' into phmsa-company-transform
e-belfer Feb 17, 2025
669520e
Fix release notes, rebase migration, update resource metadata
e-belfer Feb 17, 2025
7e66272
Add state encoding to new tables
e-belfer Feb 17, 2025
afee877
Add territories to enums
e-belfer Feb 17, 2025
c7cbbae
Update blocking test on fast ETL
e-belfer Feb 18, 2025
98578fc
Remove state encoding
e-belfer Mar 4, 2025
6d0d510
Merge branch 'main' into phmsa-company-transform
e-belfer Mar 4, 2025
c0567c5
Decapitalize notes, fix report_date
e-belfer Mar 4, 2025
55181bd
Merge branch 'phmsa-company-transform' of https://github.com/catalyst…
e-belfer Mar 4, 2025
5aa6d1a
Merge branch 'main' into phmsa-company-transform
e-belfer Mar 4, 2025
eb8a62b
Merge branch 'main' into phmsa-company-transform
zaneselvans Mar 4, 2025
2bdaa54
Merge branch 'main' into phmsa-company-transform
zaneselvans Mar 4, 2025
e9c27fe
Add dbt models and row counts for phmsa gas distributon operators
zaneselvans Mar 4, 2025
7a2e787
Add tests for phone number standardization function
zaneselvans Mar 4, 2025
eda9524
Add richer field descriptions and some minor transform tweaks.
zaneselvans Mar 7, 2025
5b61a81
Merge branch 'main' into phmsa-company-transform
zaneselvans Mar 7, 2025
@@ -0,0 +1,44 @@
version: 2
sources:
  - name: pudl
    tables:
      - name: core_phmsagas__yearly_distribution_operators
        data_tests:
          - check_row_counts_per_partition:
              table_name: core_phmsagas__yearly_distribution_operators
              partition_column: report_year

Comment on lines +4 to +9
Member

Added using the devtools/dbt_helper.py script.

        columns:
          - name: report_date
          - name: report_number
          - name: report_submission_type
          - name: report_year
          - name: operator_id_phmsa
          - name: operator_name_phmsa
          - name: office_address_street
          - name: office_address_city
          - name: office_address_state
          - name: office_address_zip
          - name: office_address_county
          - name: headquarters_address_street
          - name: headquarters_address_city
          - name: headquarters_address_state
          - name: headquarters_address_zip
          - name: headquarters_address_county
          - name: excavation_damage_excavation_practices
          - name: excavation_damage_locating_practices
          - name: excavation_damage_one_call_notification
          - name: excavation_damage_other
          - name: excavation_damage_total
          - name: excavation_tickets
          - name: services_efv_in_system
          - name: services_efv_installed
          - name: services_shutoff_valve_in_system
          - name: services_shutoff_valve_installed
          - name: federal_land_leaks_repaired_or_scheduled
          - name: percent_unaccounted_for_gas
          - name: additional_information
          - name: preparer_email
          - name: preparer_fax
          - name: preparer_name
          - name: preparer_phone
          - name: preparer_title
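
For readers unfamiliar with per-partition row-count checks, here is a minimal Python sketch of the idea behind a test like check_row_counts_per_partition: count the rows in each partition and compare against expected counts stored in a CSV. This is not the dbt test's implementation. The function names, the three-column headerless CSV layout, and the sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical expected counts in the shape: table name, partition, row count.
# Assumption: no header row. A real seed file may differ.
EXPECTED_CSV = """\
core_phmsagas__yearly_distribution_operators,2022,1447
core_phmsagas__yearly_distribution_operators,2023,1438
"""


def load_expected(csv_text, table_name):
    """Return {partition: expected_row_count} for one table."""
    expected = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        name, partition, count = row
        if name == table_name:
            expected[partition] = int(count)
    return expected


def row_count_mismatches(rows, partition_column, expected):
    """Compare observed per-partition row counts with the expected ones.

    Returns {partition: (observed, expected)} for every partition whose
    counts differ, including partitions present on only one side.
    """
    observed = Counter(str(row[partition_column]) for row in rows)
    partitions = set(observed) | set(expected)
    return {
        p: (observed.get(p, 0), expected.get(p, 0))
        for p in sorted(partitions)
        if observed.get(p, 0) != expected.get(p, 0)
    }
```

With 1447 rows for 2022 and 1437 rows for 2023, the sketch would report only the 2023 partition, as the pair (1437, 1438). An empty result means every partition matched.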
1 change: 1 addition & 0 deletions dbt/seeds/etl_fast_row_counts.csv
@@ -1035,3 +1035,4 @@ out_sec10k__parents_and_subsidiaries,2021,6687
out_sec10k__parents_and_subsidiaries,2022,5573
out_sec10k__parents_and_subsidiaries,2023,196880
out_vcerare__hourly_available_capacity_factor,2023,27287400
core_phmsagas__yearly_distribution_operators,2022,1447
31 changes: 31 additions & 0 deletions dbt/seeds/etl_full_row_counts.csv
@@ -3693,3 +3693,34 @@ _out_eia__yearly_heat_rate_by_unit,2021,1954
_out_eia__yearly_heat_rate_by_unit,2022,1886
_out_eia__yearly_heat_rate_by_unit,2023,1865
_out_ferc1__yearly_plants_utilities,,7887
core_phmsagas__yearly_distribution_operators,1990,1504
core_phmsagas__yearly_distribution_operators,1991,1569
core_phmsagas__yearly_distribution_operators,1992,1545
core_phmsagas__yearly_distribution_operators,1993,1570
core_phmsagas__yearly_distribution_operators,1998,1464
core_phmsagas__yearly_distribution_operators,1999,1461
core_phmsagas__yearly_distribution_operators,2000,1446
core_phmsagas__yearly_distribution_operators,2001,1440
core_phmsagas__yearly_distribution_operators,2002,1423
core_phmsagas__yearly_distribution_operators,2003,1428
core_phmsagas__yearly_distribution_operators,2004,1523
core_phmsagas__yearly_distribution_operators,2005,1522
core_phmsagas__yearly_distribution_operators,2006,1518
core_phmsagas__yearly_distribution_operators,2007,1502
core_phmsagas__yearly_distribution_operators,2008,1476
core_phmsagas__yearly_distribution_operators,2009,1449
core_phmsagas__yearly_distribution_operators,2010,1437
core_phmsagas__yearly_distribution_operators,2011,1462
core_phmsagas__yearly_distribution_operators,2012,1477
core_phmsagas__yearly_distribution_operators,2013,1492
core_phmsagas__yearly_distribution_operators,2014,1494
core_phmsagas__yearly_distribution_operators,2015,1491
core_phmsagas__yearly_distribution_operators,2016,1487
core_phmsagas__yearly_distribution_operators,2017,1498
core_phmsagas__yearly_distribution_operators,2018,1489
core_phmsagas__yearly_distribution_operators,2019,1478
core_phmsagas__yearly_distribution_operators,2020,1458
core_phmsagas__yearly_distribution_operators,2021,1443
core_phmsagas__yearly_distribution_operators,2022,1447
core_phmsagas__yearly_distribution_operators,2023,1438
core_phmsagas__yearly_distribution_operators,3900,6211
Member
Man, if we still have natural gas distribution systems in the year 3900, something is very wrong.
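
A partition like 3900 is easy to catch mechanically. The sketch below flags partition years outside an expected reporting range. It is only an illustration: the function name is hypothetical, and the bounds of 1990 and 2023 are assumptions taken from the earliest and latest plausible years in the row counts above. It does not describe how this PR handles the 3900 rows.

```python
def implausible_years(counts_by_year, first_year=1990, last_year=2023):
    """Return the partition years that fall outside [first_year, last_year].

    The default bounds are assumptions for illustration, not project rules.
    """
    return sorted(
        year for year in counts_by_year if not first_year <= year <= last_year
    )


# Sample of the per-year row counts shown in the diff above.
counts = {2022: 1447, 2023: 1438, 3900: 6211}
print(implausible_years(counts))  # [3900]
```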

7 changes: 7 additions & 0 deletions docs/release_notes.rst
@@ -9,6 +9,13 @@ v2025.XX.x (2025-MM-DD)
New Data
^^^^^^^^

PHMSA
~~~~~
* Add a transformed table containing annual operator data from PHMSA natural gas
  distributors. This is a subset of the overall distributor data, focusing on
  company-level attributes. Thanks to :user:`seeess1` for all of your work on this! See
  :issue:`3770` and :pr:`4005`.

Expanded Data Coverage
^^^^^^^^^^^^^^^^^^^^^^

64 changes: 35 additions & 29 deletions environments/conda-linux-64.lock.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.