Integrate the Census-based FIPS codes to replace addfips
#4019
Co-authored-by: E. Belfer <[email protected]>
Re: Q1 - can any area be more than one of these at once (subdivision and place, place and consolidated city, subdivision and consolidated city), or can we file them as an "aux code" / "aux type" pair? if they're mutually exclusive then we could do either:
- one table, with a variable-length concatenated FIPS (3 chars for country, 5 for state, 8 for county, 13 for aux)
- 3 tables (country could just be hard coded): state, county, aux
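A rough sketch of the first option, assuming the lengths in the bullet are cumulative (3 for country, then 2 more for state, 3 more for county, 5 more for aux). The function names and the "840" country prefix are placeholders for illustration, not anything from this PR:

```python
# Cumulative code lengths taken from the bullet above.
LEVEL_BY_LENGTH = {3: "country", 5: "state", 8: "county", 13: "aux"}

# (name, start, end) slices implied by those cumulative lengths.
BOUNDS = [("country", 0, 3), ("state", 3, 5), ("county", 5, 8), ("aux", 8, 13)]


def geocode_level(code: str) -> str:
    """Infer the geography level from the length of a concatenated code."""
    try:
        return LEVEL_BY_LENGTH[len(code)]
    except KeyError:
        raise ValueError(f"unexpected geocode length {len(code)}: {code!r}") from None


def split_geocode(code: str) -> dict[str, str]:
    """Split a concatenated code into the components it contains."""
    geocode_level(code)  # raises if the length is not one of the known levels
    return {name: code[start:end] for name, start, end in BOUNDS if len(code) >= end}


# "840" is a placeholder country prefix; "78" + "010" is the St. Croix county code.
parts = split_geocode("84078010")
```

With one table, the level of each row falls out of the code length, so no separate "type" column is needed for country/state/county; the aux rows would still need the "aux type" column to say which kind of area they are.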
Re:Q2 - 👍 "geocodes"
Co-authored-by: Kathryn Mazaitis <[email protected]>
hm @krivard I've been assuming we want to update the vintage of the fips codes (the data table i've been testing with ( |
okay since this revelation about different years having different info in there i decided to incorporate 2015 and 2023. I'm purposefully rn not merging the code using |
okay i also added in 2009 bc there were a small handful of counties that were mapped w/ addfips that weren't being mapped without an older year. rn the county codes that don't match are only two and they are bc of name changes: |
oh nice okay i ran the validation tests before actually fixing the Bedford City vs County problem.... and then the row counts changed! because previously Bedford City was getting dropped because |
…w preserving the old bedford city code
# The Virgin Islands and Guam aren't covered by addfips but they have FIPS:
st_croix = (df.state == "VI") & (df.county.isin(["St. Croix", "Saint Croix"]))
df.loc[st_croix, "county_id_fips"] = "78010"
st_john = (df.state == "VI") & (df.county.isin(["St. John", "Saint John"]))
df.loc[st_john, "county_id_fips"] = "78020"
st_thomas = (df.state == "VI") & (df.county.isin(["St. Thomas", "Saint Thomas"]))
df.loc[st_thomas, "county_id_fips"] = "78030"
df.loc[df.state == "GU", "county_id_fips"] = "66010"
i didn't realize this was here until after i added these guys into the CSV. that feels like a better place for them anyway so i deleted this guy.
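For reference, a minimal sketch of what moving those hard-coded assignments into a lookup table could look like. The `territory_fips` frame is a hypothetical stand-in for the CSV rows, `add_territory_fips` is not a function in this PR, and the "Guam" county name is an assumption (the removed code matched every GU row regardless of county):

```python
import pandas as pd

# Hypothetical stand-in for the rows added to the CSV. The codes are the ones
# from the removed block; the "Guam" county name is an assumption.
territory_fips = pd.DataFrame(
    {
        "state": ["VI", "VI", "VI", "GU"],
        "county": ["St. Croix", "St. John", "St. Thomas", "Guam"],
        "county_id_fips": ["78010", "78020", "78030", "66010"],
    }
)


def add_territory_fips(df: pd.DataFrame) -> pd.DataFrame:
    """Left-merge FIPS codes onto df (assumes df has no county_id_fips column yet)."""
    # Normalize "Saint X" to "St. X" so both spellings hit the same lookup row.
    county_norm = df["county"].str.replace(r"^Saint\b", "St.", regex=True)
    out = df.assign(county_norm=county_norm).merge(
        territory_fips.rename(columns={"county": "county_norm"}),
        on=["state", "county_norm"],
        how="left",
        validate="m:1",  # each lookup key must be unique
    )
    return out.drop(columns="county_norm")


example = add_territory_fips(
    pd.DataFrame(
        {
            "state": ["VI", "VI", "GU", "TX"],
            "county": ["Saint John", "St. Thomas", "Guam", "Travis"],
        }
    )
)
```

Rows with no match in the lookup (the TX row here) come back with a null `county_id_fips` instead of being dropped.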
what has changed (i think) since last review:
|
Assuming I read that right, good to go!
out_ferc714__respondents_with_fips,2020,8995
out_ferc714__respondents_with_fips,2021,8943
out_ferc714__respondents_with_fips,2022,8953
out_ferc714__respondents_with_fips,2023,8964
✔️ ✔️ no 2006es 🏆
df.astype({county_col: pd.StringDtype()})
.assign(county_tmp=lambda x: _clean_area_name_col(x[county_col], {}))
Checking understanding: this change is for two reasons:
- we needed county_col as a string, and sometimes it wasn't
- _clean_area_name_col couldn't handle a Series so we have to run it one row at a time
if yes, all good, if no, help?
I thought assign with a lambda function / any callable still does its work vectorized: the callable gets the whole DataFrame once and returns a whole Series. Is that not true? (df.apply() with axis=1 is the row-wise one that's notoriously slow)
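A quick way to check this, as a sketch: `clean_names` below is a hypothetical stand-in for `_clean_area_name_col` (its real signature and behavior aren't shown here), and it records what it receives on each call.

```python
import pandas as pd

calls = []  # records the type of each argument clean_names receives


def clean_names(col: pd.Series) -> pd.Series:
    """Hypothetical stand-in for _clean_area_name_col: strip and lowercase."""
    calls.append(type(col).__name__)
    return col.str.strip().str.lower()


df = pd.DataFrame({"county": [" Bedford ", "St. Croix", None]})

out = df.astype({"county": pd.StringDtype()}).assign(
    # assign calls the lambda once, passing the whole DataFrame as x
    county_tmp=lambda x: clean_names(x["county"])
)

print(calls)  # one entry per call, not one per row
```

If `assign` were row-wise, `calls` would have three entries here. It ends up with a single "Series" entry, so the string operations inside run over the whole column at once.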
Overview
Closes #3884.
What problem does this address?
What did you change?
Questions
- _core or core table?? if core we may want to break it out into multiple tables bc really it's many geographies all squished into one table
Documentation
Make sure to update relevant aspects of the documentation.
Testing
How did you make sure this worked? How can a reviewer verify this?