Update zipcodes db #14

Closed
wants to merge 2 commits into from

@TJBANEY TJBANEY commented Aug 25, 2020

This updates zips.json to a copy retrieved in August 2020.

@seanpianka
Owner

seanpianka commented Sep 13, 2020

Hi @TJBANEY and thanks for the PR. I'm hesitant to merge this update since users have experienced reliability issues in the past with data from [1], see #3 and #4. I patched this by combining data from [1] and [2], specifically merging the longitude/latitude data from [2] into [1].
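For readers who want a concrete picture of "merging the longitude/latitude data from [2] into [1]", here is a minimal sketch. It is not the repository's actual script. The column names (`zip`, `latitude`, `longitude`, `ZipCode`, `Latitude`, `Longitude`) are taken from the build output quoted later in this thread; the sample rows and the `merge_gps` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample rows standing in for the two downloaded CSV files.
BASE_CSV = """zip,primary_city,state,latitude,longitude
10001,New York,NY,40.75,-74.0
94105,San Francisco,CA,37.79,-122.4
"""
GPS_CSV = """ZipCode,City,State,Latitude,Longitude
10001,New York,NY,40.7506,-73.9972
"""


def merge_gps(base_rows, gps_rows):
    """Overwrite base latitude/longitude with the GPS values when the zip matches."""
    gps_by_zip = {row["ZipCode"]: row for row in gps_rows}
    merged = []
    for row in base_rows:
        out = dict(row)  # copy so the input rows are left untouched
        gps = gps_by_zip.get(row["zip"])
        if gps is not None:
            out["latitude"] = gps["Latitude"]
            out["longitude"] = gps["Longitude"]
        merged.append(out)
    return merged


base_rows = list(csv.DictReader(io.StringIO(BASE_CSV)))
gps_rows = list(csv.DictReader(io.StringIO(GPS_CSV)))
merged = merge_gps(base_rows, gps_rows)
```

Rows without a match in the GPS file keep the coordinates from the base dataset, so no zip code is dropped by the merge.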

The scripts to perform this update are located under /ci/data, and if your PR could include these modifications, I could approve and merge this.

I filed #7 to track the work of automating these multi-step data reliability changes, but I haven't had time to figure the rest of it out.

[1] https://www.unitedstateszipcodes.org
[2] https://worldpostalcode.com/united-states/

@wang-yinan

Commenting here as I'm interested in updating the dataset (as well as for my own use). How did you extract the geocodes for source [2]? @seanpianka

@seanpianka
Owner

I invoke this file [0] directly and it loads the data from ci/data and combines them both into the final dataset that is used by the library.

[0] defines a dict that holds the "schema" of the JSON returned by library (query) calls. Each key in the dict is a field name from the transformed dataset, and each value is a dict that contains the "public" field name in this library's API, along with an optional pre-processing function for the transformation.
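A schema dict of that shape might look like the sketch below. This only illustrates the pattern described above, not the contents of [0]: the raw field names come from the build output quoted later in this thread, while the public names, the `"public"`/`"function"` keys, the pre-processing lambdas, and the `transform` helper are all hypothetical.

```python
# Hypothetical schema: raw dataset field -> public API name plus optional pre-processor.
SCHEMA = {
    "zip": {"public": "zip_code"},
    "primary_city": {"public": "city"},
    "area_codes": {
        "public": "area_codes",
        # Assumed format: a comma-separated string in the raw data.
        "function": lambda v: v.split(",") if v else [],
    },
    "decommissioned": {
        "public": "active",
        # Assumed encoding: "0" means the zip code is still in use.
        "function": lambda v: v == "0",
    },
}


def transform(row, schema=SCHEMA):
    """Rename fields to their public names and apply any pre-processing function."""
    out = {}
    for raw_name, spec in schema.items():
        value = row.get(raw_name)
        func = spec.get("function")
        out[spec["public"]] = func(value) if func else value
    return out


record = transform(
    {
        "zip": "10001",
        "primary_city": "New York",
        "area_codes": "212,646",
        "decommissioned": "0",
    }
)
```

Keeping the renaming and the per-field conversion in one dict means a new field can be exposed by adding a single entry, without touching the transformation loop.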

As mentioned above, the data comes from two different data sources. We need to download recent versions of both, place them in ci/data under the same final file names, and run the script. Once that's done, I can release a new version.

@wang-yinan

@seanpianka got it. Was looking for instructions on getting the geocode csv from that second link (https://worldpostalcode.com/united-states/), as it seems to just link to a website with lookup capabilities but no links for a csv download and/or export.

seanpianka referenced this pull request Oct 3, 2021
refactor: move around dataset gen tooling

Signed-off-by: Sean Pianka <[email protected]>
@seanpianka
Owner

I merged the original dataset, which has the more accurate geolocation data from 2019, into a download of the main zipcode dataset [1] from 3 Oct. 2021. The merge was done by running the dataset generation script:

$ python scripts/build_zipcode_dataset.py
("GPS Keys: ['ZipCode', 'City', 'State', 'Latitude', 'Longitude', "
 "'Classification', 'Population']")
("Base Keys: ['zip', 'type', 'decommissioned', 'primary_city', "
 "'acceptable_cities', 'unacceptable_cities', 'state', 'county', 'timezone', "
 "'area_codes', 'world_region', 'country', 'latitude', 'longitude', "
 "'irs_estimated_population']")
Updated GPS from GPS CSV in 0.026109933853149414 seconds.
Writing zipcode information for 42724 places
To zip for production, run:
$ bzip2 zips.json

This has been released as 1.2.0.
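The `bzip2 zips.json` step in the output has a straightforward Python counterpart in the standard-library `bz2` module. The sketch below writes a bzip2-compressed JSON file and reads it back. Whether the library itself loads its data this way is an assumption, and the sample record and file name are hypothetical.

```python
import bz2
import json
import os
import tempfile

# Hypothetical single-record dataset; the real file holds tens of thousands of entries.
records = [{"zip": "10001", "latitude": "40.7506", "longitude": "-73.9972"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "zips.json.bz2")

    # Write the JSON through a bzip2-compressing text stream.
    with bz2.open(path, "wt", encoding="utf-8") as fh:
        json.dump(records, fh)

    # Read it back; bz2.open decompresses transparently.
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        loaded = json.load(fh)
```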

@seanpianka seanpianka closed this Oct 3, 2021
3 participants