Update zipcodes db #14
Conversation
Hi @TJBANEY and thanks for the PR. I'm hesitant to merge this update since users have experienced reliability issues in the past with data from [1]; see #3 and #4. I patched this by combining data from [1] and [2], specifically merging the longitude/latitude data from [2] into [1]. The scripts to perform this update are located under …. I filed #7 as a way to track the work of automating these multi-step data reliability changes, but I haven't had time to figure the rest of it out.

[1] https://www.unitedstateszipcodes.org
[2] https://worldpostalcode.com/united-states/
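For illustration, here is a minimal sketch of the kind of merge described above. It is not the repo's actual script; the file and column names are hypothetical, and it simply prefers the coordinates from source [2] wherever a zip code matches:

```python
import csv

# Hypothetical file and column names; the real source exports use different layouts.
# Load zip -> (lat, lon) from source [2].
coords = {}
with open("source2_geocodes.csv", newline="") as f:
    for row in csv.DictReader(f):
        coords[row["ZipCode"]] = (row["Latitude"], row["Longitude"])

# Rewrite source [1], preferring the more reliable coordinates from
# source [2] whenever a zip code is present in both files.
with open("source1_zips.csv", newline="") as src, \
     open("merged_zips.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row["zip"] in coords:
            row["latitude"], row["longitude"] = coords[row["zip"]]
        writer.writerow(row)
```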
Commenting here as I'm interested in updating the dataset (as well as for my own use). How did you extract the geocodes for source [2]? @seanpianka
I invoke this file [0] directly; it loads the data from ci/data and combines both sources into the final dataset used by the library. [0] defines a dict that holds the "schema" of the JSON returned by library (query) calls. The dict's keys are the field names from the transformed dataset, and each value is a dict that contains the "public" field name in this library's API, along with an optional pre-processing function for the transformation. As mentioned above, the data comes from two different datasources. We need to download the recent versions of those and place them, with the same final names, into ….
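To make the shape of that dict concrete, here is a small sketch under assumed names; the fields and the `split_area_codes` helper are illustrative, not the library's actual schema:

```python
# Illustrative schema mapping: source field -> public field name plus an
# optional pre-processing function. All names here are hypothetical.
def split_area_codes(value):
    """Turn a comma-separated string into a list of area codes."""
    return [code.strip() for code in value.split(",") if code.strip()]

SCHEMA = {
    # source field          public API field          pre-processing
    "zip":          {"public": "zip_code",   "transform": None},
    "primary_city": {"public": "city",       "transform": str.title},
    "area_codes":   {"public": "area_codes", "transform": split_area_codes},
}

def transform_record(raw_row):
    """Apply the schema to one row of the merged dataset."""
    out = {}
    for source_field, spec in SCHEMA.items():
        value = raw_row.get(source_field)
        if spec["transform"] and value is not None:
            value = spec["transform"](value)
        out[spec["public"]] = value
    return out
```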
@seanpianka got it. I was looking for instructions on getting the geocode CSV from that second link (https://worldpostalcode.com/united-states/), as it seems to just link to a website with lookup capabilities but no links for a CSV download and/or export.
refactor: move around dataset gen tooling
Signed-off-by: Sean Pianka <[email protected]>
I merged the original dataset (with its more accurate 2019 geolocation data) into a download of the main [1] zipcode dataset from 3 Oct. 2021, by running the dataset generation script:

$ python scripts/build_zipcode_dataset.py
("GPS Keys: ['ZipCode', 'City', 'State', 'Latitude', 'Longitude', "
"'Classification', 'Population']")
("Base Keys: ['zip', 'type', 'decommissioned', 'primary_city', "
"'acceptable_cities', 'unacceptable_cities', 'state', 'county', 'timezone', "
"'area_codes', 'world_region', 'country', 'latitude', 'longitude', "
"'irs_estimated_population']")
Updated GPS from GPS CSV in 0.026109933853149414 seconds.
Writing zipcode information for 42724 places
To zip for production, run:
$ bzip2 zips.json

This has been released as 1.2.0.
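For reference, a compressed dataset like that can be read back at query time with Python's standard bz2 and json modules. This is only a sketch with an assumed file path, not necessarily how this library loads its data:

```python
import bz2
import json

# Decompress and parse the production dataset in one pass.
# The path is illustrative; adjust to wherever zips.json.bz2 is shipped.
with bz2.open("zips.json.bz2", "rt", encoding="utf-8") as f:
    zip_records = json.load(f)

print(f"Loaded {len(zip_records)} zip code records")
```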
This updates zips.json to one retrieved in August 2020.