address-geocoder

A tool to standardize and geocode Philadelphia addresses

1. Setup

You will need the following things:

  1. An addresses file, provided to you by CityGeo.
  2. An AIS API key, provided to you by CityGeo.
  3. Python 3.9 or later installed on your computer.

To download, use git:

git clone git@github.com:CityOfPhiladelphia/address-geocoder.git

If you have not set up authentication with git on your machine before, refer to GitHub's guidance on setting up authentication.

Alternatively, you can download the repository as a zip file using GitHub's web interface.

Next, navigate to the project's directory and create a virtual environment:

python -m venv .venv

Then, activate the virtual environment. You will need to activate it every time you want to run the enrichment tool, not just this once:

source .venv/bin/activate

Finally, install the packages in requirements.txt:

pip install -r requirements.txt

Once you have installed everything, it is time to fill in the config file.

2. How to Use Address Geocoder

Address Geocoder takes an input file containing addresses and adds latitude and longitude to those addresses, as well as any optional fields that the user supplies.

To run Address Geocoder, first set up the configuration file. By default, Address Geocoder searches for a file named config.yml, which is the recommended config filename. You can copy the template in config_example.yml to a file named config.yml and continue from there. Detailed steps for filling out the config file are in the next section.

Then, run:

python3 geocoder.py

The dialogue will ask you to specify a config file. Press Enter without typing anything to keep the default config file ('./config.yml').

Configuration

  1. Copy config_example.yml to config.yml by running in the terminal:
cp config_example.yml config.yml
  1. Add your AIS API Key here:
AIS_API_KEY:
  1. Add the filepath for the input file (the file that you wish to enrich) and the geography file (the address file you have been given). This should look something like this:
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
  1. Map each address field to the name of the corresponding column in the CSV that you wish to process. If you have one combined address field, map it to full_address_field. Otherwise, leave full_address_field blank and map column names to street, city, state, and zip. Street must be included, while the others are optional.

For example, for a CSV with the columns addr_st, addr_city, and addr_zip:

input_file: 'example.csv'

full_address_field:

address_fields:
  street: addr_st
  city: addr_city
  state:
  zip: addr_zip
  1. List which fields other than latitude and longitude you want to add. (Latitude and longitude will always be added.) If you enter an invalid field, the program will error out and ask you to try again. A complete list of valid fields can be found further down in this README.
enrichment_fields:
  - census_tract_2020
  - census_block_group_2020
  - census_block_2020

The full config file should look something like this:

# Connection Credentials
AIS_API_KEY: YOUR_API_KEY

# File Config
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet

full_address_field: address

# OR, IF ADDRESS IS SPLIT INTO MULTIPLE COLUMNS:
address_fields:
  street:
  city:
  state:
  zip:

# Enrichment Fields -- Aside from coordinates, what fields to add
enrichment_fields:
  - census_tract_2020
  - census_block_group_2020
  - census_block_2020
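
As a rough illustration of the rules above, the following Python sketch checks a config that has already been parsed into a dictionary (for example by a YAML loader). The function name `validate_config` and the small `VALID_ENRICHMENT_FIELDS` subset are hypothetical and used only for this example; the geocoder's own validation may work differently.

```python
# Hypothetical helper for illustration; not the geocoder's actual validation code.
# Only a small subset of the valid enrichment fields listed in this README.
VALID_ENRICHMENT_FIELDS = {
    "census_tract_2020",
    "census_block_group_2020",
    "census_block_2020",
    "zip_code",
}


def validate_config(config: dict) -> list:
    """Return a list of problems found in a parsed config (empty if none)."""
    problems = []

    if not config.get("AIS_API_KEY"):
        problems.append("AIS_API_KEY is missing")

    for key in ("input_file", "geography_file"):
        if not config.get(key):
            problems.append(f"{key} is missing")

    # Either one combined address column or at least a street column is needed.
    full_address = config.get("full_address_field")
    street = (config.get("address_fields") or {}).get("street")
    if not full_address and not street:
        problems.append("set full_address_field or address_fields.street")

    for field in config.get("enrichment_fields") or []:
        if field not in VALID_ENRICHMENT_FIELDS:
            problems.append(f"invalid enrichment field: {field}")

    return problems


# The split-address example from this README, as a parsed dictionary.
example = {
    "AIS_API_KEY": "YOUR_API_KEY",
    "input_file": "./data/example_input_4.csv",
    "geography_file": "./data/addresses.parquet",
    "full_address_field": None,
    "address_fields": {
        "street": "addr_st",
        "city": "addr_city",
        "state": None,
        "zip": "addr_zip",
    },
    "enrichment_fields": ["census_tract_2020", "census_block_2020"],
}
print(validate_config(example))  # prints []
```

Returning a list of problems rather than raising on the first one lets every mistake in the file be reported in a single pass.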

Note that if one of the input fields has the same name as an enrichment field, the input field will be renamed with the _left suffix.

  1. You're now ready to run the geocoder:
python3 geocoder.py

The dialogue will ask you to specify a config file. Press Enter without typing anything to keep the default config file ('./config.yml').

The output file will be saved in the same location as your input file, with _enriched attached to the filename.
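
To predict where the output will land, the naming rule can be sketched in Python. This assumes that _enriched is placed between the file's base name and its extension, and that the extension is unchanged; the helper name `enriched_path` is hypothetical, and the tool's real output name or format may differ.

```python
from pathlib import Path


def enriched_path(input_file: str) -> Path:
    """Build the expected output path next to the input file.

    Assumption: "_enriched" goes between the stem and the extension.
    """
    source = Path(input_file)
    return source.with_name(f"{source.stem}_enriched{source.suffix}")


output = enriched_path("./data/example_input_4.csv")
print(output.name)  # prints example_input_4_enriched.csv
```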

How The Geocoder Works

Address-Geocoder processes a CSV file of addresses and geocodes those addresses using the following steps:

  1. Takes an input file of addresses, and standardizes those addresses using passyunk, Philadelphia's address standardization system.
  2. Compares the standardized data to a local parquet file, addresses.parquet, and adds the user-specified fields as well as latitude and longitude from that file.
  3. Not all records will match to the address file. For those records that do not match, Address-Geocoder queries the Address Information System (AIS) API and adds returned fields. Please note that this process can take some time, so processing large files with a messy address field is not recommended. As an example, if you have a file that needs 1,000 rows to be sent to AIS, this will take approximately 3-4 minutes.
  4. The enriched file is then saved to the same directory as the input file.
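
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the tool's implementation: `standardize` is a crude stand-in for passyunk, the local address file is modeled as a dictionary, and the AIS call is passed in as a function (`query_ais`) so the sketch runs without network access. All names and the sample coordinates are placeholders.

```python
from typing import Callable, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def standardize(address: str) -> str:
    """Stand-in for passyunk: trim, collapse whitespace, uppercase."""
    return " ".join(address.split()).upper()


def geocode_all(
    addresses: list,
    local_table: dict,
    query_ais: Callable[[str], Optional[Coordinates]],
) -> list:
    """Try the local address table first, then fall back to the AIS lookup."""
    results = []
    for raw in addresses:
        clean = standardize(raw)           # step 1: standardize
        coords = local_table.get(clean)    # step 2: local address file
        source = "address_file" if coords else None
        if coords is None:
            coords = query_ais(clean)      # step 3: slower API fallback
            source = "ais" if coords else None
        results.append({"input": raw, "coords": coords, "source": source})
    return results


def estimate_ais_minutes(rows_sent_to_ais: int) -> Tuple[float, float]:
    """Scale the rough figure of 3-4 minutes per 1,000 rows sent to AIS."""
    return (rows_sent_to_ais * 3 / 1000, rows_sent_to_ais * 4 / 1000)


# Placeholder data for demonstration only.
local = {"1234 MARKET ST": (39.95, -75.16)}


def fake_ais(address: str) -> Optional[Coordinates]:
    return (39.96, -75.17) if address == "1 S BROAD ST" else None


rows = geocode_all(["  1234 market st ", "1 s broad st", "nowhere"], local, fake_ais)
print([row["source"] for row in rows])  # prints ['address_file', 'ais', None]
print(estimate_ais_minutes(5000))       # prints (15.0, 20.0)
```

Because only unmatched rows reach the fallback, a cleaner address column means fewer AIS requests and a shorter run.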

Testing

This package uses the pytest module to conduct unit tests. Tests are located in the tests/ folder.

To run all tests:

python3 -m pytest tests/

To run tests from one file:

python3 -m pytest tests/test_parser.py

To run one test within a file:

python3 -m pytest tests/test_parser.py::test_parse_address

Enrichment Fields

Note that if any of the fields in the input file have the same name as an enrichment field, the incoming input file field will be renamed to have the _left suffix.
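
The renaming rule can be illustrated with a short Python sketch. The helper name `rename_colliding_columns` is hypothetical, and the sketch only considers the enrichment fields requested in the config; whether the tool treats other added columns (such as latitude and longitude) the same way is not covered here.

```python
def rename_colliding_columns(input_columns, enrichment_fields, suffix="_left"):
    """Map each input column to its output name, suffixing any collisions."""
    reserved = set(enrichment_fields)
    return {
        name: f"{name}{suffix}" if name in reserved else name
        for name in input_columns
    }


mapping = rename_colliding_columns(
    ["address", "zip_code"],
    ["zip_code", "census_tract_2020"],
)
print(mapping)  # prints {'address': 'address', 'zip_code': 'zip_code_left'}
```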

Field
address_high
address_low_frac
address_low_suffix
address_low
bin
census_block_2010
census_block_2020
census_block_group_2010
census_block_group_2020
census_tract_2010
census_tract_2020
center_city_district
clean_philly_block_captain
commercial_corridor
council_district_2016
council_district_2024
cua_zone
dor_parcel_id
eclipse_location_id
elementary_school
engine_local
high_school
highway_district
highway_section
highway_subsection
historic_district
historic_site
historic_street
ladder_local
lane_closure
leaf_collection_area
li_address_key
li_district
major_phila_watershed
middle_school
neighborhood_advisory_committee
opa_account_num
opa_address
opa_owners
philly_rising_area
planning_district
police_district
police_division
police_service_area
political_division
political_ward
ppr_friends
pwd_account_nums
pwd_center_city_district
pwd_maint_district
pwd_parcel_id
pwd_pressure_district
pwd_treatment_plant
pwd_water_plate
recycling_diversion_rate
rubbish_recycle_day
sanitation_area
sanitation_convenience_center
sanitation_district
seg_id
state_house_rep_2012
state_house_rep_2022
state_senate_2012
state_senate_2022
street_code
street_light_route
street_name
street_postdir
street_predir
street_suffix
traffic_district
traffic_pm_district
unit_num
unit_type
us_congressional_2012
us_congressional_2018
us_congressional_2022
zip_4
zip_code
zoning_document_ids
zoning_rco
zoning

Matching Process

flowchart TB
    A["Input Address"] --> B@{ label: "Is it a Philadelphia address? If unknown, assume it's Philadelphia." }
    B -- Yes --> C["Match to address file"]
    B -- No --> D["Match to TomTom"]
    C -- Match --> E["Return geocoded address with enrichment fields"]
    C -- No Match --> F["Is the address an intersection?"]
    D -- Match --> G["Return geocoded address, but no enrichment fields"]
    D -- No Match --> H["Return non-match"]
    F -- Yes --> I["Get intersection latitude and longitude from AIS"]
    F -- No --> J["Run AIS address match"]
    I --> K["Get address through AIS reverse lookup"]
    J -- Match --> E
    J -- No Match --> D
    K --> J

    A@{ shape: manual-input}
    B@{ shape: decision}
    C@{ shape: process}
    D@{ shape: process}
    E@{ shape: terminal}
    F@{ shape: decision}
    G@{ shape: terminal}
    H@{ shape: terminal}
    I@{ shape: process}
    J@{ shape: process}
    K@{ shape: process}
    style B fill:#BBDEFB
    style C fill:#FFE0B2
    style D fill:#FFE0B2
    style E fill:#C8E6C9
    style F fill:#BBDEFB
    style G fill:#FFF9C4
    style H fill:#FFCDD2
    style I fill:#FFE0B2
    style J fill:#FFE0B2
    style K fill:#FFE0B2
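
The same decision flow can be read as code. The Python sketch below mirrors the flowchart's branches only; every lookup is passed in as a function, and all names (`match_address`, `ais_match`, `tomtom_match`, and so on) are hypothetical placeholders rather than the tool's actual functions. It also assumes the original input address, not the reverse-lookup result, is what gets sent to TomTom.

```python
from typing import Callable, Optional


def match_address(
    address: str,
    *,
    is_philadelphia: Optional[bool],
    is_intersection: bool,
    match_address_file: Callable[[str], Optional[dict]],
    ais_match: Callable[[str], Optional[dict]],
    ais_intersection_latlon: Callable[[str], tuple],
    ais_reverse_lookup: Callable[[tuple], str],
    tomtom_match: Callable[[str], Optional[dict]],
) -> dict:
    """Follow the flowchart: address file, then AIS, then TomTom."""
    # An unknown city (None) is treated as Philadelphia.
    if is_philadelphia is None or is_philadelphia:
        record = match_address_file(address)
        if record is None:
            lookup = address
            if is_intersection:
                # Intersections go through coordinates and a reverse lookup first.
                lookup = ais_reverse_lookup(ais_intersection_latlon(address))
            record = ais_match(lookup)
        if record is not None:
            return {"status": "match", "enriched": True, **record}

    # Reached for non-Philadelphia addresses and for AIS non-matches.
    record = tomtom_match(address)
    if record is not None:
        return {"status": "match", "enriched": False, **record}
    return {"status": "no_match", "enriched": False}


# Stub lookups with placeholder values, for demonstration only.
stubs = dict(
    match_address_file=lambda a: None,
    ais_match=lambda a: {"lat": 39.95, "lon": -75.16} if a == "1400 JFK BLVD" else None,
    ais_intersection_latlon=lambda a: (39.95, -75.16),
    ais_reverse_lookup=lambda latlon: "1400 JFK BLVD",
    tomtom_match=lambda a: None,
)
result = match_address(
    "15TH ST & JFK BLVD", is_philadelphia=None, is_intersection=True, **stubs
)
print(result["status"], result["enriched"])  # prints match True
```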