There are a number of public databases that store missing persons data, but there are two major problems:
- The databases do not provide public API access.
- There is no common data standard for a missing persons case across databases.
I want to collect all of the missing persons data into one common database to provide easy access to researchers and students.
First, I chose to focus on merging the National Missing and Unidentified Persons System (NamUs) and the National Center for Missing & Exploited Children (NCMEC) databases. I crawled the NCMEC database using Requests against its underlying JSON API, and I used Selenium to crawl NamUs. Data from both sources was cleaned, standardized, and linked where a case appeared in both.
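The cleaning, standardizing, and linking step can be pictured roughly as follows. This is a minimal sketch and not the repository's actual code: the `HAIR_COLOR_MAP` table, the helper names, and the matching rule (same name and same calendar date) are all assumptions made for illustration. Only the field names come from the data standard described in this README.

```python
# Hypothetical sketch of cleaning, standardizing and linking two sources.
# The mapping table and the matching rule are assumptions, not the real pipeline.

HAIR_COLOR_MAP = {  # assumed raw spellings -> standard categories
    "blond": "Blonde",
    "blonde": "Blonde",
    "red": "Red/Auburn",
    "auburn": "Red/Auburn",
    "brown": "Brown",
    "black": "Black",
}


def standardize(record):
    """Return a copy of the record with trimmed names and a standard hair color."""
    cleaned = dict(record)
    for field in ("first_name", "last_name"):
        cleaned[field] = (record.get(field) or "").strip()
    raw_hair = (record.get("hair_color") or "").strip().lower()
    cleaned["hair_color"] = HAIR_COLOR_MAP.get(raw_hair, "Unknown")
    return cleaned


def link_key(record):
    """Assumed duplicate test: same name (case-insensitive) and same calendar date."""
    day = (record.get("date") or "")[:10]  # keep only YYYY-MM-DD
    return (record["first_name"].lower(), record["last_name"].lower(), day)


def merge_sources(ncmec_records, namus_records):
    """Combine both sources, folding duplicate cases into a single record."""
    merged = {}
    for record in list(ncmec_records) + list(namus_records):
        cleaned = standardize(record)
        key = link_key(cleaned)
        if key not in merged:
            merged[key] = cleaned
            continue
        existing = merged[key]
        for field, value in cleaned.items():
            if not existing.get(field):  # fill gaps, never overwrite
                existing[field] = value
    return list(merged.values())
```

A real linker would likely need fuzzier matching (nicknames, misspellings, missing dates); the exact-match key here only illustrates the idea of keeping both case numbers on one merged record.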
Crawling can be time consuming and hard to understand. I have created some IPython notebook tutorials that walk through the basic building blocks of the code repository:
- Extracting Data from NCMEC
- Introduction to Selenium
- Extracting Data from NamUs
- Working with Find Us Missing Persons API
After all of the data was collected and merged into a single JSON file, I wanted to host the information. I used PostgreSQL to build a database, Flask to handle the API requests, and Heroku to host everything. As a proof of concept, I decided to host only a snapshot of the missing persons data for California, collected on April 20, 2014.
###find-us.herokuapp.com
dumps all missing persons cases
###find-us.herokuapp.com/{country_abbrev}
dumps all missing persons cases in country
ex: find-us.herokuapp.com/us
###find-us.herokuapp.com/{country_abbrev}/{state}
dumps all missing persons cases in country and state
ex: find-us.herokuapp.com/us/California
###find-us.herokuapp.com/{country_abbrev}/{state}/{county}
dumps all missing persons cases in country, state and county
ex: find-us.herokuapp.com/us/California/Los Angeles
###find-us.herokuapp.com/{country_abbrev}/{state}/{county}/{city}
dumps all missing persons cases in country, state, county and city
ex: find-us.herokuapp.com/us/California/Los Angeles/Los Angeles
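County and city names such as Los Angeles contain spaces, so it may help to percent-encode each path segment when building these URLs in code. The following is a minimal sketch; the helper name `location_url` and the `http://` scheme are assumptions for illustration, since only the host name is given above.

```python
from urllib.parse import quote

# Assumption: the scheme is not stated in this README, only the host name.
BASE_URL = "http://find-us.herokuapp.com"


def location_url(country, state=None, county=None, city=None):
    """Build a location endpoint URL, stopping at the first missing level."""
    parts = [country]
    for part in (state, county, city):
        if part is None:
            break
        parts.append(part)
    # safe="" also encodes "/" so a value can never add an extra path level
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


print(location_url("us", "California", "Los Angeles"))
```

The resulting URL can then be passed to any HTTP client, for example `requests.get(url)`.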
###find-us.herokuapp.com/search?{criterion=val}
dumps all missing persons cases matching that criterion
- age
- city
- country
- county
- date
- eye_color
- first_name
- hair_color
- height
- last_name
- race
- sex
- state
- weight
ex: find-us.herokuapp.com/search?sex=Male&eye_color=Blue
ex: find-us.herokuapp.com/search?hair_color=Blonde&county=Orange
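Search URLs like the examples above can be assembled from keyword arguments. This is a rough sketch only: the function name `search_url` and the `http://` scheme are assumptions, and rejecting unlisted criteria on the client side is a design choice of this example rather than documented server behavior.

```python
from urllib.parse import urlencode

# Assumption: the scheme is not stated in this README, only the host name.
BASE_URL = "http://find-us.herokuapp.com"

# Criteria listed for the /search endpoint in this README
SEARCH_CRITERIA = {
    "age", "city", "country", "county", "date", "eye_color", "first_name",
    "hair_color", "height", "last_name", "race", "sex", "state", "weight",
}


def search_url(**criteria):
    """Build a /search URL from criterion=value pairs."""
    unknown = set(criteria) - SEARCH_CRITERIA
    if unknown:
        raise ValueError("unsupported criteria: " + ", ".join(sorted(unknown)))
    # urlencode escapes spaces and other special characters in the values
    return BASE_URL + "/search?" + urlencode(criteria)


print(search_url(sex="Male", eye_color="Blue"))
```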
###find-us.herokuapp.com/search?{criterion}_start={val}&{criterion}_end={val2}
dumps all missing persons cases between {criterion}_start and {criterion}_end
- age
- date
- height
- weight
ex: find-us.herokuapp.com/search?age_start=10&age_end=15
ex: find-us.herokuapp.com/search?date_start=2000-01-20&date_end=2000-05-20
ex: find-us.herokuapp.com/search?date_start=1976-05-20 00:00:00&date_end=1976-05-20 15:30:00
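The timestamp example above contains spaces and colons, which are safer to percent-encode when the URL is built in code. Here is a small sketch under stated assumptions: `range_url` is a hypothetical helper, the `http://` scheme is assumed, and `datetime` values are formatted as `YYYY-MM-DD HH:MM:SS` to match the example.

```python
from datetime import datetime
from urllib.parse import quote, urlencode

# Assumption: the scheme is not stated in this README, only the host name.
BASE_URL = "http://find-us.herokuapp.com"

# Criteria this README lists as supporting _start/_end ranges
RANGE_CRITERIA = {"age", "date", "height", "weight"}


def _format(value):
    """Render datetimes as 'YYYY-MM-DD HH:MM:SS'; leave other values unchanged."""
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%d %H:%M:%S")
    return value


def range_url(criterion, start, end):
    """Build a /search URL for values between start and end."""
    if criterion not in RANGE_CRITERIA:
        raise ValueError("no range search for " + repr(criterion))
    params = {
        criterion + "_start": _format(start),
        criterion + "_end": _format(end),
    }
    # quote_via=quote encodes spaces as %20 instead of "+"
    return BASE_URL + "/search?" + urlencode(params, quote_via=quote)


print(range_url("age", 10, 15))
print(range_url("date", datetime(1976, 5, 20), datetime(1976, 5, 20, 15, 30)))
```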
###find-us.herokuapp.com/identifiers/race
dumps standard categories for race
###find-us.herokuapp.com/identifiers/eye_color
dumps standard categories for eye_color
###find-us.herokuapp.com/identifiers/hair_color
dumps standard categories for hair_color
Each missing persons case contains the following fields:
- ncmec_number: NCMEC case number
- namus_number: NamUs case number
- org_name: long name of organization storing the information
- org: organization abbreviation, such as NCMEC or NAMUS
- org_contact: contact information for the organization
- agency_name: investigating agency, usually the local police department
- agency_contact: contact information for the agency
- date (YYYY-MM-DD HH:MM:SS): the date and military time when this person went missing
- city: city where the person went missing
- state: state where the person went missing
- county: county where the person went missing
- country: country where the person went missing
- circumstance: description of the circumstances surrounding this disappearance
- first_name: first name of the missing person
- middle_name: middle name of the missing person
- last_name: last name of the missing person
- age (years): the current age of the missing person
- sex: {"Female", "Male"}
- race: {"White", "Black/African American", "Asian or Pacific Islander", "Native American", "Non-White Hispanic/Latino", "White Hispanic/Latino", "Other", "Unknown"}
- eye_color: {"Blue", "Brown", "Hazel", "Gray", "Green", "Pink", "Maroon", "Black", "Multicolor", "Unknown"}
- hair_color: {"Brown", "Sandy", "Black", "Gray", "White", "Blonde", "Red/Auburn", "Unknown"}
- weight (pounds): the maximum weight of the missing person
- height (inches): the maximum height of the missing person
- photo: the url to a photo of the missing person
- aged_photo: the url to an age-enhanced photo of the missing person
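When consuming records, it can be useful to check that the categorical fields and the date follow the standard listed above. This is a minimal sketch: `validate_record` is a hypothetical helper, not part of the API, and it assumes each case arrives as a plain dictionary keyed by the field names above.

```python
from datetime import datetime

# Standard categories as listed in this README
CATEGORIES = {
    "sex": {"Female", "Male"},
    "race": {
        "White", "Black/African American", "Asian or Pacific Islander",
        "Native American", "Non-White Hispanic/Latino",
        "White Hispanic/Latino", "Other", "Unknown",
    },
    "eye_color": {
        "Blue", "Brown", "Hazel", "Gray", "Green", "Pink", "Maroon",
        "Black", "Multicolor", "Unknown",
    },
    "hair_color": {
        "Brown", "Sandy", "Black", "Gray", "White", "Blonde",
        "Red/Auburn", "Unknown",
    },
}


def validate_record(record):
    """Return a list of problems; an empty list means no issues were found."""
    problems = []
    for field, allowed in CATEGORIES.items():
        value = record.get(field)
        if value is not None and value not in allowed:
            problems.append(field + ": unexpected value " + repr(value))
    date = record.get("date")
    if date is not None:
        try:
            datetime.strptime(date, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            problems.append("date: expected YYYY-MM-DD HH:MM:SS, got " + repr(date))
    return problems
```

Fields that are absent are skipped rather than reported, since this README does not say which fields are required.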