Context
We want to add metadata to URLs, filter for relevancy, and expand our database of valid data sources.
Flowchart
The overall plan for data source identification is now in the readme of this repo.
Properties
These properties are all explained in the data dictionary.
S tier
A tier
- description: a subjective field that fills in the gaps left by name, record type, and agency. It can be used to disambiguate similar sources, but it is difficult to automate.
- aggregation_type
- access_type
- record_download_option_provided
- record_format
- Is it agency_supplied and agency_originated? If not, who are the supplier and originator?
- coverage_start
- coverage_end
- portal_type
- scraper_url
- readme_url
Still A tier, but rarely published:
- retention_schedule
- update_frequency
- source_last_updated
B tier
- size
- update_method
- sort_method
- access_restrictions
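To make the tiers easier to act on when filling in metadata, they could be kept as a simple lookup and used to report which properties a source record still lacks. The sketch below is only an illustration under stated assumptions: a source record is treated as a flat dict keyed by the property names above, TIERS and missing_properties are hypothetical names (not part of the data dictionary), and the agency_supplied / agency_originated question is represented by those two fields. The empty S tier is left out.

```python
from typing import Any

# Hypothetical grouping of the property names listed above by tier.
# "A_rare" holds the A-tier properties that are rarely published.
TIERS: dict[str, list[str]] = {
    "A": [
        "description",
        "aggregation_type",
        "access_type",
        "record_download_option_provided",
        "record_format",
        "agency_supplied",
        "agency_originated",
        "coverage_start",
        "coverage_end",
        "portal_type",
        "scraper_url",
        "readme_url",
    ],
    "A_rare": ["retention_schedule", "update_frequency", "source_last_updated"],
    "B": ["size", "update_method", "sort_method", "access_restrictions"],
}


def missing_properties(record: dict[str, Any]) -> dict[str, list[str]]:
    """Return, per tier, the properties that are absent or empty in a record.

    Assumes a record is a flat dict keyed by property name. None and ""
    count as missing; False (e.g. agency_supplied=False) counts as filled in.
    Tiers with no gaps are omitted from the result.
    """
    missing: dict[str, list[str]] = {}
    for tier, fields in TIERS.items():
        gaps = [f for f in fields if record.get(f) in (None, "")]
        if gaps:
            missing[tier] = gaps
    return missing


if __name__ == "__main__":
    # Hypothetical example record, for illustration only.
    example = {
        "description": "Daily arrest logs",
        "agency_supplied": True,
        "agency_originated": True,
        "size": "",
    }
    print(missing_properties(example)["B"])
    # → ['size', 'update_method', 'sort_method', 'access_restrictions']
```

Grouping by tier lets a reviewer (or an automated pass) prioritize the A-tier gaps before spending effort on B-tier properties.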
Related reading
https://github.com/palewire/storysniffer/
http://blog.apps.npr.org/2016/06/17/scraping-tips.html