Data Source Identification 2025 plan #108

Open
@josh-chamberlain

Context

We have some pieces coming together, but it's time to focus efforts around a few central goals. Our timeline is through June 2025.

Overall Goals

  1. Grow the database with quality sources according to our priorities
    • common topics: analyzing response effectiveness (calls for service, dispatch); documenting interactions & outcomes (stops, arrests, use of force); understanding basics of systems (agency completion & metadata, personnel)
    • geographic: go deep on one county at a time, starting with the most populous
    • followed areas: add sources to searches followed by our users
  2. Improve labeling models
    • eventually, trained language models should give us fairly accurate relevancy, record type, and agency labels. This gets easier as more quality sources are added to the database.
  3. Quickly investigate requests

System overview

Check out the diagram in the README!

Q3–4

Close the loop

Show off

New Source Collectors

Encourage participation, add Big Value

Maintain the dataset

  • when a source breaks (its URL status changes), we should try to find a replacement source
  • agency crawler automation: check known websites for sources that were moved, updated, or added
  • Log usefulness of sources
    • how often are people clicking?
    • capture sentiment (did you find what you needed with this search? is this source useful?)
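
To make the maintenance items above concrete, here is a minimal Python sketch of two of them: flagging a source for replacement when its URL status changes from working to broken, and summarizing click and sentiment logs. Every name in it (`fetch_status`, `needs_replacement`, `SourceUsage`) is hypothetical and does not come from the existing codebase. It also assumes that "broken" means the URL previously returned a 2xx/3xx status and no longer does.

```python
from dataclasses import dataclass
from typing import Optional
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise, but still carry a status code.
        return err.code
    except OSError:
        # DNS failures, refused connections, and timeouts end up here.
        return None


def is_ok(status: Optional[int]) -> bool:
    """Assumption: any 2xx or 3xx status counts as a working source."""
    return status is not None and 200 <= status < 400


def needs_replacement(previous: Optional[int], current: Optional[int]) -> bool:
    """True when a source that used to resolve no longer does."""
    return is_ok(previous) and not is_ok(current)


@dataclass
class SourceUsage:
    """Hypothetical per-source counters for clicks and sentiment answers."""
    impressions: int = 0
    clicks: int = 0
    helpful_votes: int = 0
    unhelpful_votes: int = 0

    def click_through_rate(self) -> float:
        # How often people click the source when it appears in a search.
        return self.clicks / self.impressions if self.impressions else 0.0

    def helpful_share(self) -> Optional[float]:
        # Share of "is this source useful?" answers that were positive;
        # None when nobody has answered yet.
        total = self.helpful_votes + self.unhelpful_votes
        return self.helpful_votes / total if total else None
```

A scheduled job could call `fetch_status` for each known source, compare the result with the last stored status via `needs_replacement`, and queue flagged sources for a replacement search. Some servers reject `HEAD` requests, so a real checker would likely need a `GET` fallback.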

Improve the models

Q1–2 (we got a lot done!)

Build the engine

Refinements (feature creep!)

Use metrics to measure success

Run the engine
