Populate Location fields: vaccines_offered, accepts_appointments and accepts_walkins #650
The fields in question were added in #494 - see lines 241 to 252 in 14c9098.
These were editable in the VIAL interface for a while - they are no longer editable (or at least they are hidden by default). The number of locations with these fields populated is:
This is from the short period of time when these were editable. I'm going to export that data and otherwise pretend it didn't exist.
```sql
select
  id, public_id, name, full_address,
  vaccines_offered, accepts_walkins, accepts_appointments, public_notes
from
  location
where (
  vaccines_offered is not null
  or accepts_appointments is not null
  or accepts_walkins is not null
  or (public_notes is not null and public_notes != '')
)
```

Exported data is here: https://gist.github.com/simonw/d7644d1f444bc4221b3b284f73468360
There are two steps here: backfill the existing data, and ensure that when new reports or source locations are ingested the data is updated to reflect our best available versions.
This query is really useful:

```sql
with source_location_info as (
  select
    id,
    source_uid,
    json_extract_path(import_json::json, 'availability') as availability,
    json_extract_path(import_json::json, 'inventory') as inventory
  from
    source_location
)
select
  *
from
  source_location_info
where
  availability is not null
  or inventory is not null
limit
  100
```
```sql
with source_location_info as (
  select
    id,
    source_uid,
    json_extract_path(import_json::json, 'availability') as availability,
    json_extract_path(import_json::json, 'inventory') as inventory
  from
    source_location
)
select
  count(*)
from
  source_location_info
where
  availability is not null
  or inventory is not null
```

Too long to run through the dashboard, so I used an unlimited local connection - returned 155,304.
```sql
with source_location_info as (
  select
    id,
    source_uid,
    matched_location_id,
    json_extract_path(import_json::json, 'availability') as availability,
    json_extract_path(import_json::json, 'inventory') as inventory
  from
    source_location
)
select
  count(distinct matched_location_id)
from
  source_location_info
where
  availability is not null
  or inventory is not null
```

Takes 17.8s and returns 63,730.
I changed that where clause to:

```sql
where
  availability is not null
  and inventory is not null
```

And it returned 50,228 locations that have a matched source location with BOTH of those fields.
As for reports...

```sql
with skip_reports as (
  select report_id from call_report_availability_tag where availabilitytag_id = (
    select id from availability_tag where "group" = 'skip'
  )
)
select count(distinct location_id)
from report
where id not in (select report_id from skip_reports)
  and vaccines_offered is not null
```

Returns 15,760 - there are 15,760 locations for which we have at least one non-skip report with vaccines_offered populated.
One way to look at this is that we have a sequence of opinions about which vaccines are offered - from imported source locations and from reports. The slight hitch is that, since we overwrite source locations when we import them, we don't have the full history of those opinions stored in our PostgreSQL database - though we likely have them in a git history somewhere.
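To make that concrete, here is a minimal sketch of a "latest opinion wins" resolution, assuming a hypothetical `Opinion` record carrying a timestamp, the claimed `vaccines_offered` value and where the claim came from (none of these names are from the actual codebase):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Opinion:
    # One "opinion" about a location, from either a report or a source location
    observed_at: datetime
    vaccines_offered: Optional[list]  # None means "no opinion on this"
    source: str  # e.g. "report:123" or "source_location:getmyvax_org:..."


def resolve_vaccines_offered(opinions: list[Opinion]) -> Optional[Opinion]:
    """Return the most recent opinion that actually says something about
    vaccines_offered, or None if nothing has an opinion."""
    with_opinion = [o for o in opinions if o.vaccines_offered is not None]
    if not with_opinion:
        return None
    return max(with_opinion, key=lambda o: o.observed_at)
```

Whichever opinion wins would be written to the location, with its origin recorded in the provenance fields introduced later in this thread.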
Also interesting: which of our locations have the most matched source locations?
https://vial-staging.calltheshots.us/location/lpptz is the top one - that's Walgreens Co. #19134 in CT - because of the CT scrapers: https://vial.calltheshots.us/dashboard/?sql=select+source_uid%2C+source_name%2C+name+from+source_location+where+matched_location_id+%3D+35007%3AncVcHDs5Axbk6K_g7YTGp8BW-m7RT3HWkoABzBgzpgE Actually that looks bad - I think a bunch of different CT Walgreens may have been incorrectly matched.
I'm going to exclude
A sample of the values of that field shows those don't always have supply levels.
Here's the scraper code that sets it.
I ran this (took 20s, so not through Django SQL Dashboard) to see which sources have both drop_in and appointment records:

```sql
select "source_name", count(*) as n from (with source_location_info as (
  select
    json_extract_path(import_json::json, 'availability', 'drop_in') as drop_in,
    json_extract_path(import_json::json, 'availability', 'appointments') as appointments,
    import_json,
    id, source_uid, source_name, name, created_at, matched_location_id, last_imported_at
  from
    source_location
)
select
  *
from
  source_location_info
where
  drop_in is not null and appointments is not null) as results group by "source_name" order by n desc
```
Looking at the distinct availability values for vaccinefinder_org:

```sql
select json_extract_path(import_json::json, 'availability')::text as availability, count(*)
from source_location where source_name = 'vaccinefinder_org'
group by json_extract_path(import_json::json, 'availability')::text
```
I'm adding the following fields:

```diff
+    vaccines_offered_provenance_report = models.ForeignKey(
+        "Report",
+        null=True,
+        blank=True,
+        related_name="+",
+        help_text="The report that last populated vaccines_offered",
+        on_delete=models.PROTECT,
+    )
+    vaccines_offered_provenance_source_location = models.ForeignKey(
+        "SourceLocation",
+        null=True,
+        blank=True,
+        related_name="+",
+        help_text="The source location that last populated vaccines_offered",
+        on_delete=models.PROTECT,
+    )
+    vaccines_offered_last_updated_at = models.DateTimeField(
+        help_text="When vaccines_offered was last updated",
+        blank=True,
+        null=True,
+    )
+
+    appointments_walkins_provenance_report = models.ForeignKey(
+        "Report",
+        null=True,
+        blank=True,
+        related_name="+",
+        help_text="The report that last populated accepts_walkins and accepts_appointments",
+        on_delete=models.PROTECT,
+    )
+    appointments_walkins_provenance_source_location = models.ForeignKey(
+        "SourceLocation",
+        null=True,
+        blank=True,
+        related_name="+",
+        help_text="The source location that last populated accepts_walkins and accepts_appointments",
+        on_delete=models.PROTECT,
+    )
+    appointments_walkins_last_updated_at = models.DateTimeField(
+        help_text="When accepts_walkins and accepts_appointments were last updated",
+        blank=True,
+        null=True,
+    )
```
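For illustration, a hedged sketch of how a derivation step might write one of these values together with its provenance columns - the `apply_vaccines_offered_from_report` helper is hypothetical, not part of the actual codebase:

```python
from django.utils import timezone


def apply_vaccines_offered_from_report(location, report):
    """Hypothetical helper: record a report's vaccines_offered opinion on a
    location, along with which report supplied it and when."""
    location.vaccines_offered = report.vaccines_offered
    location.vaccines_offered_provenance_report = report
    location.vaccines_offered_provenance_source_location = None
    location.vaccines_offered_last_updated_at = timezone.now()
    location.save(
        update_fields=[
            "vaccines_offered",
            "vaccines_offered_provenance_report",
            "vaccines_offered_provenance_source_location",
            "vaccines_offered_last_updated_at",
        ]
    )
```

A companion helper for source locations would set vaccines_offered_provenance_source_location instead and clear the report pointer.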
My code doesn't (yet) populate those new columns - that comes next. I'm pushing this live to staging and then I'll track down a bunch of interesting examples - locations with multiple reports and source locations - that I can use to demonstrate what the derived data will look like.
Once I'm comfortable with the behaviour of that method, I'll do the following:
I want to find good example locations for this - locations that have both source locations AND reports against them which cover vaccines offered and availability. Problem: we don't seem to have any on staging. Here's a query:

```sql
with last_1000_vaccine_source_locations as (
  select * from source_location where json_extract_path(import_json::json, 'inventory') is not null
  order by id desc limit 1000
),
all_reports_with_vaccines as (
  select * from report where vaccines_offered is not null
)
select * from location where id in (
  select matched_location_id from last_1000_vaccine_source_locations
) and id in (
  select location_id from all_reports_with_vaccines
) limit 100
```

Since the code I've written so far is completely safe - it shows things on a debug page but doesn't update any database records - I'm going to ship it to production in order to see more examples.
My biggest question here is around the trustworthiness of our scrapers. We don't want to discard information from a high-quality source just because a less trusted scraper reported something more recently. I think the fix for that is going to be allow-listing the scrapers - maybe even starting with just one trusted source. Relevant code: lines 527 to 532 and lines 594 to 599 in 344153e.
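A minimal sketch of what that allow-listing could look like - which scrapers belong on the list is exactly the open question, so the names and the helper below are placeholders rather than a real decision:

```python
# Placeholder allow-list: which scrapers are trusted enough to overwrite the
# derived fields is still an open question.
TRUSTED_SOURCE_NAMES = {
    "getmyvax_org",
    "vaccinefinder_org",
}


def source_location_is_trusted(source_location) -> bool:
    """Only allow-listed scrapers get to update vaccines_offered,
    accepts_appointments and accepts_walkins on a matched location."""
    return source_location.source_name in TRUSTED_SOURCE_NAMES
```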
I can use the new debug information to dig into this.
```sql
select location.public_id, count(*), array_agg(source_location.source_name), count(distinct source_location.source_name) as source_name_count
from source_location join location on source_location.matched_location_id = location.id
group by location.public_id
having count(*) > 1
order by source_name_count desc
```

Here's a top result from that: https://vial.calltheshots.us/location/lqwzd
This variant of that query returns only locations that also have at least one non-skip report:

```sql
select
  location.public_id,
  count(*) as num_source_locations,
  array_agg(distinct source_location.source_name),
  count(distinct source_location.source_name) as num_distinct_source_names
from
  source_location
  join location on source_location.matched_location_id = location.id
where
  -- Only locations that have at least one non-skip report
  location.id in (
    select location_id from report where report.id not in (select report_id from call_report_availability_tag where availabilitytag_id = 20)
  )
group by
  location.public_id
having count(*) > 1
order by num_distinct_source_names desc
```
https://vial.calltheshots.us/location/lykhz is an interesting example.
https://vial.calltheshots.us/dashboard/?sql=select%20%22json_extract_path%22%2C%20count%28%2A%29%20as%20n%20from%20%28select%20json_extract_path%28import_json%3A%3Ajson%2C%20%27availability%27%29%3A%3Atext%20from%20source_location%20where%20source_name%20%3D%20%27getmyvax_org%27%29%20as%20results%20group%20by%20%22json_extract_path%22%20order%20by%20n%20desc%3APE6S3PfuMROXLq15I658NslW4tlc2PMasbC3HeyDqBI shows the distribution of availability values for getmyvax_org. Running this against the whole DB:

```sql
select "json_extract_path", count(*) as n from (select json_extract_path(import_json::json, 'availability')::text from source_location) as results group by "json_extract_path" order by n desc
```

Returns this:
Maybe if the most recent source location has no explicit opinion on drop-ins we should fall back to the most recent report, if one exists?
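A rough sketch of that fallback, assuming we have the parsed availability JSON from the most recent source location and a simplified dict view of the most recent non-skip report (both shapes and the function name are illustrative, not from the codebase):

```python
from typing import Optional


def resolve_walkins_and_appointments(
    latest_source_availability: Optional[dict],
    latest_report: Optional[dict],
) -> tuple[Optional[bool], Optional[bool]]:
    """Prefer the most recent source location's availability block, but where
    it has no explicit opinion, fall back to the most recent report (if any).

    latest_source_availability is the parsed 'availability' JSON from a source
    location, e.g. {"drop_in": True, "appointments": True}; latest_report is a
    dict like {"accepts_walkins": ..., "accepts_appointments": ...}.
    """
    source = latest_source_availability or {}
    report = latest_report or {}

    accepts_walkins = source.get("drop_in")
    if accepts_walkins is None:
        accepts_walkins = report.get("accepts_walkins")

    accepts_appointments = source.get("appointments")
    if accepts_appointments is None:
        accepts_appointments = report.get("accepts_appointments")

    return accepts_walkins, accepts_appointments
```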
I'm going to upgrade the display of that derived data on the debug page.
Examples I need to find:
And some report examples, using the query from #650 (comment):

```sql
select
  location.public_id,
  count(*) as num_source_locations,
  array_agg(distinct source_location.source_name),
  count(distinct source_location.source_name) as num_distinct_source_names
from
  source_location
  join location on source_location.matched_location_id = location.id
where
  -- Only locations that have at least one non-skip report
  location.id in (
    select location_id from report where report.id not in (select report_id from call_report_availability_tag where availabilitytag_id = 20)
  )
group by
  location.public_id
having count(*) > 1
  and 'vaccinefinder_org' = any(array_agg(distinct source_location.source_name))
order by num_distinct_source_names desc
```
I'm going to wrap this work up by:
It looks like
Both
The fact that Lines 541 to 546 in 9b7cc30
Just realized I need to exclude reports with
I need to do another in-depth review of places that might add/remove/edit source locations and reports to make sure they all update the derived data correctly.
Here's a progress report on how population is going based on imported source locations and reports:
That's out of 77,067 not-soft-deleted locations.
I still need to run a back-fill mechanism for locations that haven't had a report or a source location import in the past week.
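A sketch of what that backfill pass might look like; the model, field names and the resolution helper are passed in or assumed rather than taken from the real codebase:

```python
from datetime import timedelta

from django.utils import timezone


def backfill_stale_locations(Location, update_derived_availability, batch_size=500):
    """Hypothetical backfill pass: re-derive availability data for locations
    whose derived fields haven't been touched in the past week.

    Location (the model class) and update_derived_availability (a function
    applying the resolution logic sketched earlier) are passed in because
    their real names and locations in the codebase are assumptions here.
    """
    one_week_ago = timezone.now() - timedelta(days=7)
    # Locations whose derived vaccines_offered data is missing or stale
    stale = Location.objects.filter(soft_deleted=False).exclude(
        vaccines_offered_last_updated_at__gte=one_week_ago
    )
    for location in stale.iterator(chunk_size=batch_size):
        update_derived_availability(location)
```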
On staging https://vial-staging.calltheshots.us/dashboard/?sql=select+count%28%2A%29+from+location+where+vaccines_offered_last_updated_at+is+not+null%3Ax05v1QiDVM1fIntikQuIJSI7Umqolhu3YXvrdJsdPiU&sql=select+count%28%2A%29+from+location+where+appointments_walkins_last_updated_at+is+not+null%3AZPXVjIp8wjdVr8xjePGpvlDD4GyWUDSgnB8rbZbuPoQ both return around 9,000 records, presumably due to test source location imports run against staging.
Replaces #504 - needed by #649.
From https://docs.google.com/document/d/17svyCVXcloArj1wbUgu7QwEb6xeIDa-p7C3U0_yLgKg/edit