Open
Description
I know this is not an issue for this repo but I have looked in a lot of places and tried a lot of different things and nothing is working.
I need to bulk update deleted_at
for records for a table with 3M rows and I need to look at 2 other tables that both have 5-10M records.
My query is
UPDATE public.study_data_events sde
SET deleted_at = current_timestamp
WHERE sde.id
IN (
SELECT s.id
FROM public.study_data_events s
LEFT JOIN public.study_data_values sdv on sdv.study_data_event_id = s.id and sdv.deleted_at is NULL
LEFT JOIN public.intervention_phase_events ipe on ipe.intervention_event_id = s.id and ipe.deleted_at is NULL
WHERE sdv.id is NULL
AND ipe.id is NULL
AND s.deleted_at is NULL
);
The EXPLAIN looks like
Update on study_data_events sde (cost=48822.30..48822.62 rows=1 width=132)
-> Nested Loop (cost=48822.30..48822.62 rows=1 width=132)
-> HashAggregate (cost=48821.87..48821.88 rows=1 width=22)
Group Key: s.id
-> Nested Loop Left Join (cost=36390.15..48821.87 rows=1 width=22)
Filter: (sdv.id IS NULL)
-> Hash Right Join (cost=36389.71..48817.69 rows=1 width=16)
Hash Cond: (ipe.intervention_event_id = s.id)
Filter: (ipe.id IS NULL)
-> Seq Scan on intervention_phase_events ipe (cost=0.00..5637.57 rows=270448 width=14)
Filter: (deleted_at IS NULL)
-> Hash (cost=29704.09..29704.09 rows=384610 width=10)
-> Seq Scan on study_data_events s (cost=0.00..29704.09 rows=384610 width=10)
Filter: (deleted_at IS NULL)
-> Index Scan using study_data_values_study_data_event_id_idx on study_data_values sdv (cost=0.43..3.31 rows=87 width=14)
Index Cond: (study_data_event_id = s.id)
Filter: (deleted_at IS NULL)
-> Index Scan using study_data_events_pkey on study_data_events sde (cost=0.42..0.72 rows=1 width=106)
Index Cond: (id = s.id)
We have this seq scan
-> Seq Scan on study_data_events s (cost=0.00..29704.09 rows=384610 width=10)
Filter: (deleted_at IS NULL)`
that is very costly and I'm wondering what I can do about it.
I've tried adding a variety of indices on s.deleted_at
none of which have helped. Has anyone needed to filter a large dataset on deleted_at IS NULL
?