Commit 631dda8

Include prototype for incremental refresh
The initial idea behind this work was to create incremental *parallel* refresh; however, explicit locks are still held on CAgg hypertables during refresh, so it has become *only* incremental refresh.

We introduce 3 concepts:

- producer job
- consumer job
- work queue

By keeping these items separate, we can schedule work for CAgg refreshes in smaller increments (say, 12 hours instead of 3 weeks), while also allowing us to intervene by injecting higher-priority refreshes if needed.

For details, see README.md
1 parent daf13d7 commit 631dda8

File tree

4 files changed: +564 −0 lines changed

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Continuous Aggregates, incremental parallel setup

This code explores the possibility of doing incremental CAgg refreshes in
parallel. The setup is as follows.

At a very high level, these are the components:

- a table that acts as a work queue:
  `_timescaledb_additional.incremental_continuous_aggregate_refreshes`
- one (or more) producer jobs that schedule CAgg refreshes
- one (or more) consumer jobs that process the tasks based on priority

The producer jobs can be scheduled very frequently, as no duplicate tasks will
be written to the work queue.
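The queue table's definition lives in a separate file of this commit and is not shown in this chunk. As a rough orientation only, a minimal sketch consistent with the columns the consumer procedure references (`id`, `continuous_aggregate`, `window_start`, `window_end`, `scheduled`, `priority`, `worker_pid`, `started`, `finished`) might look like:

```sql
-- Hypothetical sketch; the authoritative definition is in the commit's
-- schema file. Column names are taken from the consumer's queries.
CREATE TABLE _timescaledb_additional.incremental_continuous_aggregate_refreshes (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    continuous_aggregate regclass NOT NULL,
    window_start timestamptz NOT NULL,
    window_end timestamptz NOT NULL,
    priority int NOT NULL DEFAULT 100,
    scheduled timestamptz NOT NULL DEFAULT now(),
    worker_pid int,       -- backend pid of the consumer working on this task
    started timestamptz,  -- when a consumer claimed the task
    finished timestamptz  -- when the refresh completed
);
```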
## Producer

We have a producer procedure
(`schedule_refresh_continuous_aggregate_incremental`), which schedules tasks to
be picked up by the consumers.

The configuration for this call contains the following keys:

```json
{
    "end_offset": "similar to end-offset in the policy",
    "start_offset": "similar to start-offset in the policy",
    "continuous_aggregate": "regclass / fully qualified name of the user view for the CAgg",
    "increment_size": "the size of each individual task, default: chunk_interval",
    "priority": "priority for these tasks. Lower numbers get processed earlier, default: 100"
}
```

### Producer Examples

#### Schedule multiple task sets for this CAgg, in small increments

We schedule 2 sets:

```sql
CALL _timescaledb_additional.schedule_refresh_continuous_aggregate_incremental(
    job_id => null,
    config => '
{
    "end_offset": "6 weeks",
    "start_offset": "3 years",
    "continuous_aggregate": "public.test_cagg_incr_refresh_cagg",
    "increment_size": "3 days"
}');
```

with the most recent data having the highest priority:

```sql
CALL _timescaledb_additional.schedule_refresh_continuous_aggregate_incremental(
    job_id => null,
    config => '
{
    "end_offset": "1 day",
    "start_offset": "6 weeks",
    "continuous_aggregate": "public.test_cagg_incr_refresh_cagg",
    "increment_size": "1 week",
    "priority": 1
}');
```

## Consumer

For the consumer(s), we schedule as many jobs as we want to be able to run in
parallel. A reasonable maximum is likely not very high, for example 4-6. While
we *can* do incremental CAgg refreshes, we cannot (as of December 2024)
schedule parallel refreshes for the same CAgg. The number of consumers should
therefore never be higher than your number of CAggs.

These jobs will each consume a connection at all times, as they are designed to
run continuously.

```sql
SELECT
    public.add_job(
        proc => '_timescaledb_additional.task_refresh_continuous_aggregate_incremental_runner'::regproc,
        -- This isn't strictly needed, but it ensures the workers do not run
        -- forever; once they terminate, they will be restarted within
        -- 15 minutes or so.
        schedule_interval => interval '15 minutes',
        config => '{"max_runtime": "11 hours"}',
        initial_start => now()
    )
FROM
    generate_series(1, 4);
```
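Because tasks stay in the queue table until they are marked finished, outstanding work can be inspected directly. A hypothetical monitoring query, assuming the column names the consumer procedure uses (`finished`, `priority`, `window_start`), could be:

```sql
-- Hypothetical: count outstanding tasks per CAgg, grouped by priority.
SELECT
    continuous_aggregate,
    priority,
    count(*) AS pending_tasks,
    min(window_start) AS oldest_window
FROM
    _timescaledb_additional.incremental_continuous_aggregate_refreshes
WHERE
    finished IS NULL
GROUP BY
    continuous_aggregate, priority
ORDER BY
    priority, continuous_aggregate;
```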
Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
DROP PROCEDURE IF EXISTS _timescaledb_additional.task_refresh_continuous_aggregate_incremental_runner;
CREATE PROCEDURE _timescaledb_additional.task_refresh_continuous_aggregate_incremental_runner (
    job_id int,
    config jsonb
) LANGUAGE plpgsql AS $BODY$
DECLARE
    max_runtime interval := (config->>'max_runtime')::interval;
    global_start_time timestamptz := pg_catalog.clock_timestamp();
    global_end_time timestamptz;
    app_name text;
BEGIN
    max_runtime := coalesce(max_runtime, interval '6 hours');
    global_end_time := global_start_time + max_runtime;

    WHILE pg_catalog.clock_timestamp() < global_end_time LOOP
        SET search_path TO 'pg_catalog,pg_temp';
        SET lock_timeout TO '3s';
        SET application_name TO 'cagg incremental refresh consumer - idle';

        -- Prevent a hot loop
        PERFORM pg_catalog.pg_sleep(1.0);

        SET application_name TO 'cagg incremental refresh consumer - retrieving new task';

        DECLARE
            p_id bigint;
            p_cagg regclass;
            p_window_start timestamptz;
            p_window_end timestamptz;
            p_start_time timestamptz;
            p_end_time timestamptz;
            p_mat_hypertable_id int;
            p_job_id int;
        BEGIN
            SELECT
                q.id,
                q.continuous_aggregate,
                q.window_start,
                q.window_end,
                cagg.mat_hypertable_id,
                coalesce(jobs.job_id, -1)
            INTO
                p_id,
                p_cagg,
                p_window_start,
                p_window_end,
                p_mat_hypertable_id,
                p_job_id
            FROM
                _timescaledb_additional.incremental_continuous_aggregate_refreshes AS q
            JOIN
                pg_catalog.pg_class AS pc ON (q.continuous_aggregate = pc.oid)
            JOIN
                pg_catalog.pg_namespace AS pn ON (pc.relnamespace = pn.oid)
            JOIN
                _timescaledb_catalog.continuous_agg AS cagg ON (cagg.user_view_schema = pn.nspname AND cagg.user_view_name = pc.relname)
            JOIN
                _timescaledb_catalog.hypertable AS h ON (cagg.mat_hypertable_id = h.id)
            LEFT JOIN
                timescaledb_information.jobs ON (
                    proc_name = 'policy_refresh_continuous_aggregate'
                    AND proc_schema = '_timescaledb_functions'
                    AND jobs.config->>'mat_hypertable_id' = cagg.mat_hypertable_id::text
                )
            WHERE
                q.worker_pid IS NULL AND q.finished IS NULL
                -- We don't want multiple workers to be active on the same CAgg.
                AND NOT EXISTS (
                    SELECT
                    FROM
                        _timescaledb_additional.incremental_continuous_aggregate_refreshes AS a
                    JOIN
                        pg_catalog.pg_stat_activity ON (pid = a.worker_pid)
                    WHERE
                        a.finished IS NULL
                        -- If pids ever get recycled (container/machine restart),
                        -- this filter ensures we ignore the old ones.
                        AND a.started > backend_start
                        AND q.continuous_aggregate = a.continuous_aggregate
                )
            ORDER BY
                q.priority ASC,
                q.scheduled ASC
            FOR NO KEY UPDATE OF q SKIP LOCKED
            LIMIT
                1;

            IF p_cagg IS NULL THEN
                COMMIT;
                -- There are no items in the queue that we can currently
                -- process. We therefore sleep a short while before trying
                -- again, or exit if our runtime is nearly over.
                IF global_end_time - interval '30 seconds' < now() THEN
                    EXIT;
                ELSE
                    SET application_name TO 'cagg incremental refresh consumer - waiting for next task';
                    PERFORM pg_catalog.pg_sleep(0.1);
                    CONTINUE;
                END IF;
            END IF;

            UPDATE
                _timescaledb_additional.incremental_continuous_aggregate_refreshes
            SET
                worker_pid = pg_backend_pid(),
                started = clock_timestamp()
            WHERE
                id = p_id;

            -- Inform others of what we are doing.
            app_name := ' refresh ' || p_window_start::date;
            IF p_window_end::date != p_window_start::date THEN
                app_name := app_name || ' ' || p_window_end::date;
            ELSE
                app_name := app_name || to_char(p_window_start, 'THH24:MI');
            END IF;
            IF length(app_name) + length(p_cagg::text) > 63 THEN
                app_name := '...' || right(p_cagg::text, 60 - length(app_name)) || app_name;
            ELSE
                app_name := p_cagg::text || app_name;
            END IF;
            PERFORM pg_catalog.set_config(
                'application_name',
                app_name,
                false
            );

            RAISE NOTICE
                '% - Processing %, (% - %)',
                pg_catalog.to_char(pg_catalog.clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.FF3OF'),
                p_cagg,
                p_window_start,
                p_window_end;

            -- We need to ensure that all other workers now know we are working
            -- on this task. We therefore commit once now; this also releases
            -- the row lock we took on the queue table.
            COMMIT;

            -- We take out a row-level lock to signal to concurrent workers that
            -- *we* are working on it. By taking this type of lock, different
            -- tasks can clean up this table: they can update/delete rows for
            -- which no active worker holds a lock.
            PERFORM
            FROM
                _timescaledb_additional.incremental_continuous_aggregate_refreshes
            WHERE
                id = p_id
            FOR NO KEY UPDATE;

            CALL _timescaledb_functions.policy_refresh_continuous_aggregate(
                -1,
                config => jsonb_build_object(
                    'end_offset', (clock_timestamp() - p_window_end)::interval(0),
                    'start_offset', (clock_timestamp() - p_window_start)::interval(0),
                    'mat_hypertable_id', p_mat_hypertable_id
                )
            );

            UPDATE
                _timescaledb_additional.incremental_continuous_aggregate_refreshes
            SET
                finished = clock_timestamp()
            WHERE
                id = p_id;
            COMMIT;

            SET application_name TO 'cagg incremental refresh consumer - idle';
        END;
    END LOOP;

    RAISE NOTICE 'Shutting down worker, as we exceeded our maximum runtime (%)', max_runtime;
END;
$BODY$;

GRANT EXECUTE ON PROCEDURE _timescaledb_additional.task_refresh_continuous_aggregate_incremental_runner TO pg_database_owner;
