Commit 39469e2

Update Rollups example script (#1022)
Assume we run the cron jobs on an hourly basis. The following will happen:

| timeline | runs_at | last_rollup_time | curr_rollup_time | what happens |
|-------------|-------------|-------------|-------------|-------------|
| the Nth cron job runs | 2021-01-05 20:00:00 | 2021-01-05 19:00:00 | 2021-01-05 20:00:00 | hours included [20] |
| new http_requests are created at 2021-01-05 20:30:00, let's say http_requests A and B | | | | |
| the (N+1)th cron job runs | 2021-01-05 21:00:00 | 2021-01-05 20:00:00 | 2021-01-05 21:00:00 | hours included [21] |

Events A and B will not be added to the rollup table. To confirm:

```SQL
> select date_trunc('hour', '2021-01-05 20:30:00'::timestamp) <@ tsrange('2021-01-05 19:00:00'::timestamp, '2021-01-05 20:00:00'::timestamp, '(]');
> true
```

Events A and B won't be added by the Nth job, although their truncated `ingest_time`s fall within its range, because **they are not in the Citus DB yet**: they are created at `20:30:00` and the job runs at `20:00:00`, 30 minutes earlier.

```SQL
> select date_trunc('hour', '2021-01-05 20:30:00'::timestamp) <@ tsrange('2021-01-05 20:00:00'::timestamp, '2021-01-05 21:00:00'::timestamp, '(]');
> false
```

Events A and B won't be added by the (N+1)th job either, because their truncated `ingest_time`s are not within its range. This means that http_requests A and B will be lost.

I think the issue can be fixed if we use the timestamp itself without any truncation in the `where` clause, so instead of `date_trunc('minute', ingest_time)` we just use `ingest_time`.
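The timeline above can be replayed outside the database. The following is a minimal Python sketch, not the actual rollup function: it assumes hourly runs and hour-level truncation as in the example (the patched query truncates to the minute), and the helper names `trunc_hour`, `in_range`, and `simulate` are hypothetical.

```python
from datetime import datetime, timedelta

def trunc_hour(ts):
    # Rough stand-in for date_trunc('hour', ts)
    return ts.replace(minute=0, second=0, microsecond=0)

def in_range(ts, lower, upper):
    # Mirrors the '(]' bounds: lower exclusive, upper inclusive
    return lower < ts <= upper

def simulate(events, run_times, key):
    """Return the names of events picked up by any rollup run.

    `key` maps an ingest_time to the value compared against the range.
    """
    rolled_up = []
    last_rollup_time = run_times[0] - timedelta(hours=1)
    for curr_rollup_time in run_times:
        for name, ingest_time in events:
            # A row can only be seen by a job that runs after it was ingested
            exists = ingest_time <= curr_rollup_time
            if exists and in_range(key(ingest_time), last_rollup_time, curr_rollup_time):
                rolled_up.append(name)
        last_rollup_time = curr_rollup_time
    return rolled_up

events = [("A", datetime(2021, 1, 5, 20, 30)), ("B", datetime(2021, 1, 5, 20, 30))]
runs = [datetime(2021, 1, 5, 20, 0), datetime(2021, 1, 5, 21, 0)]

with_truncation = simulate(events, runs, trunc_hour)
without_truncation = simulate(events, runs, lambda ts: ts)

print(with_truncation)     # [] : A and B are lost
print(without_truncation)  # ['A', 'B'] : picked up by the (N+1)th run
```

With truncation, neither run picks up A or B. Comparing the raw `ingest_time` places both events in the (N+1)th window.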
1 parent c6ed6fc commit 39469e2

File tree

1 file changed

+1
-2
lines changed

use_cases/realtime_analytics.rst

+1-2
@@ -186,8 +186,7 @@ The following function wraps the rollup query up for convenience.
       SUM(response_time_msec) / COUNT(1) AS average_response_time_msec
     FROM http_request
     -- roll up only data new since last_rollup_time
-    WHERE date_trunc('minute', ingest_time) <@
-            tstzrange(last_rollup_time, curr_rollup_time, '(]')
+    WHERE ingest_time <@ tstzrange(last_rollup_time, curr_rollup_time, '(]')
     GROUP BY 1, 2;
 
     -- update the value in latest_rollup so that next time we run the
