For that the steps will be:
- Create a new Materialized View (Pipe and Data Source) to add the new column.
- Backfill
- Connect endpoints to the new Data Source
- Delete old Pipe and Data Source
analytics_sessions_mv1.datasource
:
SCHEMA >
`date` Date,
`session_id` String,
+ `version` SimpleAggregateFunction(any, String),
`device` SimpleAggregateFunction(any, String),
`browser` SimpleAggregateFunction(any, String),
`location` SimpleAggregateFunction(any, String),
`first_hit` SimpleAggregateFunction(min, DateTime),
`latest_hit` SimpleAggregateFunction(max, DateTime),
`hits` AggregateFunction(count)
ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(date)"
ENGINE_SORTING_KEY "date, session_id"
analytics_sessions1.pipe
SELECT
toDate(timestamp) AS date,
session_id,
+ anySimpleState(version) AS version,
anySimpleState(device) AS device,
anySimpleState(browser) AS browser,
anySimpleState(location) AS location,
minSimpleState(timestamp) AS first_hit,
maxSimpleState(timestamp) AS latest_hit,
countState() AS hits
FROM analytics_hits
WHERE timestamp > '2024-05-27 11:00:00'
GROUP BY
date,
session_id
Note we are filtering by a date in the future to be able to perform a backfill operation.
Additionally we create a Pipe only for backfilling purposes, see analytics_sessions1_backfill.pipe
in the PR
While this can be automated using a custom deployment that runs the backfill operation we recommend it to just do it manually. So once the previous PR has been merged and the backfill DateTime (in this case '2024-05-27 11:00:00') has passed, you can run this command on your main Workspace:
tb pipe populate analytics_sessions1_backfill --node analytics_sessions_1_1_backfill --wait
Additionally you can run some query with tb sql
over both Data Sources to check for data quality.
Once you've validated data quality on both Data Sources, you can make the Pipe endpoints to use the new analytics_sessions_1
Data Source.
Once the changes are in the main Workspace and running in production you can get rid of old resources by just using git rm
, create a git branch and merge it.