Commit fd12319

Author: Rehan van der Merwe
feat: remove manual partitioning #2
2 parents 4412136 + cdc0873 commit fd12319

9 files changed: +21 -270 lines changed

README.md

Lines changed: 1 addition & 9 deletions
@@ -100,10 +100,6 @@ You can see an example implementation of the demo site [here](https://github.com

 ### Client side setup

-> ⚠️ IMPORTANT! **After** the client sent the first page data, you have to click on the "Add Partitions" button in the
-> frontend to auto-discover and add the site, month and day partitions. Otherwise, the data will not show up in the charts.
-> This operation has to be repeated at the beginning of every month.
-
 Install the [client-side library](https://www.npmjs.com/package/serverless-website-analytics-client):
 ```
 npm install serverless-website-analytics-client
@@ -169,17 +165,13 @@ This is a Lambda-lith hit through the Lambda Function URLs (FURL) by reverse pro
 in TypeScript and uses [tRPC](https://trpc.io/) to handle API requests.

 The Queries to Athena are synchronous, the connection timeout between CloudFront and the FURL has been increased
-to 60 seconds.
+to 60 seconds. Partitions are dynamic, they do not need to be added manually.

 There are three available authentication configurations:
 - **None**, it is open to the public
 - **Basic Authentication**, basic protection for the index.html file
 - **AWS Cogntio**, recommended for production

-⚠️ Partitions are not automatically created in Athena, they have to be created manually by the user by clicking the
-"Create/Refresh Partitions" button in the frontend. This has to be done whenever a new site is added or a new month
-starts.
-
 ### Ingestion API

 Similarly to the backend, it is also a TS Lambda-lith that is hit through the FURL by reverse proxying through CloudFront.

docs/API.md

Lines changed: 1 addition & 9 deletions
@@ -100,10 +100,6 @@ You can see an example implementation of the demo site [here](https://github.com

 ### Client side setup

-> ⚠️ IMPORTANT! **After** the client sent the first page data, you have to click on the "Add Partitions" button in the
-> frontend to auto-discover and add the site, month and day partitions. Otherwise, the data will not show up in the charts.
-> This operation has to be repeated at the beginning of every month.
-
 Install the [client-side library](https://www.npmjs.com/package/serverless-website-analytics-client):
 ```
 npm install serverless-website-analytics-client
@@ -169,17 +165,13 @@ This is a Lambda-lith hit through the Lambda Function URLs (FURL) by reverse pro
 in TypeScript and uses [tRPC](https://trpc.io/) to handle API requests.

 The Queries to Athena are synchronous, the connection timeout between CloudFront and the FURL has been increased
-to 60 seconds.
+to 60 seconds. Partitions are dynamic, they do not need to be added manually.

 There are three available authentication configurations:
 - **None**, it is open to the public
 - **Basic Authentication**, basic protection for the index.html file
 - **AWS Cogntio**, recommended for production

-⚠️ Partitions are not automatically created in Athena, they have to be created manually by the user by clicking the
-"Create/Refresh Partitions" button in the frontend. This has to be done whenever a new site is added or a new month
-starts.
-
 ### Ingestion API

 Similarly to the backend, it is also a TS Lambda-lith that is hit through the FURL by reverse proxying through CloudFront.

docs/CONTRIBUTING.md

Lines changed: 4 additions & 3 deletions
@@ -75,9 +75,10 @@ npm run watch-local-api-ingest-watch
 ## Highlights

 ### Record storage strategy
-Events/logs/records are stored in S3 in a partitioned manner. The partitioning is done by site, month and day by
-Kinesis Firehose. The records are stored in parquet format. We are currently using an Append Only Log (AOL) pattern.
-This means that we are never updating the logs, we are only adding new ones.
+Events/logs/records are stored in S3 in a partitioned manner. The partitioning is dynamic, so all that is left is to store
+the data correctly and that is done by Kinesis Firehose in the format of: site, month and day. The records are buffered
+and stored in parquet format. We are currently using an Append Only Log (AOL) pattern. This means that we are never
+updating the logs, we are only adding new ones.

 In order to track the time the user has been on the page we do the following:
 - Create a unique `page_id` for the current page view

src/backendAnalytics.ts

Lines changed: 11 additions & 0 deletions
@@ -202,6 +202,17 @@ export function backendAnalytics(scope: Construct, name: (name: string) => strin
           },
         ],
       },
+      parameters: {
+        'projection.enabled': 'true',
+        'projection.year.type': 'integer',
+        'projection.year.range': '2023,3023',
+        'projection.year.interval': '1',
+        'projection.month.type': 'integer',
+        'projection.month.range': '1,12',
+        'projection.month.interval': '1',
+        'projection.site.type': 'enum',
+        'projection.site.values': props.sites.join(','),
+      },
     },
   });
   glueTablePageViews.addDependency(glueDb);
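
Reviewer note: the `parameters` block above switches the Glue table to Athena partition projection, which is why the manual `MSCK REPAIR TABLE` and `ALTER TABLE ... ADD PARTITION` code can be deleted further down. Below is a minimal sketch of the same parameter map built by a standalone helper, so that the inputs can be validated before they reach the table definition. The names `buildProjectionParameters` and `projectedPartitionCount`, the default year bounds, and the example site names are assumptions made for illustration; they are not part of this commit.

```typescript
// Illustrative sketch only: these helpers are hypothetical and do not exist in the commit.

/** Glue table parameters are plain string key/value pairs. */
type TableParameters = Record<string, string>;

/** Builds the partition projection parameters for the site/year/month partition keys. */
function buildProjectionParameters(sites: string[], firstYear = 2023, lastYear = 3023): TableParameters {
  if (sites.length === 0) {
    throw new Error('At least one site is needed for the enum projection');
  }
  if (firstYear > lastYear) {
    throw new Error('firstYear must not be greater than lastYear');
  }
  // The enum values are a comma-separated list, so a comma inside a site name would split it.
  const invalid = sites.find((site) => site.includes(','));
  if (invalid !== undefined) {
    throw new Error(`Site name must not contain a comma: ${invalid}`);
  }

  return {
    'projection.enabled': 'true',
    'projection.year.type': 'integer',
    'projection.year.range': `${firstYear},${lastYear}`,
    'projection.year.interval': '1',
    'projection.month.type': 'integer',
    'projection.month.range': '1,12',
    'projection.month.interval': '1',
    'projection.site.type': 'enum',
    'projection.site.values': sites.join(','),
  };
}

/** Number of site/year/month combinations covered by the configuration: sites x years x 12. */
function projectedPartitionCount(sites: string[], firstYear = 2023, lastYear = 3023): number {
  return sites.length * (lastYear - firstYear + 1) * 12;
}

const params = buildProjectionParameters(['example.com', 'blog.example.com']);
console.log(params['projection.site.values']); // example.com,blog.example.com
console.log(projectedPartitionCount(['example.com', 'blog.example.com'])); // 24024
```

With the year range used in the commit, every site contributes 12,012 possible partitions, so queries are presumably expected to filter on `site`, `year` and `month` so that only the matching locations are read.
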
Lines changed: 1 addition & 98 deletions
@@ -1,11 +1,7 @@
 import { z } from 'zod';
 import { assertAuthentication, TrpcInstance } from '@backend/api-front/server';
-import { SchemaSite, SchemaSitePartitions, SitePartitions } from '@backend/lib/models/site';
+import { SchemaSite } from '@backend/lib/models/site';
 import { LambdaEnvironment } from '@backend/api-front/environment';
-import { getAthenaClient, getS3Client } from '@backend/lib/utils/lazy_aws';
-import { orderBy } from 'lodash';
-import { DateUtils } from '@backend/lib/utils/date_utils';
-import { AthenaBase } from '@backend/lib/utils/athena_base';

 export function sites(trpcInstance: TrpcInstance) {
   return trpcInstance.procedure
@@ -17,96 +13,3 @@ export function sites(trpcInstance: TrpcInstance) {
       return LambdaEnvironment.SITES;
     });
 }
-
-export function sitesGetPartitions(trpcInstance: TrpcInstance) {
-  return trpcInstance.procedure
-    .input(z.undefined())
-    .output(SchemaSitePartitions)
-    .query(async ({ ctx }) => {
-      assertAuthentication(ctx);
-
-      const athenaClient = getAthenaClient();
-      const s3Client = getS3Client();
-      const athenaWrapper = new AthenaBase(
-        athenaClient,
-        s3Client,
-        LambdaEnvironment.ANALYTICS_GLUE_DB_NAME,
-        LambdaEnvironment.ANALYTICS_BUCKET_ATHENA_PATH
-      );
-
-      const res = await athenaWrapper.query('SELECT * FROM "page_views$partitions" ORDER BY site, year, month');
-      return res.data as SitePartitions;
-    });
-}
-
-async function getPartitions(athenaClient: AthenaBase) {
-  return (await athenaClient.query('SELECT * FROM "page_views$partitions" ORDER BY site, year, month'))
-    .data as SitePartitions;
-}
-export function sitesUpdatePartition(trpcInstance: TrpcInstance) {
-  return trpcInstance.procedure
-    .input(z.object({ forceRepair: z.boolean() }))
-    .output(SchemaSitePartitions)
-    .mutation(async ({ input, ctx }) => {
-      assertAuthentication(ctx);
-
-      const athenaClient = getAthenaClient();
-      const s3Client = getS3Client();
-      const athenaWrapper = new AthenaBase(
-        athenaClient,
-        s3Client,
-        LambdaEnvironment.ANALYTICS_GLUE_DB_NAME,
-        LambdaEnvironment.ANALYTICS_BUCKET_ATHENA_PATH
-      );
-
-      let partitions = await getPartitions(athenaWrapper);
-      if (!partitions.length || input.forceRepair) {
-        /* Auto discover/repair all indexes aka partitions */
-        await athenaWrapper.query('MSCK REPAIR TABLE page_views');
-        partitions = await getPartitions(athenaWrapper);
-      }
-
-      if (input.forceRepair) {
-        return partitions;
-      }
-      if (!partitions.length) {
-        return [];
-      }
-
-      const earliestPartition = orderBy(partitions, ['year', 'month'], ['asc', 'asc'])[0];
-
-      /* Get a list of date partitions between the first partition and now */
-      const earliestPartitionDate = new Date(Date.UTC(earliestPartition.year, earliestPartition.month - 1));
-      const now = DateUtils.now();
-      const months = DateUtils.getMonthsBetweenDates(earliestPartitionDate, now);
-
-      let partitionsAdded = false;
-      for (const site of LambdaEnvironment.SITES) {
-        const partitionsToAdd = [];
-        for (const month of months) {
-          const hasPartition = partitions.find(
-            (row) => row.site === site && row.year === month.getFullYear() && row.month === month.getMonth() + 1
-          );
-          if (hasPartition) {
-            continue;
-          }
-
-          partitionsToAdd.push(
-            `PARTITION (site = '${site}', year = ${month.getFullYear()}, month = ${month.getMonth() + 1} )`
-          );
-        }
-
-        if (partitionsToAdd.length > 0) {
-          partitionsAdded = true;
-          // console.log(`ALTER TABLE page_views ADD IF NOT EXISTS ${partitionsToAdd.join("\n")};`)
-          await athenaWrapper.query(`ALTER TABLE page_views ADD IF NOT EXISTS ${partitionsToAdd.join('\n')};`);
-        }
-      }
-
-      if (partitionsAdded) {
-        partitions = await getPartitions(athenaWrapper);
-      }
-
-      return partitions;
-    });
-}

src/src/backend/api-front/server.ts

Lines changed: 1 addition & 3 deletions
@@ -1,6 +1,6 @@
 import { initTRPC, TRPCError } from '@trpc/server';
 import { getFrontendEnvironment } from '@backend/api-front/routes/env';
-import { sites, sitesGetPartitions, sitesUpdatePartition } from '@backend/api-front/routes/sites';
+import { sites } from '@backend/api-front/routes/sites';
 import {
   getTopLevelStats,
   getPageViews,
@@ -31,8 +31,6 @@ export type TrpcInstance = typeof trpcInstance;
 export const appRouter = trpcInstance.router({
   getFrontendEnvironment: getFrontendEnvironment(trpcInstance),
   sites: sites(trpcInstance),
-  sitesGetPartitions: sitesGetPartitions(trpcInstance),
-  sitesUpdatePartition: sitesUpdatePartition(trpcInstance),
   getTopLevelStats: getTopLevelStats(trpcInstance),
   getPageViews: getPageViews(trpcInstance),
   getChartViews: getChartViews(trpcInstance),

src/src/backend/lib/models/site.ts

Lines changed: 0 additions & 9 deletions
@@ -9,12 +9,3 @@ import { z } from 'zod';

 export const SchemaSite = z.string();
 export type Site = z.infer<typeof SchemaSite>;
-
-export const SchemaSitePartition = z.object({
-  site: SchemaSite,
-  year: z.number(),
-  month: z.number(),
-});
-export const SchemaSitePartitions = z.array(SchemaSitePartition);
-export type SitePartition = z.infer<typeof SchemaSitePartition>;
-export type SitePartitions = z.infer<typeof SchemaSitePartitions>;

src/src/frontend/src/views/page_stats/index.vue

Lines changed: 2 additions & 64 deletions
@@ -2,7 +2,6 @@
 import { useDark } from '@vueuse/core';
 import {computed, onMounted, Ref, ref, unref, watch} from "vue";
 import {api, apiWrapper} from "@frontend/src/lib/api";
-import {SitePartition} from "@backend/lib/models/site";
 import Totals from "@frontend/src/views/page_stats/components/totals.vue";
 import ChartViews from "@frontend/src/views/page_stats/components/chart_views.vue";
 import {DateUtils} from "@frontend/src/lib/date_utils";
@@ -14,25 +13,10 @@ import UTM from "@frontend/src/views/page_stats/components/utm.vue";
 import assert from "assert";

 /* ================================================================================================================== */
-/* ============================================= Settings & Partitions ============================================== */
+/* ==================================================== Settings =================================================== */
 /* ================================================================================================================== */
 const isDark = useDark();
 const showSettings = ref(false);
-let partitions: Ref<SitePartition[]> = ref([]);
-let loadingPartitions = ref(false);
-onMounted(async () => {
-  const resp = await apiWrapper(api.sitesGetPartitions.query(), loadingPartitions);
-  if(!resp)
-    return;
-  partitions.value = resp;
-});
-async function refreshPartitions(forceRepair: boolean)
-{
-  const resp = await apiWrapper(api.sitesUpdatePartition.mutate({forceRepair}), loadingPartitions);
-  if(!resp)
-    return;
-  partitions.value = resp;
-}

 /* ================================================================================================================== */
 /* ================================================== Date Filter =================================================== */
@@ -135,7 +119,7 @@ const loadingReferrers = ref(false);
 const loadingUserInfo = ref(false);
 const loadingUtm = ref(false);
 const loadingComponents = computed(() => {
-  return loadingSites.value || loadingPartitions.value || loadingTotals.value || loadingPageViews.value ||
+  return loadingSites.value || loadingTotals.value || loadingPageViews.value ||
     loadingChartViews.value || loadingChartLocations.value || loadingUserInfo.value || loadingUtm.value;
 });

@@ -214,12 +198,6 @@ async function refresh()

       <el-divider direction="vertical" style="height: 1.5rem; top: -3px;"></el-divider>

-      <el-tooltip content="Refresh partitions">
-        <el-button class="menu-button" text round plain @click="refreshPartitions(false)" :disabled="loadingPartitions">
-          <mdi-inbox-multiple class="menu-button__icon"></mdi-inbox-multiple>
-        </el-button>
-      </el-tooltip>
-
       <el-tooltip content="Settings">
         <el-button class="menu-button" text round plain @click="showSettings = !showSettings" >
           <mdi-cog class="menu-button__icon"></mdi-cog>
@@ -270,50 +248,10 @@ async function refresh()
     </div>

     <el-drawer class="hidden-sm-and-down" v-model="showSettings" title="Settings" direction="rtl" size="100%" style="max-width: 500px" >
-
       <div class="settings-label-single" style="display: flex; justify-content: space-between">
         <span class="settings-label">Theme</span>
         <el-switch class="header__switch" v-model="isDark" inactive-text="Light" active-text="Dark" />
       </div>
-
-      <el-collapse>
-        <el-collapse-item>
-          <template #title>
-            <span class="settings-label">
-              Partitions ({{partitions.length}})
-              <el-button-group>
-                <el-button style="margin-left: 100px;" plain type="primary"
-                           @click.stop="refreshPartitions(false)" :loading="loadingPartitions">
-                  Refresh
-                </el-button>
-                <el-popover popper-style="padding: 3px;" trigger="hover" placement="bottom-end">
-                  <template #reference>
-                    <el-button plain type="primary" style="padding-left: 5px; padding-right: 5px;"
-                               @click.stop="" :disabled="loadingPartitions">
-                      <mdi-chevron-down></mdi-chevron-down>
-                    </el-button>
-                  </template>
-                  <template #default style="padding: 0">
-                    <el-button plain type="primary" text
-                               @click.stop="refreshPartitions(true)" :loading="loadingPartitions">
-                      Force Repair
-                    </el-button>
-                  </template>
-                </el-popover>
-
-              </el-button-group>
-            </span>
-          </template>
-
-          <el-table :data="partitions" style="width: 100%" stripe v-loading="loadingPartitions">
-            <el-table-column prop="site" label="Site" sortable />
-            <el-table-column prop="year" label="Year" width="100" sortable />
-            <el-table-column prop="month" label="Month" width="100" sortable />
-          </el-table>
-
-        </el-collapse-item>
-      </el-collapse>
-
     </el-drawer>

   </el-container>
