Commit 0099e6e

feat: pixel tracking and docs #32 (#63)
1 parent 36fb6f0 commit 0099e6e

11 files changed: +321 -18 lines changed

README.md

Lines changed: 95 additions & 8 deletions
@@ -176,6 +176,13 @@ are:
 
 See the [client-side library](https://www.npmjs.com/package/serverless-website-analytics-client) for more options.
 
+Beacon/pixel tracking can be used as an alternative to HTML attribute tracking. Beacon tracking is useful for
+tracking events outside your domain, such as email opens and external blog views.
+See the [client-side library](https://www.npmjs.com/package/serverless-website-analytics-client) for more info.
+```html
+<img src="<YOUR BACKEND ORIGIN>/api-ingest/v1/event/track/beacon.gif?site=<SITE>&event=<EVENT>" height="1" width="1" alt="">
+```
+
 #### SDK Client Usage
 
 Install the [client-side library](https://www.npmjs.com/package/serverless-website-analytics-client):
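The beacon URL added to the README above takes `site` and `event` as query parameters. As a minimal sketch, assuming those are the only two parameters needed and that their values should be URL-encoded, the URL and `<img>` tag could be generated like this. The helper names `buildBeaconUrl` and `buildBeaconImgTag` are hypothetical and are not part of `serverless-website-analytics` or its client.

```typescript
// Hypothetical helpers, not part of serverless-website-analytics.
// They only assemble the beacon URL shape shown in the README hunk above.

/** Build the beacon GIF URL for a site/event pair, URL-encoding both values. */
function buildBeaconUrl(backendOrigin: string, site: string, event: string): string {
  // Drop trailing slashes so the path is not doubled up.
  const origin = backendOrigin.replace(/\/+$/, "");
  const query = `site=${encodeURIComponent(site)}&event=${encodeURIComponent(event)}`;
  return `${origin}/api-ingest/v1/event/track/beacon.gif?${query}`;
}

/** Wrap the URL in a 1x1 <img> tag, escaping "&" for use inside an HTML attribute. */
function buildBeaconImgTag(backendOrigin: string, site: string, event: string): string {
  const src = buildBeaconUrl(backendOrigin, site, event).replace(/&/g, "&amp;");
  return `<img src="${src}" height="1" width="1" alt="">`;
}

// Example usage with placeholder values (assumed, not real endpoints).
const beaconUrl = buildBeaconUrl("https://analytics.example.com/", "example.com", "email open");
const beaconTag = buildBeaconImgTag("https://analytics.example.com/", "example.com", "email open");
```

Encoding the values matters mostly for event names that contain spaces or `&`, which would otherwise break the query string.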
@@ -190,8 +197,9 @@ Irrelevant of the framework, you have to do the following to track page views on
 site's `Origin` is whitelisted in the backend config.
 2. On each route change call the `analyticsPageChange` function with the name of the new page.
 
-The following sections show you how to do it in Vue, see [the readme of the client](https://github.com/rehanvdm/serverless-website-analytics-client#usage)
-for React and Svelte usage, but again the SDK allows for usage in **ANY framework**.
+> [!IMPORTANT]
+> The `serverless-website-analytics` can be used in **ANY framework**. To demonstrate this, find examples for Svelte and React in the
+> [_client project_](https://github.com/rehanvdm/serverless-website-analytics-client#usage)
 
 #### Vue
 
@@ -206,7 +214,7 @@ app.use(router);
 swaClient.v1.analyticsPageInit({
   inBrowser: true, //Not SSR
   site: "<Friendly site name>", //example.com
-  apiUrl: "<Your serverless-website-analytics URL>", //https://my-serverless-website-analytics-backend.com
+  apiUrl: "<YOUR BACKEND ORIGIN>", //https://my-serverless-website-analytics-backend.com
   // debug: true,
 });
 router.afterEach((event) => {
@@ -228,8 +236,7 @@ import {swaClient} from "./main";
 swaClient.v1.analyticsTrack("subscribe", 1, "clicks")
 ```
 
-The `serverless-website-analytics` **any framework**. To demonstrate this, find examples for Svelte and React in the
-[_client project_](https://github.com/rehanvdm/serverless-website-analytics-client/tree/master/usage)
+Alternatively, you can use a beacon/pixel for tracking, as described above in the standalone import script usage section.
 
 ## Worst case projected costs
 

@@ -266,8 +273,10 @@ AWS CloudFront is used to host the frontend. The frontend is a Vue 3 SPA app tha
266273
CloudFront. The [Element UI Plus](https://element-plus.org/en-US/) frontend framework is used for the UI components
267274
and [Plotly.js](https://plotly.com/javascript/) for the charts.
268275

269-
![frontend_1.png](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/imgs/frontend_1.png)
270-
![frontend_2.png](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/imgs/frontend_2.png)
276+
![2_frontend_1.png](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/imgs/2_frontend_1.png)
277+
![2_frontend_2.png](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/imgs/2_frontend_2.png)
278+
279+
![2_frontend_3.png](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/imgs/2_frontend_3.png)
271280

272281
### Backend
273282

@@ -286,7 +295,8 @@ There are three available authentication configurations:
 
 Similarly to the backend, it is also a TS Lambda-lith that is hit through the FURL by reverse proxying through CloudFront.
 It also uses [tRPC](https://trpc.io/) but uses the [trpc-openapi](https://github.com/jlalmes/trpc-openapi) package to
-generate an OpenAPI spec. This is used to generate the API types used in the [client JS package](https://www.npmjs.com/package/serverless-website-analytics-client).
+generate an [OpenAPI spec](https://github.com/rehanvdm/serverless-website-analytics-client/blob/master/package/src/OpenAPI-Ingest.yaml).
+This is used to generate the API types used in the [client JS package](https://www.npmjs.com/package/serverless-website-analytics-client)
 and can also be used to generate other language client libraries.
 
 The lambda function then saves the data to S3 through a Kinesis Firehose. The Firehose is configured to save the data
@@ -296,6 +306,83 @@ the date will be stored after about 1min ± 1min.
 Location data is obtained by looking the IP address up in the [MaxMind GeoLite2](https://dev.maxmind.com/geoip/geoip2/geolite2/) database.
 We don't store any Personally Identifiable Information (PII) in the logs or S3, the IP address is never stored.
 
+### Querying data manually
+
+You can query the data manually using Athena. The data is partitioned by site and date. There are two tables,
+one for the page views (`page_views`) and another for the tracking data (`events`).
+
+Page views query:
+```sql
+WITH
+  cte_data AS (
+    SELECT site, page_url, time_on_page, page_opened_at,
+           ROW_NUMBER() OVER (PARTITION BY page_id ORDER BY time_on_page DESC) rn
+    FROM page_views
+    WHERE (site = 'site1' OR site = 'site2') AND (page_opened_at_date = '2023-10-26' OR page_opened_at_date = '2023-10-27')
+  ),
+  cte_data_filtered AS (
+    SELECT *
+    FROM cte_data
+    WHERE rn = 1 AND page_opened_at BETWEEN parse_datetime('2023-10-26 22:00:00.000','yyyy-MM-dd HH:mm:ss.SSS')
+          AND parse_datetime('2023-11-03 21:59:59.999','yyyy-MM-dd HH:mm:ss.SSS')
+  ),
+  cte_data_by_page_view AS (
+    SELECT
+      site,
+      page_url,
+      COUNT(*) as "views",
+      ROUND(AVG(time_on_page),2) as "avg_time_on_page"
+    FROM cte_data_filtered
+    GROUP BY site, page_url
+  )
+SELECT *
+FROM cte_data_by_page_view
+ORDER BY views DESC, page_url ASC
+```
+
+Events query:
+```sql
+WITH
+  cte_data AS (
+    SELECT site, category, event, data, tracked_at,
+           ROW_NUMBER() OVER (PARTITION BY event_id) rn
+    FROM events
+    WHERE (site = 'site1' OR site = 'site2') AND (tracked_at_date = '2023-11-03' OR tracked_at_date = '2023-11-04')
+  ),
+  cte_data_filtered AS (
+    SELECT *
+    FROM cte_data
+    WHERE rn = 1 AND tracked_at BETWEEN parse_datetime('2023-11-03 22:00:00.000','yyyy-MM-dd HH:mm:ss.SSS')
+          AND parse_datetime('2023-11-04 21:59:59.999','yyyy-MM-dd HH:mm:ss.SSS')
+  ),
+  cte_data_by_event AS (
+    SELECT
+      site,
+      category,
+      event,
+      COUNT(data) as "count",
+      ROUND(AVG(data),2) as "avg",
+      MIN(data) as "min",
+      MAX(data) as "max",
+      SUM(data) as "sum"
+    FROM cte_data_filtered
+    GROUP BY site, category, event
+  )
+SELECT *
+FROM cte_data_by_event
+ORDER BY count DESC, category ASC, event ASC
+```
+
+A few things to note:
+- The first CTE numbers the rows for each page view/event so that the latest one can be picked, but it is only in
+  the second CTE that we select that top row (`rn = 1`).
+- The first CTE specifies the partitions: the site and the dates. The dates can be specified with a range query, but
+  it is more performant to specify the exact partitions.
+- The second CTE, along with selecting the latest row from the first, specifies the date range exactly, taking the
+  time zone into consideration. Within the code we over-fetch the data by 2 days to ensure that
+  this second query has all the data it needs for the exact, time-zone-aware time range.
+- The third CTE does the aggregation and the final query does the ordering.
+
 ## Upgrading
 
 ### From V0 to V1
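The note in "Querying data manually" about listing exact partitions instead of using a range can be sketched in code. This is a minimal illustration, assuming the date partition values are UTC calendar dates in `yyyy-MM-dd` form, as in the example queries. The function names `partitionDates` and `partitionFilter` are hypothetical and do not reproduce the project's own query builder or its 2-day over-fetch.

```typescript
// Hypothetical helpers for illustration only; not taken from the project.

const DAY_MS = 24 * 60 * 60 * 1000;

/** List every UTC calendar date between two instants, inclusive, as yyyy-MM-dd strings. */
function partitionDates(from: Date, to: Date): string[] {
  const dates: string[] = [];
  // Truncate both ends to midnight UTC so we step over whole days.
  let cursor = Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), from.getUTCDate());
  const end = Date.UTC(to.getUTCFullYear(), to.getUTCMonth(), to.getUTCDate());
  while (cursor <= end) {
    dates.push(new Date(cursor).toISOString().slice(0, 10));
    cursor += DAY_MS;
  }
  return dates;
}

/** Build an explicit "(col = 'a' OR col = 'b')" filter. Only pass trusted column names. */
function partitionFilter(column: string, dates: string[]): string {
  return "(" + dates.map((d) => `${column} = '${d}'`).join(" OR ") + ")";
}

// Example: the time range used in the events query above.
const eventDates = partitionDates(
  new Date("2023-11-03T22:00:00.000Z"),
  new Date("2023-11-04T21:59:59.999Z"),
);
const eventFilter = partitionFilter("tracked_at_date", eventDates);
```

For the range shown, `eventFilter` has the same shape as the `tracked_at_date` condition in the events query, which lets Athena read only the listed partitions.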

docs/API.md

Lines changed: 95 additions & 8 deletions

docs/imgs/2_frontend_1.png

148 KB

docs/imgs/2_frontend_2.png

129 KB

docs/imgs/2_frontend_3.png

72.4 KB

docs/imgs/frontend_1.png

-131 KB
Binary file not shown.

docs/imgs/frontend_2.png

-179 KB
Binary file not shown.

docs/imgs/frontend_3.png

-45.4 KB
Binary file not shown.

src/src/backend/api-ingest/index.ts

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,7 @@ import { DateUtils } from '@backend/lib/utils/date_utils';
 import { TRPCError } from '@trpc/server';
 import assert from 'assert';
 import { removeCloudFrontProxyPath, TRPCHandlerError } from '@backend/lib/utils/api_utils';
+import { v1EventTrackBeaconGif } from '@backend/api-ingest/v1/event/track';
 
 /* Lazy loaded variables */
 let openApiDocument: OpenAPIV3.Document | undefined;
@@ -115,6 +116,8 @@ export const handler = async (event: APIGatewayProxyEventV2, context: Context):
 
 if (event.rawPath === '/docs') {
   response = docsRoute();
+} else if (event.rawPath === '/v1/event/track/beacon.gif') {
+  response = await v1EventTrackBeaconGif(event);
 } else {
   if (!corsValidOrigin) {
     logger.error('Invalid origin:', event.headers.origin);
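The new `/v1/event/track/beacon.gif` route hands the request to `v1EventTrackBeaconGif`, whose implementation is not shown in this file's diff. As a rough sketch only, a beacon endpoint behind a Lambda proxy integration typically replies with a tiny base64-encoded image so the `<img>` tag renders nothing visible. The `BeaconResponse` type, the `beaconGifResponse` function and the header choices below are assumptions for illustration, not the project's code.

```typescript
// Illustrative sketch; names and headers are assumptions, not the project's implementation.

/** Minimal shape of a Lambda proxy (payload v2 style) response carrying a binary body. */
interface BeaconResponse {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
  isBase64Encoded: boolean;
}

// A commonly used 1x1 transparent GIF, base64-encoded.
const TRANSPARENT_GIF_BASE64 = "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7";

/** Build the image response a beacon route could return after recording the event. */
function beaconGifResponse(): BeaconResponse {
  return {
    statusCode: 200,
    headers: {
      "Content-Type": "image/gif",
      // Discourage caching so repeated opens can reach the backend again.
      "Cache-Control": "no-store",
    },
    body: TRANSPARENT_GIF_BASE64,
    isBase64Encoded: true,
  };
}

const beaconResponse = beaconGifResponse();
```

In the diff above the beacon branch sits before the `corsValidOrigin` check in the `else` branch, which fits the README's use case of requests coming from email clients and external pages.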
