Commit 34e3c02

feat: add cost analysis #18 (#22)
1 parent df66db1 commit 34e3c02

4 files changed: +117 / -2 lines changed


README.md

Lines changed: 16 additions & 1 deletion
@@ -11,6 +11,7 @@ You can see a LIVE DEMO [HERE](https://demo.serverless-website-analytics.com/) a
 [here](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/DEMO-TRAFFIC.md)
 
 ## Objectives
+- Multi site
 - Privacy focused, don't store any Personally Identifiable Information (PII).
 - Low frequency of dashboard views
 - The target audience is small to medium website(s) with low to moderate page view traffic (equal to or less than 10M views)
@@ -19,6 +20,7 @@ You can see a LIVE DEMO [HERE](https://demo.serverless-website-analytics.com/) a
 - No direct server-side state
 - Low maintenance
 - Easy to deploy in your AWS account, any *region
+- Pay for what you use (scale to 0)
 
 The main objective is to keep it simple and the operational cost low, keeping true to the "scale to 0" tenets of serverless,
 even if it goes against "best practices".
@@ -176,6 +178,20 @@ app.mount('#app');
 
 ..Any other framework
 
+## Worst case projected costs
+
+**SEE THE FULL COST BREAKDOWN AND SPREADSHEET > [HERE](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/COST.md)**
+
+The worst case projected costs are:
+
+| Views       | Cost ($) |
+|-------------|----------|
+| 10,000      | 2.01     |
+| 100,000     | 3.24     |
+| 1,000,000   | 14.64    |
+| 10,000,000  | 128.74   |
+| 100,000,000 | 1,288.39 |
+
 ## What's in the box
 
 The architecture consists of four components: frontend, backend, ingestion API and the client JS library.
@@ -195,7 +211,6 @@ and [Plotly.js](https://plotly.com/javascript/) for the charts.
 
 ![frontend_1.png](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/imgs/frontend_1.png)
 ![frontend_2.png](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/imgs/frontend_2.png)
-![frontend_3.png](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/imgs/frontend_3.png)
 
 ### Backend
 
docs/API.md

Lines changed: 16 additions & 1 deletion
@@ -11,6 +11,7 @@ You can see a LIVE DEMO [HERE](https://demo.serverless-website-analytics.com/) a
 [here](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/DEMO-TRAFFIC.md)
 
 ## Objectives
+- Multi site
 - Privacy focused, don't store any Personally Identifiable Information (PII).
 - Low frequency of dashboard views
 - The target audience is small to medium website(s) with low to moderate page view traffic (equal to or less than 10M views)
@@ -19,6 +20,7 @@ You can see a LIVE DEMO [HERE](https://demo.serverless-website-analytics.com/) a
 - No direct server-side state
 - Low maintenance
 - Easy to deploy in your AWS account, any *region
+- Pay for what you use (scale to 0)
 
 The main objective is to keep it simple and the operational cost low, keeping true to the "scale to 0" tenets of serverless,
 even if it goes against "best practices".
@@ -176,6 +178,20 @@ app.mount('#app');
 
 ..Any other framework
 
+## Worst case projected costs
+
+**SEE THE FULL COST BREAKDOWN AND SPREADSHEET > [HERE](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/COST.md)**
+
+The worst case projected costs are:
+
+| Views       | Cost ($) |
+|-------------|----------|
+| 10,000      | 2.01     |
+| 100,000     | 3.24     |
+| 1,000,000   | 14.64    |
+| 10,000,000  | 128.74   |
+| 100,000,000 | 1,288.39 |
+
 ## What's in the box
 
 The architecture consists of four components: frontend, backend, ingestion API and the client JS library.
@@ -195,7 +211,6 @@ and [Plotly.js](https://plotly.com/javascript/) for the charts.
 
 ![frontend_1.png](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/imgs/frontend_1.png)
 ![frontend_2.png](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/imgs/frontend_2.png)
-![frontend_3.png](https://github.com/rehanvdm/serverless-website-analytics/blob/main/docs/imgs/frontend_3.png)
 
 ### Backend
 
docs/COST.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# Cost

## Current WORST case projected cost

A **worst case cost analysis** is available in this [Google Sheet](https://docs.google.com/spreadsheets/d/1-UFCtBt8HJ0CY7iVGGKzJ6jKGM9GugE6sAfwspobOI8/edit#gid=0).

![img.png](imgs/cost-calculator.png)

_To use this sheet, ONLY change the values in the yellow cells; these are the hyperparameters that you control. The rest of
the values are either calculated from these hyperparameters or are constants._

With the hyperparameters set to:
- 1 site
- a 15-minute Firehose buffer interval
- 200 dashboard views per month

The worst case projected costs are:

| Views       | Cost ($) |
|-------------|----------|
| 10,000      | 2.01     |
| 100,000     | 3.24     |
| 1,000,000   | 14.64    |
| 10,000,000  | 128.74   |
| 100,000,000 | 1,288.39 |
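
As a rough reading of the table above, the sketch below computes the marginal cost per additional million views between adjacent rows. It uses only the five projected figures on this page (1 site, 15-minute buffer, 200 dashboard views); the helper name is illustrative, not part of the project.

```python
# Worst case projected costs from the table above (views -> USD).
PROJECTED = {
    10_000: 2.01,
    100_000: 3.24,
    1_000_000: 14.64,
    10_000_000: 128.74,
    100_000_000: 1_288.39,
}


def marginal_cost_per_million(low: int, high: int) -> float:
    """Extra dollars per additional million views between two rows of the table."""
    extra_cost = PROJECTED[high] - PROJECTED[low]
    extra_millions = (high - low) / 1_000_000
    return extra_cost / extra_millions


for low, high in [(1_000_000, 10_000_000), (10_000_000, 100_000_000)]:
    rate = marginal_cost_per_million(low, high)
    print(f"{low:,} -> {high:,} views: ${rate:.2f} per extra million")
```

Between 1M and 10M views this works out to roughly $12.7 per extra million, and between 10M and 100M to roughly $12.9, so above about a million views the worst case grows close to linearly with traffic.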

## Cost breakdown

The majority of the cost comes from Athena reading the data from S3. We optimize for this by using Kinesis Firehose and setting
the buffer interval as long as possible (15 minutes). The downside is that the data is not available in near real-time, but
is delayed by 15 ±1 minutes. This is a trade-off we are willing to make, as you do not really need this information in
real-time.

You can shorten the Firehose buffer interval, down to as little as 1 minute, but this will increase the cost, because more
(and smaller) files are written to S3 and Athena has to read each of them. On low-volume sites this might be worth it, but on
high-volume sites it will be too expensive.
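
To see why a shorter buffer costs more, here is a back-of-the-envelope count of the S3 objects Firehose writes for a single partition. It assumes a 30-day month, low traffic (so the time-based buffer, not the size-based one, triggers every flush) and at least one view in every interval, which makes it an upper bound rather than a measurement. The function name is hypothetical.

```python
# Assumption: a 30-day month; real months have 28 to 31 days.
MINUTES_PER_MONTH = 30 * 24 * 60


def files_per_partition_per_month(buffer_minutes: int) -> int:
    """Upper bound on S3 objects Firehose writes to one partition in a month.

    Assumes every flush is triggered by the time-based buffer and that every
    interval contains at least one page view.
    """
    if not 1 <= buffer_minutes <= 15:
        raise ValueError("the buffer interval discussed here ranges from 1 to 15 minutes")
    return MINUTES_PER_MONTH // buffer_minutes


for minutes in (15, 5, 1):
    files = files_per_partition_per_month(minutes)
    print(f"{minutes:>2} min buffer -> {files:,} files per partition per month")
```

Under these assumptions, going from a 15-minute to a 1-minute buffer raises the count from 2,880 to 43,200 files per partition per month, 15 times as many objects for Athena to potentially read.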

The number of sites also influences the cost, because each site multiplies the number of partitions (and therefore files)
written. In other words, if we have 100 views and 1 site, then all 100 views are written into 1 partition. If we have 100
views and 2 sites, assuming an equal split in views, then 50 views are written into one partition and 50 views into
another, resulting in 2 partitions.
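
The partition multiplier can be illustrated by grouping page views by a partition prefix. The `site=.../month=...` layout below is a hypothetical stand-in for whatever keys the project actually uses; it only mirrors the monthly, per-site partitioning described in this document.

```python
from collections import Counter


def partition_prefix(site: str, iso_timestamp: str) -> str:
    """Hypothetical Hive-style S3 prefix: one partition per site per month."""
    return f"site={site}/month={iso_timestamp[:7]}"  # "YYYY-MM" part of the timestamp


def views_per_partition(views: list[tuple[str, str]]) -> Counter:
    """Group (site, ISO timestamp) page views by the partition they are written to."""
    return Counter(partition_prefix(site, ts) for site, ts in views)


ts = "2023-07-14T10:00:00Z"
one_site = [("example.com", ts)] * 100
two_sites = [("example.com", ts)] * 50 + [("blog.example.com", ts)] * 50

print(views_per_partition(one_site))
print(views_per_partition(two_sites))
```

With Firehose dynamic partitioning, each prefix is delivered as its own object on every flush, so the two-site case writes two files per buffer interval where the one-site case writes one, for the same 100 views.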

## Improvements

Currently, we partition by month. This was because we were adding partitions manually and didn't want to burden the
user with doing this through the frontend on a daily basis. Since we switched to dynamic partitioning, we can change this
to daily partitions. This will result in more partitions, but not more S3 files written, because the Firehose buffer
interval is still less than a day. This change will significantly limit the amount of data scanned by Athena, which will
reduce the cost. Given the assumption that, on average, you do not query the full month's data but only today or a few
days, this will be a significant cost reduction.
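
A minimal sketch of why daily partitions reduce the data scanned: it counts how many days of data Athena would read for a date-filtered query under monthly versus daily partitions. It assumes the query's date filter prunes partitions and that views are spread evenly across days, so bytes scanned are roughly proportional to days read; the function and the partition formats are hypothetical.

```python
import calendar
from datetime import date, timedelta


def days_of_data_scanned(start: date, end: date, granularity: str) -> int:
    """Days of data read for a query filtered to the inclusive range [start, end]."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    if granularity == "day":
        # One hypothetical 'YYYY-MM-DD' partition per day: only matching days are read.
        return len({d.isoformat() for d in days})
    if granularity == "month":
        # One hypothetical 'YYYY-MM' partition per month: every matching month is read in full.
        months = {(d.year, d.month) for d in days}
        return sum(calendar.monthrange(year, month)[1] for year, month in months)
    raise ValueError(f"unknown granularity: {granularity!r}")


start, end = date(2023, 7, 12), date(2023, 7, 14)
print("monthly partitions:", days_of_data_scanned(start, end, "month"), "days read")
print("daily partitions:", days_of_data_scanned(start, end, "day"), "days read")
```

For a query over 12 to 14 July 2023, monthly partitions read all 31 days of July while daily partitions read 3, roughly a tenth of the data.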

We will also create janitors/background workers that run daily to optimize the partitions. These will use Athena's CTAS
(`CREATE TABLE AS SELECT`) query to compact the many daily Firehose Parquet files into just a few files. This will also
significantly reduce the amount of data scanned by Athena, which will reduce the cost.
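
The compaction step could look roughly like the following, which builds an Athena CTAS statement that rewrites one site's day of data into Parquet. Everything named here is an assumption for illustration: the `analytics.page_views` table, the `site` and `event_date` columns, the output bucket and the helper itself. Setting `bucketed_by` with `bucket_count = 1` is one way to ask Athena to write a single file; the planned janitor may do this differently.

```python
from datetime import date


def build_compaction_ctas(source_table: str, target_table: str, site: str,
                          day: str, output_location: str) -> str:
    """Build a hypothetical Athena CTAS statement compacting one site's day into one Parquet file."""
    date.fromisoformat(day)  # raises ValueError unless day looks like 'YYYY-MM-DD'
    if "'" in site:
        raise ValueError("site must not contain quotes")
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{output_location}',\n"
        "  bucketed_by = ARRAY['site'],\n"
        "  bucket_count = 1\n"
        ") AS\n"
        f"SELECT * FROM {source_table}\n"
        f"WHERE site = '{site}' AND event_date = '{day}'"
    )


print(build_compaction_ctas(
    source_table="analytics.page_views",
    target_table="analytics.page_views_2023_07_14_compacted",
    site="example.com",
    day="2023-07-14",
    output_location="s3://my-analytics-bucket/compacted/site=example.com/day=2023-07-14/",
))
```

Athena expects the `external_location` of a CTAS query to be empty, so each compaction run would need a fresh prefix.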

## FAQ

### Why use the Lambda Function URL and CloudFront?

CloudFront + the Function URL is the cheapest method of exposing your Lambda function to the internet while still giving
some form of control. By control, we mean that you can give it a custom domain name and add WAF to it for security
purposes, should we ever want to protect against DDoS attacks. We would have reverse proxied through CloudFront anyway to
avoid CORS, so also putting an API Gateway in front of the Lambda would have meant paying for two gateways, which is more
expensive.

Let's consider the cost of ingesting 1 million views per month:
- REST API GW: $3.50
- HTTP API GW: $1.00
- CloudFront: $1.00 to $1.20

The pricing of HTTP API GW and CloudFront requests is actually very similar. In hindsight, we could have used HTTP API GW,
but because we are already using CloudFront for all our domains, it just makes things easier.
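
To make the "two gateways" point concrete, this sketch adds up per-request charges for every layer a request passes through, using only the per-million prices listed above (CloudFront at the upper end of its range). It assumes one ingestion request per page view and that the Function URL adds no request charge of its own; data transfer, Lambda invocations and free tiers are ignored. The names are illustrative.

```python
# Per-million-request prices quoted in the FAQ above (USD).
PRICE_PER_MILLION = {"CloudFront": 1.20, "HTTP API GW": 1.00, "REST API GW": 3.50}

# Layers each request passes through; the Function URL is assumed to be free.
SETUPS = {
    "CloudFront + Function URL": ["CloudFront"],
    "CloudFront + HTTP API GW": ["CloudFront", "HTTP API GW"],
    "CloudFront + REST API GW": ["CloudFront", "REST API GW"],
}


def request_cost(views_per_month: int, layers: list[str]) -> float:
    """Request charges only: one ingestion request per view, billed by every layer."""
    millions = views_per_month / 1_000_000
    return sum(millions * PRICE_PER_MILLION[layer] for layer in layers)


for name, layers in SETUPS.items():
    print(f"{name:<26} ${request_cost(1_000_000, layers):.2f}")
```

At 1 million views this gives $1.20 for CloudFront with a Function URL, versus $2.20 with an HTTP API GW and $4.70 with a REST API GW added behind it.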

### Why not use SQS or EventBridge?

Having an `ingest => buffer => process => store` pipeline is nice, but it just adds extra costs. Instead, we made the
assumption that if you exceed the rate limit, something is anomalous, and instead of trying to ingest the data we just
drop it (the rate limit is enforced through Lambda concurrency). Website analytics is not mission-critical data; your
wallet is. When the rate limit is exceeded, you will get an alarm, and you can choose to increase the rate limit if it is
legitimate traffic.

Also, SQS adds $0.40 per million requests and EventBridge adds $1.00 per million events.
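
Put against the worst case projection, the buffering layer's overhead can be estimated as below. The sketch assumes one billed SQS request or EventBridge event per page view at the quoted rates and reuses the 1M and 10M rows of the cost table; it is an estimate, not a billing model, and the names are illustrative.

```python
# Quoted per-million prices for a buffering layer (USD), from the paragraph above.
BUFFER_PRICE_PER_MILLION = {"SQS": 0.40, "EventBridge": 1.00}

# Worst case projected costs from the cost table (views -> USD).
PROJECTED = {1_000_000: 14.64, 10_000_000: 128.74}


def buffer_overhead(views: int, buffer: str) -> float:
    """Extra cost of a buffer as a fraction of the projected total.

    Assumes one billed request/event per page view at the quoted rate.
    """
    extra = views / 1_000_000 * BUFFER_PRICE_PER_MILLION[buffer]
    return extra / PROJECTED[views]


for views in PROJECTED:
    for buffer in BUFFER_PRICE_PER_MILLION:
        print(f"{views:>10,} views, {buffer:<11} +{buffer_overhead(views, buffer):.1%}")
```

That comes to roughly 3% extra for SQS and about 7 to 8% for EventBridge on top of the projected totals.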

### Why not use DynamoDB?

DynamoDB (DDB) is the only other serverless database that we could potentially use to store the data and keep true to
our objectives. However, DDB is not a good fit for this use case because it is not optimized for analytics. It would be
much more expensive to run analytical lookups on DDB than to query the data on S3.

docs/imgs/cost-calculator.png

248 KB