
Commit f8a8c89

Service release checklist and minor readme improvements (#7)

1 parent c8b0e64, commit f8a8c89


3 files changed (+109, -16 lines)


LICENSE (+1, -12)

```diff
@@ -175,18 +175,7 @@
 
 END OF TERMS AND CONDITIONS
 
-APPENDIX: How to apply the Apache License to your work.
-
-To apply the Apache License to your work, attach the following
-boilerplate notice, with the fields enclosed by brackets "[]"
-replaced with your own identifying information. (Don't include
-the brackets!) The text should be enclosed in the appropriate
-comment syntax for the file format. We also recommend that a
-file or class name and description of purpose be included on the
-same "printed page" as the copyright notice for easier
-identification within third-party archives.
-
-Copyright [yyyy] [name of copyright owner]
+Copyright 2023 Lokalise
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
```

README.md (+4, -4)

````diff
@@ -6,12 +6,12 @@
 
 It comes with the following out-of-the-box:
 
-* fastify-based general application skeleton;
+* [fastify](https://www.fastify.io/docs/latest/) as a basis for the general web application skeleton;
 * Modular, domain-driven structure that encourages separation of concerns;
 * Server/app separation, for convenient bootstrapping in e2e tests;
 * [Global error handler](./src/infrastructure/errors/errorHandler.ts);
 * JSON-based, single line standardized [logging](./src/infrastructure/logger.ts);
-* Populates `req.id` for incoming requests based on `x-request-id` header, or generates new UUID if none is set.
+* Automatic population of `req.id` for incoming requests based on `x-request-id` header, or generation of new UUID if none is set, for the purposes of distributed tracing.
 
 Mechanisms:
 
@@ -102,8 +102,8 @@ npm run db:update-client
 docker compose up -d
 ```
 
-6. To register local dev requests in newrelic please use:
+6. To run application:
 
 ```shell
-NEW_RELIC_APP_NAME=dev.yourapp.yourdomain.com NEW_RELIC_LICENSE_KEY=<license_key> npm run start:dev
+npm run start:dev
 ```
````

docs/service-release-checklist.md (+104, new file)

# Service readiness checklist

This document aims to provide a checklist for determining whether a service is mature enough for general availability.

## Checklist

Legend: `M` - Mandatory, `R` - Recommended

### Documentation

- `M`: README.md in the repository, covering:
  * service description, ownership, links to documentation,
    build and deployment instructions, configuration variables, healthchecks, etc.
- `R`: Architectural diagram for the service
- `R`: SLI/SLO/SLA definitions (at a minimum, the service criticality level: P1/P2/P3)

### Build and release pipeline

- `M`: GitHub Actions workflows for building the code, running linting checks and executing tests
- `M`: Artifacts are container images

### Infrastructure

- `M`: All persistent state (if any) is stored in external storage
- `M`: All components must be designed for high availability (HA), e.g. an app should be able to run as
  multiple instances in an active/active configuration
- `M`: Deployment diagram
- `M`: Traffic and networking requirements are documented:
  - traffic types (HTTP/gRPC/other)
  - expected traffic spikes and static IP address needs
  - advanced L7 features: WAF, authentication, sticky sessions, request routing, load-balancing algorithms
  - CORS requirements

### Security

- `M`: No secrets in the code
- `M`: Any housekeeping, one-off, migration, etc. tasks must be part of the
  application; the `stage` and `live` environments are not directly accessible.
- `M`: Externally exposed services must require authentication
- `M`: Documentation must answer the following questions:
  * Is this an internal or external service?
  * Does the service make any outbound connections? If yes, specify the destinations.
  * Does the service handle personally identifiable information (PII)?
- `R`: HTTP headers / CORS:
  * `X-Frame-Options`, `Strict-Transport-Security`, `X-XSS-Protection`,
    `X-DNS-Prefetch-Control`
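
A minimal TypeScript sketch of how "No secrets in the code" is commonly met: secrets and other settings are read from environment variables at startup (which also covers the Operations requirement on configuration), and the process fails fast if a required value is missing. The `AppConfig` shape and the variable names `PORT`, `DATABASE_URL` and `LOG_LEVEL` are hypothetical.

```typescript
// Hypothetical configuration shape; variable names are illustrative only.
type Env = Record<string, string | undefined>;

const LOG_LEVELS = ["debug", "info", "warn", "error"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

interface AppConfig {
  port: number;
  databaseUrl: string; // secret: injected by the environment, never committed
  logLevel: LogLevel;
}

// Reads a required variable and fails fast if it is missing or empty.
function required(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function isLogLevel(value: string): value is LogLevel {
  return (LOG_LEVELS as readonly string[]).includes(value);
}

function loadConfig(env: Env): AppConfig {
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  const logLevel = env.LOG_LEVEL ?? "info";
  if (!isLogLevel(logLevel)) {
    throw new Error(`Invalid LOG_LEVEL: ${logLevel}`);
  }
  return { port, databaseUrl: required(env, "DATABASE_URL"), logLevel };
}
```

At startup this would be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function testable without mutating `process.env`.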
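
For the recommended HTTP headers, the sketch below builds a default header map and merges route-specific headers over it. The values are common hardening choices assumed for illustration, not values mandated by this checklist; `X-XSS-Protection: 0` follows the OWASP recommendation to disable the legacy browser XSS filter. In a fastify service, a plugin such as `@fastify/helmet` would typically manage these instead.

```typescript
// Illustrative hardening defaults for the headers named in the checklist.
// Keys are lowercase because HTTP header names are case-insensitive.
const SECURITY_HEADERS: Readonly<Record<string, string>> = {
  // Forbid rendering the app inside frames (clickjacking protection).
  "x-frame-options": "DENY",
  // Browsers should use HTTPS only, for one year, including subdomains.
  "strict-transport-security": "max-age=31536000; includeSubDomains",
  // Disable the legacy browser XSS filter, as OWASP recommends.
  "x-xss-protection": "0",
  // Do not pre-resolve DNS for links found in served pages.
  "x-dns-prefetch-control": "off",
};

// Returns the outgoing headers with the defaults added. All names are
// lowercased, and values a route set explicitly win over the defaults.
function withSecurityHeaders(headers: Record<string, string>): Record<string, string> {
  const merged: Record<string, string> = { ...SECURITY_HEADERS };
  for (const [name, value] of Object.entries(headers)) {
    merged[name.toLowerCase()] = value;
  }
  return merged;
}
```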

### Operations

- `M`: Application configuration is set via environment variables
- `M`: Logging satisfies the following requirements:
  * single-line JSON to stdout/stderr, with a `message` or `msg` field at the root level
  * data types of JSON fields must not be mixed (e.g. the same field must not be a number in one entry and a string in another), otherwise parsing will fail
  * at least two verbosity levels (debug/error); levels are used as follows:
    * `error`: unexpected error that prevents further processing
    * `warn`: irregular events with a defined recovery strategy
    * `info`: major state changes; must log: component start, component became operational,
      event/task processed, shutdown started, and just before exit
    * `debug`: diagnostic and troubleshooting events
  * a global error handler ensures all errors are logged
  * error responses adhere to the defined standard structure
  * the `level` field must contain a string (`error`, `warn`, ...), not a number
  * distributed tracing:
    * the request id is passed via the `x-request-id` header; propagate it if received, otherwise generate a new one
    * the request id must be included in request-scoped logs and in outgoing HTTP requests
- `M`: Implements APM integration
- `M`: Healthcheck endpoint, which should:
  * at a minimum, return a 200 response if the service is operational and a non-200 status code otherwise
  * include the app version and commit hash in the response
  * recommended: provide separate [readiness and liveness endpoints]
- `M`: Implements metrics:
  * the 4 golden signals: latency, traffic, errors, saturation
  * an endpoint (preferably `/metrics`) in Prometheus format on a separate port (e.g. `9090`)
  * availability, authentication status, and latency for all backend services
  * Node.js runtime metrics
  * business metrics as necessary/defined by the service owner
- `R`: Perform simple load testing of the service, and use the results to:
  * size the live infrastructure (e.g. cores, RAM, storage size)
  * define alerting thresholds (e.g. for the 4 golden signals: latency, traffic (request count), errors, saturation)
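
The logging and distributed-tracing requirements above can be sketched without a logging library, to make the expected output shape concrete: one JSON object per line, a string `level`, a root-level `msg`, and the request id on every request-scoped entry. The function names and the `reqId` field name are assumptions; the README's fastify skeleton ships its own logger (`src/infrastructure/logger.ts`), whose API is not shown here.

```typescript
import { randomUUID } from "node:crypto";

type Level = "debug" | "info" | "warn" | "error";
type Fields = Record<string, unknown>;

// Numeric ranks are used only for filtering; the emitted `level` is a string.
const LEVEL_RANK: Record<Level, number> = { debug: 10, info: 20, warn: 30, error: 40 };

// Propagates the caller's x-request-id, or generates a new UUID if absent.
// Node.js lowercases incoming header names, hence the lowercase lookup.
function resolveRequestId(headers: Record<string, string | string[] | undefined>): string {
  const incoming = headers["x-request-id"];
  const value = Array.isArray(incoming) ? incoming[0] : incoming;
  return value !== undefined && value.trim() !== "" ? value : randomUUID();
}

// Headers to attach to every outgoing HTTP call made for this request.
function outgoingHeaders(reqId: string): Record<string, string> {
  return { "x-request-id": reqId };
}

// Request-scoped logger writing one JSON object per line. Root fields are
// spread last so caller-supplied fields cannot overwrite them. Convert Error
// instances (e.g. to message/stack) before logging: JSON.stringify skips
// their non-enumerable properties.
function createLogger(minLevel: Level, reqId: string, write: (line: string) => void) {
  const at = (level: Level) => (msg: string, fields: Fields = {}): void => {
    if (LEVEL_RANK[level] < LEVEL_RANK[minLevel]) return;
    write(JSON.stringify({ ...fields, time: new Date().toISOString(), level, msg, reqId }));
  };
  return { debug: at("debug"), info: at("info"), warn: at("warn"), error: at("error") };
}
```

In the service, `write` would typically be `(line) => process.stdout.write(line + "\n")`. `JSON.stringify` escapes embedded newlines, which is what keeps each entry on a single line.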
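
A sketch of the healthcheck logic, kept independent of any HTTP framework: every dependency probe is run, failures are recorded rather than thrown, and the result maps to a 200 or 503 response carrying the app version and commit hash. The probe names, the body field names, and the choice of 503 as the non-200 code are assumptions.

```typescript
interface CheckResult {
  name: string;
  ok: boolean;
}

type Probe = () => Promise<void>;

interface HealthResponse {
  statusCode: number;
  body: {
    status: "ok" | "fail";
    version: string;
    gitCommitSha: string;
    checks: Record<string, "ok" | "fail">;
  };
}

// Runs every probe; a rejected probe is recorded as a failure, not rethrown.
async function runChecks(probes: Record<string, Probe>): Promise<CheckResult[]> {
  return Promise.all(
    Object.entries(probes).map(async ([name, probe]): Promise<CheckResult> => {
      try {
        await probe();
        return { name, ok: true };
      } catch {
        return { name, ok: false };
      }
    }),
  );
}

// 200 only if every dependency is healthy; 503 otherwise.
function buildHealthResponse(version: string, gitCommitSha: string, results: CheckResult[]): HealthResponse {
  const checks: Record<string, "ok" | "fail"> = {};
  for (const { name, ok } of results) checks[name] = ok ? "ok" : "fail";
  const healthy = results.every((r) => r.ok);
  return {
    statusCode: healthy ? 200 : 503,
    body: { status: healthy ? "ok" : "fail", version, gitCommitSha, checks },
  };
}
```

Each probe should itself be bounded by a timeout, so that a hanging dependency makes the healthcheck fail instead of hang.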
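
To illustrate what the `/metrics` endpoint returns, here is a hand-rolled counter rendered in the Prometheus text exposition format. It is a teaching sketch: in practice a client library such as `prom-client` would normally be used, and latency (one of the golden signals) would additionally need a histogram. The metric name `http_requests_total` and its labels are illustrative.

```typescript
// Escapes a label value as required by the Prometheus text format.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Renders labels in a stable (sorted) order so equal label sets share a key.
function formatLabels(labels: Record<string, string>): string {
  const entries = Object.entries(labels).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  if (entries.length === 0) return "";
  return `{${entries.map(([k, v]) => `${k}="${escapeLabelValue(v)}"`).join(",")}}`;
}

class Counter {
  private readonly values = new Map<string, number>();
  readonly name: string;
  readonly help: string;

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  inc(labels: Record<string, string> = {}, by = 1): void {
    if (by < 0) throw new Error("counters can only increase");
    const key = formatLabels(labels);
    this.values.set(key, (this.values.get(key) ?? 0) + by);
  }

  // One HELP and one TYPE line, then one sample line per label set.
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    for (const [labels, value] of this.values) lines.push(`${this.name}${labels} ${value}`);
    return lines.join("\n") + "\n";
  }
}

function renderMetrics(counters: Counter[]): string {
  return counters.map((c) => c.render()).join("");
}
```

The rendered text would be served with `Content-Type: text/plain; version=0.0.4` from a listener on the separate metrics port (e.g. `9090`).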

### Resiliency

- `M`: The service can run in multiple instances simultaneously
- `M`: Must handle component unavailability gracefully (e.g. being unable to connect to storage):
  - all connections must have reasonable timeouts and error handling
  - all HTTP calls must have reasonable timeouts and error handling
  - reconnect with exponential back-off where necessary
- `M`: [Graceful shutdown] on SIGTERM (15): stop accepting new connections, complete in-flight work, then exit
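
The "reconnect with exponential back-off" requirement can be sketched as a generic retry helper. The upper bound of the delay before retry *n* is `baseDelayMs * 2^n`, capped at `maxDelayMs`, and the actual delay is drawn at random below that bound ("full jitter") so that many instances do not reconnect in lockstep. Jitter is one common variant and is not prescribed by the checklist; the option names are hypothetical.

```typescript
interface RetryOptions {
  retries: number; // retries after the first attempt
  baseDelayMs: number; // upper bound of the first back-off delay
  maxDelayMs: number; // cap for any single delay
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Exponential back-off with "full jitter": a random delay in
// [0, min(maxDelayMs, baseDelayMs * 2^attempt)).
function backoffDelay(attempt: number, opts: RetryOptions, random: () => number = Math.random): number {
  const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retries `fn` until it succeeds or the retry budget is exhausted,
// rethrowing the last error in the latter case.
async function retryWithBackoff<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= opts.retries) throw err;
      await sleep(backoffDelay(attempt, opts));
    }
  }
}
```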
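
For "all HTTP calls must have reasonable timeouts", one approach is a wrapper that rejects if the call has not settled within a deadline and aborts it through an `AbortSignal`, so the underlying request is cancelled rather than left running. `withTimeout` and `TimeoutError` are hypothetical names.

```typescript
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`Operation timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Runs `work`, rejecting with TimeoutError if it does not settle within `ms`.
// The AbortSignal lets the work (e.g. a fetch call) cancel itself on timeout.
function withTimeout<T>(work: (signal: AbortSignal) => Promise<T>, ms: number): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort();
      reject(new TimeoutError(ms));
    }, ms);
    work(controller.signal).then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

With the global `fetch` available in Node.js 18+, a call could look like `withTimeout((signal) => fetch(url, { signal }), 2000)`.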
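
Graceful shutdown on SIGTERM can be sketched as three steps: stop accepting new work, wait for in-flight work to drain (bounded by a deadline), then exit. `InFlightTracker` and `gracefulShutdown` are hypothetical helpers; `stopAccepting` stands for whatever closes the listener in the actual server.

```typescript
// Counts in-flight units of work so shutdown can wait for them to finish.
class InFlightTracker {
  private active = 0;
  private draining = false;
  private waiters: Array<() => void> = [];

  // Returns false once shutdown has begun, so new work can be rejected.
  tryStart(): boolean {
    if (this.draining) return false;
    this.active++;
    return true;
  }

  finish(): void {
    if (this.active > 0) this.active--;
    if (this.draining && this.active === 0) {
      for (const notify of this.waiters.splice(0)) notify();
    }
  }

  // Stops accepting new work; resolves once nothing is in flight.
  drain(): Promise<void> {
    this.draining = true;
    if (this.active === 0) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }
}

// Stop accepting, then wait for in-flight work, but no longer than timeoutMs.
async function gracefulShutdown(
  tracker: InFlightTracker,
  stopAccepting: () => Promise<void>,
  timeoutMs: number,
): Promise<"drained" | "timed-out"> {
  const drained = tracker.drain().then(() => "drained" as const);
  await stopAccepting();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), timeoutMs);
  });
  const outcome = await Promise.race([drained, deadline]);
  clearTimeout(timer);
  return outcome;
}
```

Wiring it to the signal could look like `process.once("SIGTERM", () => gracefulShutdown(tracker, stopAccepting, 10_000).then(() => process.exit(0)))`; the 10-second deadline is an arbitrary example and should stay below the orchestrator's termination grace period.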

## References

- https://www.opslevel.com/blog/production-readiness-in-depth#deployment
- https://gruntwork.io/devops-checklist/
- https://aleksei-kornev.medium.com/production-readiness-checklist-for-backend-applications-8d2b0c57ccec
- https://github.com/mercari/production-readiness-checklist/blob/master/docs/references/pre-production-checklist.md
- https://blog.last9.io/deployment-readiness-checklists/
- https://habr.com/en/post/438186/
- https://12factor.net
- https://cloud.google.com/blog/products/containers-kubernetes/your-guide-kubernetes-best-practices

[readiness and liveness endpoints]: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-setting-up-health-checks-with-readiness-and-liveness-probes
[Graceful shutdown]: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace

This document is based on a Lokalise Service Release Checklist, prepared by the Lokalise Platform Squad.
