Description
The internal infrastructure includes an st2monitoring server that provides a dashboard and client checks (services, memory, processes, ports) for each internal infra node, including the st2cicd server, as well as external checks (APIs, SSL cert expiry, domains, and availability health checks for the ST2 websites).
To reduce the amount of infra, costs, and moving pieces, and to rely less on AWS resources (see https://github.com/orgs/StackStorm/projects/27), remove the st2monitoring server and start migrating to a free 3rd-party service for monitoring and alerting.
For example, we could use Scalyr (where @Kami works).
There are several sub-tasks here:
- research the monitoring/alerting platform (whether Scalyr is a good fit)
- create and configure a 3rd-party monitoring account for the st2 TSC
  - shared account/email with the TSC
  - monitoring alerts should go to #opstown Slack
- set up external API/web checks:
  - APIs
  - SSL + expiry
  - Domains + expiry
  - Health checks:
    - stackstorm.com
    - stackstorm.org
    - index.stackstorm.org
    - helm.stackstorm.com
    - api.stackstorm.com
    - docs.stackstorm.com
    - st2cicd webhook endpoints
- create internal checks for st2cicd:
  - via 3rd-party monitoring agent/client
  - migrate st2cicd internal checks: memory, CPU, services, processes, etc.
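To make the scope of the external checks concrete, here is a minimal Python sketch of what a health check plus SSL expiry check does for the sites listed above. It is illustrative only and not a replacement for whatever the chosen 3rd-party service provides. The `https://` scheme, the root path `/`, the 14-day warning threshold, and all function names are assumptions made for this example; only the hostnames come from this issue.

```python
import socket
import ssl
import time
import urllib.error
import urllib.request

# Hostnames taken from the list in this issue.
HOSTS = [
    "stackstorm.com",
    "stackstorm.org",
    "index.stackstorm.org",
    "helm.stackstorm.com",
    "api.stackstorm.com",
    "docs.stackstorm.com",
]

WARN_DAYS = 14  # assumed threshold, not taken from the existing setup


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def cert_not_after(host, port=443, timeout=10.0):
    """Return the 'notAfter' string of the certificate served by host."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Convert a certificate 'notAfter' string into days remaining."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def find_problems(host, status, days_left, warn_days=WARN_DAYS):
    """Return a list of human-readable problems for one host."""
    problems = []
    if status is None:
        problems.append(f"{host}: no HTTP response")
    elif status >= 400:
        problems.append(f"{host}: HTTP {status}")
    if days_left is None:
        problems.append(f"{host}: could not read TLS certificate")
    elif days_left < warn_days:
        problems.append(f"{host}: certificate expires in {days_left:.0f} days")
    return problems


def run_checks(hosts=HOSTS):
    """Check every host over the network and return all problems found."""
    problems = []
    for host in hosts:
        status = http_status(f"https://{host}/")
        try:
            days_left = days_until_expiry(cert_not_after(host))
        except (OSError, KeyError, ValueError):
            days_left = None
        problems.extend(find_problems(host, status, days_left))
    return problems
```

Calling `run_checks()` returns a list of problem strings, which could then be forwarded to the #opstown Slack channel. Some endpoints may legitimately return a non-2xx status on `/`, so the path and the status rule would need adjusting per site. Domain expiry and the st2cicd webhook endpoints are not covered by this sketch.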
Finishing just the first part, migrating the external checks, would already be great. At that point we can remove the st2monitoring server, which would save us $60/mo in AWS.

