Open
Description
To better understand the behavior of the application, data points should be collected and visualized. Besides node statistics, statistics of the individual applications such as Nginx, Nchan, PHP-FPM, RabbitMQ, Redis and MySQL should also be collected. In addition, there are statistics specific to this app, such as the event store pointer and replication lag of each event store follower (the lag is derived from the pointer value), the number of running and open games, user signups and arrivals, and the like.
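As a rough illustration of how replication lag could be derived from pointer values, here is a minimal Python sketch. It assumes each pointer is a monotonically increasing integer position in the event store and that the lag is the distance between the newest position and a follower's pointer; the function name and the follower names are hypothetical and not taken from this project.

```python
def replication_lag(head_pointer: int, follower_pointers: dict[str, int]) -> dict[str, int]:
    """Return how many events each follower is behind the head of the event store.

    Assumption: pointers are integer positions that only ever increase.
    A follower that is (momentarily) ahead of the sampled head is reported as 0.
    """
    return {
        name: max(0, head_pointer - pointer)
        for name, pointer in follower_pointers.items()
    }


# Hypothetical follower names, for illustration only.
lags = replication_lag(120, {"chat": 120, "web-interface": 95})
```

Each resulting value could then be exposed as one gauge per follower.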
This issue requires several iterations:
- Set up the infrastructure and show a Grafana dashboard with at least one of the following items.
  Set up monitoring infrastructure #144
- Show node dashboard.
  Set up monitoring infrastructure #144
  Optimize node dashboard #148
- Show MySQL dashboard.
  Add MySQL dashboard #146
- Show RabbitMQ dashboard. With that, RabbitMQ's web ui can be disabled.
  Add RabbitMQ dashboard #147
- Show Redis dashboard.
  Add Redis dashboard #152
- Show Traefik dashboard. With that, Traefik's web ui can be disabled.
  Add Traefik dashboard #151
- Show Nchan dashboard.
  Add Nchan dashboard #149
- Show PHP-FPM dashboard (maybe via artprima/prometheus-metrics-bundle).
  Add PHP-FPM dashboard #168
- Show other relevant application metrics like number of running games, number of written chat messages or event store follower replication lag. Each domain should get its own dashboard.
  This can be done with the Prometheus Pushgateway, with the Statsd Exporter or other similar solutions deployed as a sidecar.
- Mention Grafana and Prometheus in README.md.
  Mention Grafana and Prometheus #156
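
For the Statsd Exporter option mentioned in the list, the application side could look roughly like the following Python sketch. It only relies on the plain StatsD gauge line format (`name:value|g`) sent over UDP. The metric names, the host and the port 9125 are assumptions for illustration; the real values depend on how the sidecar is configured, and the application itself would emit these from its own language.

```python
import socket


def format_gauge(name: str, value: float) -> bytes:
    """Encode one gauge in the StatsD line format: <name>:<value>|g"""
    return f"{name}:{value}|g".encode("ascii")


def send_gauges(gauges: dict[str, float], host: str = "127.0.0.1", port: int = 9125) -> list[bytes]:
    """Send each gauge as its own UDP datagram and return the payloads sent.

    Assumption: a StatsD-compatible sidecar listens on host:port.
    UDP is fire-and-forget, so a missing listener is not detected here.
    """
    payloads = [format_gauge(name, value) for name, value in gauges.items()]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for payload in payloads:
            sock.sendto(payload, (host, port))
    return payloads


# Hypothetical metric names, for illustration only:
# send_gauges({"games_running": 3, "games_open": 7})
```

Sending over UDP keeps the application decoupled from the monitoring stack: if the sidecar is down, requests are not slowed down, at the cost of silently dropped data points.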