Application Health Monitoring

Dani Donisa edited this page Jul 30, 2020 · 44 revisions

We are going to collect metrics about the usage of OBS, such as user logins, the creation of packages and projects, and the like. Since we plan to refresh the OBS UI and may change and improve workflows in OBS, we want to be able to track whether those changes have negative or positive effects.

Architectural overview

Our AHM stack consists of:

RabbitMQ

The metrics we collect are sent to the metrics queue in RabbitMQ.

Telegraf

Telegraf consumes these metrics from RabbitMQ and writes them to InfluxDB.

InfluxDB

InfluxDB stores the time series data we collect in the telegraf database.

Grafana

Grafana is used to create graphs to visualize the collected data.
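
To make the flow above more concrete, here is a minimal sketch of what a single metric might look like on its way through the stack. It assumes the metrics are encoded in InfluxDB line protocol, which is a format Telegraf can parse; this page does not state the actual encoding OBS uses. The function name format_metric and the measurement, tag, and field names are hypothetical.

```shell
# Build one InfluxDB line protocol record: measurement[,tags] fields
# All measurement, tag, and field names used here are hypothetical examples.
format_metric() {
  measurement="$1"
  tags="$2"     # comma-separated key=value pairs, may be empty
  fields="$3"   # comma-separated key=value pairs, at least one required
  if [ -n "$tags" ]; then
    printf '%s,%s %s\n' "$measurement" "$tags" "$fields"
  else
    printf '%s %s\n' "$measurement" "$fields"
  fi
}

# Prints: user_login,ui=webui count=1
format_metric user_login "ui=webui" "count=1"
```

A record like this would be published to the metrics queue, picked up by Telegraf, and stored as a point in the telegraf database, where Grafana can query it.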

Development setup

Instructions for setting up the development environment for AHM can be found in our Docker documentation.

Prepare and start the container

rake docker:ahm:prepare
docker-compose -f docker-compose.ahm.yml -f docker-compose.yml up

Configure Grafana

Go to the Grafana frontend at http://localhost:8000 and log in (admin/admin).

Add a new data source with the following settings:

Type: InfluxDB
URL: http://influx:8086
Database: telegraf
User: grafana
Password: grafana
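
As an alternative to clicking through the UI, the same settings could probably be sent to Grafana's HTTP API for data sources. The following is only a sketch under assumptions: the data source name "OBS InfluxDB" and the access mode "proxy" are not taken from this page, and newer Grafana versions may expect the password in a different field.

```shell
# Print a JSON payload carrying the data source settings listed above.
# The name "OBS InfluxDB" and access mode "proxy" are assumptions.
datasource_payload() {
  cat <<'EOF'
{"name":"OBS InfluxDB","type":"influxdb","access":"proxy","url":"http://influx:8086","database":"telegraf","user":"grafana","password":"grafana"}
EOF
}

# To apply it against the development Grafana (left commented out on purpose):
# datasource_payload | curl -s -u admin:admin \
#   -H 'Content-Type: application/json' \
#   -X POST -d @- http://localhost:8000/api/datasources
```

Keeping the payload in a function makes it easy to inspect before sending it anywhere.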

Production setup

The Grafana dashboards are hosted at https://obs-measure.opensuse.org/. You can log in with your GitHub account and should receive the Editor role. The openSUSE RabbitMQ instance is running at https://rabbit.opensuse.org/.

Health Dashboards

Overview

This dashboard gives a general overview of the health status of the application. You can tell whether the application is up by looking at the following panels:

Detailed error panels

This dashboard gives a detailed picture of the errors occurring in the application. Each type of error has its own panel:

Performance Dashboards

Response time

Health