
Application Health Monitoring

Dani Donisa edited this page Jul 30, 2020 · 44 revisions

We are going to collect metrics about the usage of OBS, such as user logins, the creation of packages and projects, and the like. Since we plan to refresh the OBS UI and might change and improve workflows in OBS, we want to be able to track whether those changes have any negative or positive effects.

Architectural overview

Our Application Health Monitoring (AHM) stack consists of:

RabbitMQ

The metrics we collect are sent to the metrics queue in RabbitMQ.

Telegraf

Telegraf fetches these metrics and reports them to InfluxDB.

InfluxDB

InfluxDB stores the time series data we collect, in the telegraf database.

Grafana

Grafana is used to create graphs to visualize the collected data.
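
To make the flow above more concrete, here is a minimal sketch of how a single metric could be serialized before it is published to the metrics queue. It assumes the messages use the InfluxDB line protocol, which this page does not state explicitly. The function name `to_line_protocol` and the measurement, tag and field names in the usage example are hypothetical and not taken from the OBS code base.

```python
def _escape(text, chars):
    """Backslash-escape every character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value using line protocol type conventions."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Everything else is sent as a double-quoted string.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=value field=value [timestamp]."""
    if not fields:
        raise ValueError("at least one field is required")
    line = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys give a stable output
        line += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    line += " " + ",".join(
        f"{_escape(key, ',= ')}={_format_field(value)}"
        for key, value in fields.items()
    )
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


# Hypothetical metric: one user login through the web UI.
print(to_line_protocol("login", {"facility": "webui"}, {"count": 1}))
```

A line like this would be published to RabbitMQ, picked up by Telegraf and written to InfluxDB, where Grafana can query it.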

Development setup

Instructions for setting up the development environment for AHM can be found in our Docker documentation.

Prepare and start the container

rake docker:ahm:prepare
docker-compose -f docker-compose.ahm.yml -f docker-compose.yml up

Configure Grafana

Go to the Grafana frontend at http://localhost:8000 and log in (admin/admin).

Add a new data source with the following settings:

Type: InfluxDB
URL: http://influx:8086
Database: telegraf
User: grafana
Password: grafana
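
If you prefer to script this step, the same settings can be expressed as a JSON payload for Grafana's data source HTTP API (`POST /api/datasources`). The sketch below only builds and prints the payload, it does not send it. The data source name `OBS metrics` and the `proxy` access mode are assumptions, and the exact field layout depends on the Grafana version, so compare it with the API documentation for the version you run.

```python
import json


def influx_datasource_payload(
    name="OBS metrics",  # hypothetical display name, not from this page
    url="http://influx:8086",
    database="telegraf",
    user="grafana",
    password="grafana",
):
    """Return the data source settings listed above as a dictionary."""
    return {
        "name": name,
        "type": "influxdb",
        "access": "proxy",  # assumption: Grafana's backend proxies the queries
        "url": url,
        "database": database,
        "user": user,
        # Secrets are passed separately so Grafana can store them encrypted.
        "secureJsonData": {"password": password},
    }


if __name__ == "__main__":
    print(json.dumps(influx_datasource_payload(), indent=2))
```

The printed JSON could then be sent with any HTTP client, authenticated as the admin user from the previous step.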

Production setup

The Grafana dashboards are hosted at https://obs-measure.opensuse.org/. You can log in with your GitHub account and should get the Editor role. The openSUSE RabbitMQ is running at https://rabbit.opensuse.org/.

Main Dashboards

Application Health Overview

This dashboard should tell you whether the application is up.

Main error panel

Perf overview

Response time

Health

Alerts

There is an alert on the number of successful requests per minute. It triggers when the total number of requests stays below 5 per minute for two full minutes.
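
The alert condition can be sketched as a small function. This is only an illustration of the rule as described here, assuming one sample per minute. The real alert is defined in Grafana, and the name `alert_firing` and its parameters are hypothetical.

```python
from typing import Sequence


def alert_firing(
    requests_per_minute: Sequence[int],
    threshold: int = 5,
    minutes: int = 2,
) -> bool:
    """Return True if each of the last `minutes` samples is below `threshold`.

    `requests_per_minute` holds one count of successful requests per
    minute, oldest first.
    """
    if len(requests_per_minute) < minutes:
        # Not enough data yet to cover the full evaluation window.
        return False
    return all(count < threshold for count in requests_per_minute[-minutes:])


print(alert_firing([120, 4, 3]))   # both of the last two minutes are below 5
print(alert_firing([120, 4, 80]))  # traffic recovered in the last minute
```

Requiring the whole window to be below the threshold keeps a single quiet minute from raising the alert.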
