Application Health Monitoring

Dani Donisa edited this page Jul 31, 2020 · 44 revisions

About

We collect metrics about the usage of OBS, such as user logins, package and project creation, and the like.

The monitoring dashboards are hosted at https://obs-measure.opensuse.org/. You can log in with your GitHub account, which should grant you the Editor role. The openSUSE RabbitMQ is running at https://rabbit.opensuse.org/.

Application Health Overview Dashboard

This dashboard gives a general overview of the application's health. The following panels show at a glance whether the application is up:

Number of successful requests per minute

This panel tracks application traffic as seen by the application itself.

⚠️ An alert is sent when the traffic drops too low.
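The alert itself is configured in Grafana. Purely as an illustration of the idea, a low-traffic check over recent per-minute request counts could look like the Python sketch below. The function name `should_alert_low_traffic`, the threshold of 100 requests per minute, and the 5-minute window are hypothetical values chosen for the example, not the settings the real alert uses.

```python
from typing import Sequence

# Hypothetical values: the real threshold and evaluation window are
# configured in Grafana and are not documented on this page.
MIN_REQUESTS_PER_MINUTE = 100
WINDOW_MINUTES = 5


def should_alert_low_traffic(
    requests_per_minute: Sequence[int],
    threshold: int = MIN_REQUESTS_PER_MINUTE,
    window: int = WINDOW_MINUTES,
) -> bool:
    """Return True if every one of the last `window` minutes fell below `threshold`."""
    if len(requests_per_minute) < window:
        # Not enough data yet to judge the traffic level.
        return False
    recent = requests_per_minute[-window:]
    return all(count < threshold for count in recent)


print(should_alert_low_traffic([250, 240, 90, 80, 70, 60, 50]))   # True
print(should_alert_low_traffic([250, 240, 90, 80, 70, 160, 50]))  # False
```

Requiring the whole window to be low, rather than a single minute, keeps one quiet minute from triggering the alert.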

Error rates

This panel tracks requests that returned an HTTP error status code.
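A minimal sketch of what an error rate means here, assuming it is the share of requests that returned a 4xx (client error) or 5xx (server error) status code. The function name `error_rates` and the 4xx/5xx split are illustrative assumptions; the panel's actual query may group errors differently.

```python
from collections import Counter
from typing import Iterable


def error_rates(status_codes: Iterable[int]) -> dict[str, float]:
    """Share of requests per error class: 4xx = client errors, 5xx = server errors."""
    codes = list(status_codes)
    if not codes:
        return {"4xx": 0.0, "5xx": 0.0}
    # Map each status code to its class, e.g. 404 -> "4xx", 503 -> "5xx".
    classes = Counter(f"{code // 100}xx" for code in codes)
    total = len(codes)
    return {
        "4xx": classes["4xx"] / total,
        "5xx": classes["5xx"] / total,
    }


print(error_rates([200, 200, 404, 500, 200, 503, 200, 401]))
# {'4xx': 0.25, '5xx': 0.25}
```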

Authentication Failures

This panel monitors bursts of authentication failures within a 10-minute window.
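A burst within 10 minutes can be detected with a sliding time window: keep the timestamps of recent failures, drop those older than the window, and compare the remaining count to a threshold. The sketch assumes failures are recorded in timestamp order; the class name `AuthFailureMonitor` and the burst threshold of 20 are hypothetical, since this page does not define what counts as a burst.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
# Hypothetical threshold: the page does not say how many failures form a burst.
BURST_THRESHOLD = 20


class AuthFailureMonitor:
    """Counts authentication failures in a sliding 10-minute window."""

    def __init__(self, window: timedelta = WINDOW, threshold: int = BURST_THRESHOLD):
        self.window = window
        self.threshold = threshold
        self._failures: deque = deque()

    def record_failure(self, at: datetime) -> bool:
        """Record one failure at time `at`; return True if the window now holds a burst."""
        self._failures.append(at)
        # Drop failures that fall outside the window ending at `at`.
        while self._failures and at - self._failures[0] > self.window:
            self._failures.popleft()
        return len(self._failures) >= self.threshold


start = datetime(2020, 7, 31, 12, 0)
monitor = AuthFailureMonitor(threshold=3)
print(monitor.record_failure(start))                          # False
print(monitor.record_failure(start + timedelta(minutes=4)))   # False
print(monitor.record_failure(start + timedelta(minutes=9)))   # True
print(monitor.record_failure(start + timedelta(minutes=25)))  # False
```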

Request State Change

This panel tracks request creation and request state changes.

Projects / min

This panel tracks the number of projects created and destroyed per minute.

Packages / min

This panel tracks the number of packages created and destroyed per minute.
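Both per-minute panels come down to bucketing create and destroy events by minute. A minimal sketch, assuming each event is a `(timestamp, action)` pair whose action label is the hypothetical string `"created"` or `"destroyed"`; in practice the panel's query most likely performs this aggregation, so the function `per_minute` is only illustrative.

```python
from collections import Counter
from datetime import datetime


def per_minute(events: list[tuple[datetime, str]]) -> dict[tuple[datetime, str], int]:
    """Count events per (minute, action) bucket."""
    counts: Counter = Counter()
    for timestamp, action in events:
        # Truncate the timestamp to the start of its minute.
        minute = timestamp.replace(second=0, microsecond=0)
        counts[(minute, action)] += 1
    return dict(counts)


events = [
    (datetime(2020, 7, 31, 12, 0, 5), "created"),
    (datetime(2020, 7, 31, 12, 0, 40), "created"),
    (datetime(2020, 7, 31, 12, 0, 59), "destroyed"),
    (datetime(2020, 7, 31, 12, 1, 2), "created"),
]
counts = per_minute(events)
print(counts[(datetime(2020, 7, 31, 12, 0), "created")])  # 2
```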

Total projects

This panel tracks the total number of projects created and destroyed.

Total packages

This panel tracks the total number of packages created and destroyed.

User Creation

This panel tracks the number of users created per hour.

Beta Users

This panel tracks the number of users who joined and left the beta program.

Detailed Errors Dashboard

This dashboard gives a detailed picture of the errors happening in the application. Each type of error has its own panel:

Architectural overview

Our Application Health Monitoring (AHM) stack consists of:

RabbitMQ

The metrics we collect are sent to the metrics queue in RabbitMQ.
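Telegraf writes points to InfluxDB in InfluxDB line protocol (`measurement,tag=value field=value timestamp`). Assuming the messages on the metrics queue use the same format, which Telegraf's AMQP consumer input can parse when configured with `data_format = "influx"`, a metric could be serialized as sketched below. This is an assumption about the message format, not a description of the OBS code; the measurement and tag names in the usage line (`login`, `access_point`) are hypothetical, and the escaping covers only the common cases.

```python
from typing import Mapping, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape every character of `chars` found in `text`."""
    for char in chars:
        text = text.replace(char, "\\" + char)
    return text


def _format_field(value: FieldValue) -> str:
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # String fields are double-quoted; backslashes are escaped before quotes.
    return '"' + _escape(value, '\\"') + '"'


def to_line_protocol(
    measurement: str,
    tags: Mapping[str, str],
    fields: Mapping[str, FieldValue],
    timestamp_ns: Optional[int] = None,
) -> str:
    """Serialize one data point as an InfluxDB line protocol string."""
    line = _escape(measurement, ", ")
    # Tags are sorted by key, as InfluxDB recommends for write performance.
    for key in sorted(tags):
        line += f",{_escape(key, ',= ')}={_escape(tags[key], ',= ')}"
    field_set = ",".join(
        f"{_escape(key, ',= ')}={_format_field(value)}"
        for key, value in sorted(fields.items())
    )
    line += " " + field_set
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line_protocol("login", {"access_point": "webui"}, {"value": 1}))
# login,access_point=webui value=1i
```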

Telegraf

Telegraf fetches these metrics and reports them to InfluxDB.

InfluxDB

InfluxDB stores the time series data we collect in the telegraf database.
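To inspect the stored data directly, InfluxDB 1.x exposes an HTTP `/query` endpoint that takes the database name (`db`) and an InfluxQL query (`q`) as URL parameters. The sketch below only builds such a URL; it assumes InfluxDB 1.x and the `http://influx:8086` address used in the development setup, and the `login` measurement in the example query is hypothetical.

```python
from urllib.parse import urlencode

# Address of InfluxDB inside the development setup.
INFLUX_URL = "http://influx:8086"


def influx_query_url(query: str, database: str = "telegraf", base_url: str = INFLUX_URL) -> str:
    """Build a GET URL for the InfluxDB 1.x HTTP /query endpoint."""
    params = urlencode({"db": database, "q": query})
    return f"{base_url}/query?{params}"


# Hypothetical measurement name; adjust to what the metrics queue actually carries.
query = 'SELECT count("value") FROM "login" WHERE time > now() - 1h GROUP BY time(1m)'
print(influx_query_url(query))
```

The resulting URL can be opened with any HTTP client (for example `curl`) while the development containers are running.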

Grafana

Grafana is used to create graphs to visualize the collected data.

Development setup

Instructions for setting up the AHM development environment can be found in our Docker documentation.

Configure Grafana

Open the Grafana frontend at http://localhost:8000 and log in (admin/admin).

Add a new data source with the following settings:

Type: InfluxDB
URL: http://influx:8086
Database: telegraf
User: grafana
Password: grafana
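Instead of clicking through the UI, the same data source could be created through Grafana's HTTP API (`POST /api/datasources`). This sketch only builds the request, using the settings listed above and the default admin/admin credentials. The data source name is hypothetical, and it assumes a Grafana version that accepts the password in `secureJsonData` (older versions used a top-level `password` field).

```python
import base64
import json
import urllib.request

GRAFANA_URL = "http://localhost:8000"


def build_datasource_request(
    grafana_url: str = GRAFANA_URL,
    admin_user: str = "admin",
    admin_password: str = "admin",
) -> urllib.request.Request:
    """Build a POST request that creates the InfluxDB data source in Grafana."""
    payload = {
        "name": "InfluxDB",  # hypothetical display name
        "type": "influxdb",
        "access": "proxy",  # Grafana's backend talks to InfluxDB
        "url": "http://influx:8086",
        "database": "telegraf",
        "user": "grafana",
        "secureJsonData": {"password": "grafana"},
    }
    credentials = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {credentials}"},
        method="POST",
    )


# Sending the request requires the development containers to be running:
# with urllib.request.urlopen(build_datasource_request()) as response:
#     print(response.status)
```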
