Application Health Monitoring
We are going to collect metrics about the usage of OBS, such as user logins and the creation of packages and projects. Since we plan to refresh the OBS UI and may change and improve workflows in OBS, we want to be able to track whether those changes have negative or positive effects.
Our AHM stack consists of:
- RabbitMQ: the metrics we collect are sent to its metrics queue.
- Telegraf: fetches these metrics from the queue and reports them to InfluxDB.
- InfluxDB: stores the time series data we collect (database telegraf).
- Grafana: used to create graphs that visualize the collected data.
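To make the data flow above more concrete, here is a minimal sketch of how a single metric could be serialized before it is published to the queue. It assumes the messages use InfluxDB line protocol, which is one of the formats Telegraf can parse; this page does not state the actual format. The measurement, tag and field names are hypothetical.

```ruby
# Sketch only: serialize a metric as InfluxDB line protocol.
# Assumption: the metrics queue carries line protocol messages.

# Escape commas, equals signs and spaces in tag keys, tag values and field keys.
def escape_key(value)
  value.to_s.gsub(/[,= ]/) { |char| "\\#{char}" }
end

# Measurement names only need commas and spaces escaped.
def escape_measurement(value)
  value.to_s.gsub(/[, ]/) { |char| "\\#{char}" }
end

# Integers get an "i" suffix, strings are double-quoted.
def format_field(value)
  case value
  when Integer then "#{value}i"
  when Float, true, false then value.to_s
  else
    escaped = value.to_s.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }
    "\"#{escaped}\""
  end
end

# Build "measurement,tag=value field=value [timestamp]".
def line_protocol(measurement, fields:, tags: {}, timestamp: nil)
  raise ArgumentError, 'at least one field is required' if fields.empty?

  tag_part = tags.sort_by { |key, _| key.to_s }
                 .map { |key, value| ",#{escape_key(key)}=#{escape_key(value)}" }
                 .join
  field_part = fields.map { |key, value| "#{escape_key(key)}=#{format_field(value)}" }.join(',')
  line = "#{escape_measurement(measurement)}#{tag_part} #{field_part}"
  timestamp ? "#{line} #{timestamp}" : line
end

# Hypothetical metric: one login through the web interface.
puts line_protocol('login', tags: { interface: 'webui' }, fields: { count: 1 })
```

Tags are sorted by key because InfluxDB recommends that ordering for write performance; the rest of the format is fixed by the line protocol itself.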
Instructions for setting up the development environment for AHM can be found in our Docker documentation.
rake docker:ahm:prepare
docker-compose -f docker-compose.ahm.yml -f docker-compose.yml up
Go to the Grafana frontend at http://localhost:8000 and log in (admin/admin).
Add a new data source with the following settings:
- Type: InfluxDB
- URL: http://influx:8086
- Database: telegraf
- User: grafana
- Password: grafana
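If the data source does not connect, it can help to query InfluxDB directly with the same settings. The sketch below only builds the request URL. It assumes an InfluxDB 1.x server, whose HTTP API exposes a /query endpoint that accepts the db, q, u and p parameters; the helper name is hypothetical.

```ruby
require 'uri'

# Sketch only: build a URL for the InfluxDB 1.x /query endpoint.
# Assumption: the server runs InfluxDB 1.x with password authentication.
def influx_query_uri(base_url, database:, query:, user:, password:)
  uri = URI.join(base_url, '/query')
  uri.query = URI.encode_www_form(db: database, q: query, u: user, p: password)
  uri
end

uri = influx_query_uri('http://influx:8086',
                       database: 'telegraf',
                       query: 'SHOW MEASUREMENTS',
                       user: 'grafana',
                       password: 'grafana')
puts uri
```

The host name influx only resolves inside the Docker network, so a request such as `Net::HTTP.get(uri)` would have to run from one of the containers.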
The Grafana dashboards are hosted at https://obs-measure.opensuse.org/. You can log in with your GitHub account and should get the Editor role.
The openSUSE RabbitMQ is running at https://rabbit.opensuse.org/.
This dashboard gives a general overview of the health status of the application. You can tell whether the application is up by looking at the following panels:
- Number of successful requests per minute. ⚠️ It will send an alert when the traffic is too low.
- Error rates tracks requests with an HTTP error status code.
- Authentication Failures monitors bursts of authentication failures within 10 minutes.
- Request State Change tracks request creation and request state changes.
- Projects / min tracks projects destroyed and created within a minute.
- Packages / min tracks packages destroyed and created.
- Total project tracks the total number of projects that were created and destroyed.
- Total package tracks the total number of packages that were created and destroyed.
- User Creation tracks the total number of users that were created within an hour.
- Beta Users tracks the total number of users who joined and left the beta program.
This dashboard gives a detailed picture of the errors happening in the application. Each type of error has its own panel:
- 500 (Internal Server Error): ⚠️ It will send an alert when there are more than 10 errors per minute for 2 minutes.
- 400 (Bad Request)
- 401 (Unauthorized)
- 403 (Forbidden)
- 404 (Not found) / min
- 408 (Request Timeout)
- 422 (Unprocessable Entity)
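The alert condition on the 500 panel can be read as a simple rule over per-minute error counts. The sketch below assumes that the alert fires only when every one of the last 2 minutes has more than 10 errors. That is one interpretation of the wording, not the actual Grafana alert definition, and the method name is hypothetical.

```ruby
# Sketch only: evaluate "more than 10 errors per minute for 2 minutes".
# per_minute_counts holds one error count per minute, oldest first.
# Assumption: every minute in the window must exceed the threshold.
def alert?(per_minute_counts, threshold: 10, minutes: 2)
  return false if per_minute_counts.size < minutes

  per_minute_counts.last(minutes).all? { |count| count > threshold }
end

puts alert?([3, 11, 12]) # the last two minutes both exceed 10
puts alert?([11, 12, 4]) # the most recent minute is back below the threshold
```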
This dashboard has a selector to choose which interface to show data from: webui or api.
The first four panels show:
- Response time: Mean response time of requests for the selected interface.
- SQL Time: Mean time spent performing queries for the selected interface.
- View Time: Mean time spent rendering views for the selected interface.
- Total requests: Total number of requests performed for the selected interface.
Below that:
- Response time: Tracks the controller response time of every action or request performed in the selected interface. Shows three values: max, min and mean.
- Database: Tracks the database response time of every request performed in the selected interface. Shows three values: max, min and mean.
- View: Tracks the rendering time of every view rendered in the selected interface. Shows three values: max, min and mean.
- Requests: List of the 20 most time-consuming requests. Displays controller and action names, as well as the associated Request ID and its maximum response time.
- Actions: List of all controllers' actions with their corresponding response time (mean, median, and max) and the number of times they were called.
- SQL: List of all performed SQL queries with their corresponding response time (mean, median, and max) and the number of times they were called.
- Templates: List of all rendered templates/views with their corresponding response time (mean, median, and max) and the number of times they were called.
- Backend: List of all the backend calls with their corresponding response time (mean, median, and max) and the number of times they were called.
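The table panels above all report the same statistics per item: mean, median, max and call count. As a rough illustration of what those columns mean, the sketch below computes them from raw timing samples. The sample structure and the action names are hypothetical; the real panels are computed by queries against InfluxDB, not by this code.

```ruby
# Sketch only: per-action mean, median, max and call count.
# Each sample is a hash with a hypothetical :action name and a :duration in ms.
def summarize(samples)
  rows = samples.group_by { |sample| sample[:action] }.map do |action, group|
    times = group.map { |sample| sample[:duration] }.sort
    middle = times.size / 2
    # The median is the middle value, or the average of the two middle values.
    median = times.size.odd? ? times[middle] : (times[middle - 1] + times[middle]) / 2.0
    { action: action, count: times.size, mean: times.sum / times.size.to_f,
      median: median, max: times.last }
  end
  # Slowest actions first, similar to a table sorted by max response time.
  rows.sort_by { |row| -row[:max] }
end

samples = [
  { action: 'package#show', duration: 10 },
  { action: 'package#show', duration: 30 },
  { action: 'package#show', duration: 20 },
  { action: 'project#index', duration: 100 },
  { action: 'project#index', duration: 50 }
]
summarize(samples).each { |row| puts row }
```

The median is included next to the mean because a few very slow calls can pull the mean up while the typical call stays fast.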