Add PrometheusMissingRuleEvaluations runbook #45

Open
wants to merge 3 commits into
base: main
28 changes: 28 additions & 0 deletions content/runbooks/prometheus/PrometheusMissingRuleEvaluations.md
@@ -0,0 +1,28 @@
# PrometheusMissingRuleEvaluations

## Meaning

The alert fires when evaluating a Prometheus rule group consistently takes longer than the group's evaluation interval.

## Impact

Rule groups contain alerting rules, recording rules, or both. If Prometheus cannot evaluate a group in time, it skips evaluation iterations, so alerts may fire late or not at all.

## Diagnosis

Quick checks:
- Check whether enough resources are allocated to Prometheus.
- Check whether bad neighbors on the same host are consuming too much CPU.
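
The quick checks above could be approached with queries along these lines. This is only a sketch: the `job="prometheus"` selector is an assumption about how Prometheus scrapes itself, and the second query assumes a Kubernetes cluster that exposes cAdvisor metrics with `namespace` and `pod` labels. Adjust both to your environment. Each query is meant to be run separately.

```promql
# CPU cores used by Prometheus itself, averaged over 5 minutes.
# Compare the result with the CPU allocated to the Prometheus process or container.
# The job label value is an assumption.
rate(process_cpu_seconds_total{job="prometheus"}[5m])

# Top 5 CPU consumers by pod, to spot bad neighbors.
# Assumes cAdvisor metrics from a Kubernetes cluster; add a node or instance
# matcher to restrict the result to the host running Prometheus.
topk(5, sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m])))
```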

Deep dive:
- Use the `prometheus_rule_group_iterations_missed_total` metric to identify the struggling rule group.
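
As a rough illustration of the deep dive, the queries below list rule groups that missed iterations and show how close each group is to its interval. The 15 minute window is an arbitrary choice, and the second query relies on `prometheus_rule_group_last_duration_seconds` and `prometheus_rule_group_interval_seconds`, which Prometheus exposes alongside the metric named above. Run each query separately.

```promql
# Rule groups that missed at least one evaluation in the last 15 minutes,
# worst offenders first. The rule_group label identifies the file and group.
sort_desc(increase(prometheus_rule_group_iterations_missed_total[15m]) > 0)

# Ratio of the last evaluation duration to the configured interval, per group.
# Values near or above 1 mean the group barely fits, or does not fit, in its interval.
prometheus_rule_group_last_duration_seconds / prometheus_rule_group_interval_seconds
```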

## Mitigation

Quick fixes:
- Increase the CPU resources allocated to Prometheus.
- Move the bad neighbor to a different host.

Deep dive:
- Increase the rule evaluation interval.
- Split the rule group into smaller groups if its rules do not depend on each other. This should help because rules within a group are evaluated sequentially, while separate groups are evaluated concurrently.
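
When choosing a new interval or a group to split, queries like the following may help. They are a sketch, not a definitive procedure: the one hour lookback is an assumption, and the results only reflect evaluations that Prometheus actually completed. Run each query separately.

```promql
# Worst observed evaluation duration per rule group over the past hour.
# An interval at or below this value is likely to keep missing iterations.
max_over_time(prometheus_rule_group_last_duration_seconds[1h])

# Number of rules per group, largest first, to find candidates for splitting.
sort_desc(prometheus_rule_group_rules)
```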