Releases: giantswarm/prometheus-rules
v4.54.1
Fixed
- Fix MonitoringAgentDown to not page for non deleting clusters.
v4.54.0
Changed
- Label all our alerts with the giantswarm tenant.
- Get rid of the alloy rules app as it will now be managed by the observability operator.
v4.53.0
Added
- Add a new alert to detect missing installation logs related to teleport access.
Changed
- Fine-tune the MetricForwardingErrors alert so it does not trigger on sporadic issues like duplicate samples (e.g. when a pod restarts too frequently within a small time window). This alert is now based on the upstream alert and uses a percentage of failed remote storage samples, as described in giantswarm/giantswarm#32873.
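As a rough illustration of the percentage-based condition mentioned for MetricForwardingErrors, the sketch below compares failed remote storage samples against all samples sent. It is not the actual rule expression: the function names and the 1% default threshold are hypothetical, chosen only to show why a handful of rejected samples no longer triggers the alert.

```python
def failed_sample_percentage(failed_rate: float, succeeded_rate: float) -> float:
    """Return the share of failed remote storage samples, in percent.

    Both arguments are per-second rates over the same time window.
    """
    total = failed_rate + succeeded_rate
    if total == 0:
        # Nothing was sent, so nothing can have failed.
        return 0.0
    return 100.0 * failed_rate / total


def should_fire(failed_rate: float, succeeded_rate: float,
                threshold_percent: float = 1.0) -> bool:
    """Hypothetical alert condition; the 1% default is an assumption."""
    return failed_sample_percentage(failed_rate, succeeded_rate) > threshold_percent


if __name__ == "__main__":
    # A few duplicate samples among many successful ones stay below the threshold.
    print(should_fire(failed_rate=0.5, succeeded_rate=999.5))
    # A sustained failure share above the threshold would fire.
    print(should_fire(failed_rate=5.0, succeeded_rate=95.0))
```

An absolute count of failed samples would react to every short burst of duplicates, whereas a ratio only reacts when failures are a meaningful fraction of the traffic.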
v4.52.0
Changed
- Reduce management cluster resource usage alert window from 2d to 30m.
Fixed
- Make sure HelmReleaseFailed for onprem clusters pages our onprem team.
v4.51.0
Changed
- Increased the threshold time for ManagementClusterWebhookDurationExceedsTimeout from 15m to 25m
- Set WorkloadClusterNodeUnexpectedTaintNodeCAPIUninitialized to page
- Cancel WorkloadClusterEtcdNumberOfLeaderChangesTooHigh during cluster upgrades, creation and deletion
- Tweaked the time and size of the KubeletVolumeSpaceTooLow alerts.
- Change the KubeletVolumeSpaceTooLow for <500Mb available to page instead of notify
- Tweaked the time and size of the DockerVolumeSpaceTooLow alerts.
- Change the DockerVolumeSpaceTooLow for <1Gb available to page instead of notify
v4.50.0
Changed
- Changed the severity of several Team Tenet alerts to be "notify"
v4.49.3
Changed
- Increase threshold time for KubeStateMetricsSlow from 7s to 15s.
v4.49.2
Changed
- Fixed some grafana-cloud recording rules to specifically use giantswarm metrics
- Update PromtailRequestsErrors to fire after 1h instead of 25min.
- Update PromtailRequestsErrors to cancel outside of working hours.
v4.49.1
Changed
- Update MimirDataPushFailures runbook url.
v4.49.0
Changed
- Rename MimirObjectStoreLowRate to MimirDataPushFailures and update its expression to only target upload operations from the ingester component.