Releases: giantswarm/prometheus-rules

v4.54.1

08 Apr 14:16
a63fbc7

Fixed

  • Fix MonitoringAgentDown to not page for non-deleting clusters.

v4.54.0

07 Apr 08:25
7d46a33

Changed

  • Label all our alerts with the giantswarm tenant.
  • Get rid of the alloy rules app as it will now be managed by the observability operator.

v4.53.0

02 Apr 13:19
2e34bc1

Added

  • Add a new alert to detect missing installation logs related to Teleport access.

Changed

  • Fine-tune the MetricForwardingErrors alert so it does not trigger on sporadic issues such as duplicate samples (e.g. when a pod restarts too frequently within a small time window). The alert is now based on the upstream alert and uses the percentage of failed remote storage samples, as described in giantswarm/giantswarm#32873.
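
To illustrate why a percentage-based condition ignores sporadic failures, here is a minimal Python sketch. It is not the actual rule expression: the function names, the 1% default threshold, and the sample counts are assumptions made for illustration only.

```python
def failed_sample_percentage(failed: float, succeeded: float) -> float:
    """Return the share of remote-write samples that failed, in percent."""
    total = failed + succeeded
    if total == 0:
        return 0.0  # nothing was sent, so nothing failed
    return 100.0 * failed / total


def should_alert(failed: float, succeeded: float,
                 threshold_percent: float = 1.0) -> bool:
    """Alert only when the failure share exceeds the (assumed) threshold."""
    return failed_sample_percentage(failed, succeeded) > threshold_percent


# A handful of rejected duplicate samples stays below the threshold...
print(should_alert(failed=3, succeeded=9997))
# ...while a sustained failure share crosses it.
print(should_alert(failed=500, succeeded=9500))
```

A rule based on an absolute error count would fire in both cases; the ratio only fires when a meaningful share of samples is lost.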

v4.52.0

26 Mar 14:19
d33b4bf

Changed

  • Reduce management cluster resource usage alert window from 2d to 30m.

Fixed

  • Make sure HelmReleaseFailed for onprem clusters pages our onprem team.

v4.51.0

25 Mar 11:21
45b37d0

Changed

  • Increased the threshold time for ManagementClusterWebhookDurationExceedsTimeout from 15m to 25m
  • Set WorkloadClusterNodeUnexpectedTaintNodeCAPIUninitialized to page
  • Cancel WorkloadClusterEtcdNumberOfLeaderChangesTooHigh during cluster upgrades, creation and deletion
  • Tweaked the time and size of the KubeletVolumeSpaceTooLow alerts.
  • Change the KubeletVolumeSpaceTooLow alert for <500MB available to page instead of notify.
  • Tweaked the time and size of the DockerVolumeSpaceTooLow alerts.
  • Change the DockerVolumeSpaceTooLow alert for <1GB available to page instead of notify.

v4.50.0

18 Mar 14:22
cdd8964

Changed

  • Changed the severity of several Team Tenet alerts to be "notify"

v4.49.3

17 Mar 14:26
5f9758b

Changed

  • Increase threshold time for KubeStateMetricsSlow from 7s to 15s.

v4.49.2

14 Mar 07:37
9491cf5

Changed

  • Fixed some grafana-cloud recording rules to specifically use giantswarm metrics.
  • Update PromtailRequestsErrors to fire after 1h instead of 25min.
  • Update PromtailRequestsErrors to cancel outside of working hours.

v4.49.1

12 Mar 17:16
df36a91

Changed

  • Update MimirDataPushFailures runbook url.

v4.49.0

12 Mar 13:52
f3dc98e

Changed

  • Rename MimirObjectStoreLowRate to MimirDataPushFailures and update its expression to only target upload operations from the ingester component.