Releases: giantswarm/prometheus-rules

v4.61.0

14 May 07:58
e599a85

Changed

  • Add grafana-postgresql to the PVCs monitored by the ObservabilityStorageSpaceTooLow alert.

Added

  • Add GrafanaPostgresqlReplicationFailure and GrafanaPostgresqlArchivingFailure alerting rules in grafana.rules.yml.
  • Vintage cleanup:
    • Removed code behind obvious vintage/capi conditions in Cabbage rules.
    • Removed code behind obvious vintage/capi conditions in Honeybadger rules.
    • Removed code behind obvious vintage/capi conditions in Tenet rules.
    • Removed code behind obvious vintage/capi conditions in Shield rules.
    • Remove the "aws" provider.
    • Clean up the mimir-heartbeat type, which was needed while we had both old and new heartbeats.
  • Add CAPATooManyReconciliations to page when CAPA controllers are stuck reconciling over and over.

v4.60.0

13 May 08:54
7255419

Changed

Added

  • Add OnPremCloudProviderAPIIsDown alert to all clusters.
  • Vintage cleanup:
    • Stopped running tests for vintage, which meant that some vintage-specific labels had to be removed.
    • Removed code behind obvious vintage/capi conditions in Atlas rules.
    • Removed code behind obvious vintage/capi conditions in Phoenix rules.

v4.59.2

09 May 13:53
2a490d2

Changed

  • Improved AlloyUnhealthyComponents alert by adding pod name

v4.59.1

09 May 10:31
e19603a

Changed

  • LogForwardingErrors: don't page out of business hours

v4.59.0

08 May 12:46
6a46629

Added

  • Add new alert KonfigureOperatorDeploymentNotSatisfied: fires when the konfigure-operator deployment in the giantswarm namespace is not ready for 30 mins.
  • Add new alert KonfigurationReconciliationFailed: fires when a ManagementClusterConfiguration CR is not in Ready condition for 10 mins.
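
For illustration, a rule of the first kind might look roughly like the sketch below. The release notes do not show the actual expression, so everything except the alert name, the namespace, the deployment name, and the 30-minute window is an assumption: the group name and the severity label are hypothetical, and the expression uses the standard kube-state-metrics metric kube_deployment_status_replicas_unavailable as one plausible way to detect a deployment that is not ready.

```yaml
# Sketch only: not the rule shipped in this release.
groups:
  - name: konfigure-operator.example   # hypothetical group name
    rules:
      - alert: KonfigureOperatorDeploymentNotSatisfied
        # Assumed expression: at least one replica of the deployment is unavailable.
        expr: kube_deployment_status_replicas_unavailable{namespace="giantswarm", deployment="konfigure-operator"} > 0
        # The condition must hold for 30 minutes before the alert fires.
        for: 30m
        labels:
          severity: page               # hypothetical label value
```

The `for` clause is what turns "not ready" into "not ready for 30 mins": a short rollout blip resolves before the alert leaves the pending state.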

v4.58.0

07 May 13:17
b9205cf

Changed

  • DeploymentNotSatisfiedAtlas: lower sensitivity and page only during business hours

Added

  • Add new alert ClusterUpgradeStuck to detect if the cluster app cannot be upgraded.

v4.57.0

30 Apr 08:06
53f42bf

Changed

  • PromtailRequestsErrors no longer fires when alloy-logs is running

Added

  • Added PromtailConflictsWithAlloy alert

v4.56.1

28 Apr 09:26
64369b9

Changed

  • Re-enabled the storage alerts LogVolumeSpaceTooLow and RootVolumeSpaceTooLow as paging during working hours until we have node problem detector deployed.

Fixed

  • Fix SLOs recording rules sent to Grafana Cloud that sometimes trigger PrometheusRulesFailure due to the origin metric pod changing.

v4.56.0

24 Apr 11:41
2661b9b

Changed

  • Improved ClusterAutoscalerFailedScaling alert to detect stuck states where cluster-autoscaler fails to scale.

v4.55.0

22 Apr 12:27
e384f98

Changed

  • Improve ClusterCrossplaneResourcesNotReady with new metrics where available
  • Improve alert for Karpenter machines not being Ready

Fixed

  • Use exported_namespace for certificate expiration alerts.
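
As background for this fix: when a scraped metric carries a label that collides with a label Prometheus attaches to the target (such as namespace) and honor_labels is not enabled, Prometheus keeps the scraped value under an exported_ prefix. The sketch below shows how a certificate expiration rule could refer to that label. It is an assumption throughout: the release notes do not name the metric or the thresholds, so the alert name, the 14-day window, and the use of cert-manager's certmanager_certificate_expiration_timestamp_seconds metric are illustrative only.

```yaml
# Sketch only: alert name, metric choice, and thresholds are hypothetical.
groups:
  - name: certificate-expiration.example
    rules:
      - alert: ExampleCertificateExpiringSoon
        # Assumed expression: certificate expires in less than 14 days.
        expr: (certmanager_certificate_expiration_timestamp_seconds - time()) < 14 * 24 * 60 * 60
        for: 1h
        annotations:
          # exported_namespace is the namespace of the certificate itself;
          # namespace would be that of the exporter pod that was scraped.
          description: 'Certificate {{ $labels.exported_namespace }}/{{ $labels.name }} expires in less than 14 days.'
```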

Removed

  • Remove alerts related to alloy-rules.