Description
What
Currently, the DNS Operator provides health checks for on-cluster workloads. The intention is for the DNS Operator to use the results when acting on a publishing strategy, deciding whether or not to remove the records of unhealthy workloads from the zone.
We would like to extend this capability so that a cluster can create health check probes for the other clusters it discovers via the zone entries.
How
Extend the current DNS Health Check Probes with a field declaring that a probe is intended as a cross-cluster health check. When a cross-cluster health check is failing, the failure will be written to the configured registry and exposed via metrics.
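As a rough illustration of the flow described above, here is a minimal Go sketch. Every name in it (`HealthCheckProbe`, the `CrossCluster` field, `Registry`, `FailureCounter`, `ReportResult`) is hypothetical and chosen only for this example; none of it is taken from the DNS Operator's actual API, and the real registry and metrics wiring will likely look different.

```go
package main

import "sync"

// HealthCheckProbe is a hypothetical, trimmed-down stand-in for the probe spec.
type HealthCheckProbe struct {
	Owner        string // hypothetical: ID of the cluster that owns the probed records
	Address      string // endpoint being probed
	CrossCluster bool   // hypothetical new field: probe targets another cluster
}

// Registry is a hypothetical abstraction over the configured registry.
type Registry interface {
	RecordFailure(owner, address string)
}

// MemoryRegistry is an in-memory Registry used only for illustration.
type MemoryRegistry struct {
	mu       sync.Mutex
	Failures map[string][]string // owner -> failing addresses
}

func NewMemoryRegistry() *MemoryRegistry {
	return &MemoryRegistry{Failures: map[string][]string{}}
}

func (r *MemoryRegistry) RecordFailure(owner, address string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.Failures[owner] = append(r.Failures[owner], address)
}

// FailureCounter stands in for a metrics counter labelled by owner.
type FailureCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func NewFailureCounter() *FailureCounter {
	return &FailureCounter{counts: map[string]int{}}
}

func (c *FailureCounter) Inc(owner string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[owner]++
}

func (c *FailureCounter) Value(owner string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[owner]
}

// ReportResult writes a failing cross-cluster probe to the registry and the
// metrics counter. It returns true only when a failure was reported; healthy
// probes and on-cluster probes are left to the existing code path.
func ReportResult(p HealthCheckProbe, healthy bool, reg Registry, m *FailureCounter) bool {
	if healthy || !p.CrossCluster {
		return false
	}
	reg.RecordFailure(p.Owner, p.Address)
	m.Inc(p.Owner)
	return true
}
```

In this sketch, a caller would invoke `ReportResult` after each probe run. Labelling the counter by owner is an assumption made so that an alert could single out the unhealthy cluster.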
Disaster recovery (i.e. a cluster unexpectedly vanishes)
An alert can be configured on the above metrics to notify an admin that a cluster has been unhealthy, so that they can manually remove that cluster's owner from the zone.
Related work
- Remove an owner from a zone: kuadrant plugin command: kuadrant dns delete-owner #343
- Publish strategy: DNS Policy unpublish strategy kuadrant-operator#1403
Future work
These failing cross-cluster checks can ultimately be exposed to the publishing strategy, allowing CEL-based logic to decide whether to remove another cluster's records.