Cross Cluster Health Checks #474

Open
@philbrookes

What
Currently, the DNS Operator provides health checks for on-cluster workloads. The intent is for the DNS Operator to act on a publishing strategy and decide whether to remove the records of unhealthy workloads from the zone.

We would like to extend this capability so that a cluster can create health check probes for the other clusters it discovers via the zone entries.

How
Extend the current DNS Health Check Probes with a field declaring that a probe is intended as a cross-cluster health check. When a cross-cluster health check is failing, the failure will be written to the configured registry and to metrics.
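
To make the proposal concrete, here is a minimal Go sketch of the flow described above. It is not the operator's actual API: `ProbeSpec`, the `CrossCluster` field name, and `FailureSink` (an in-memory stand-in for the configured registry and metrics) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// ProbeSpec is a hypothetical, trimmed-down probe spec.
// CrossCluster is the proposed new field; its name and type are assumptions.
type ProbeSpec struct {
	Owner        string // owner ID of the cluster whose records are probed
	Address      string // address discovered from the zone entries
	CrossCluster bool   // true when the probe targets another cluster
}

// FailureSink is an in-memory stand-in for the configured registry and metrics.
type FailureSink struct {
	mu       sync.Mutex
	registry map[string]string // owner -> last failing address
	failures map[string]int    // owner -> failure count (metric stand-in)
}

// NewFailureSink returns an empty sink.
func NewFailureSink() *FailureSink {
	return &FailureSink{
		registry: map[string]string{},
		failures: map[string]int{},
	}
}

// Report records a probe result. Only failing cross-cluster probes are
// written; on-cluster probes are assumed to follow the existing path.
func (s *FailureSink) Report(p ProbeSpec, healthy bool) {
	if healthy || !p.CrossCluster {
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.registry[p.Owner] = p.Address
	s.failures[p.Owner]++
}

// Failures returns the recorded failure count for an owner.
func (s *FailureSink) Failures(owner string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.failures[owner]
}

func main() {
	sink := NewFailureSink()
	remote := ProbeSpec{Owner: "cluster-b", Address: "203.0.113.10", CrossCluster: true}
	local := ProbeSpec{Owner: "cluster-a", Address: "203.0.113.20"}

	sink.Report(remote, false) // failing cross-cluster probe: recorded
	sink.Report(local, false)  // failing on-cluster probe: ignored here

	fmt.Println(sink.Failures("cluster-b"), sink.Failures("cluster-a")) // prints "1 0"
}
```

A per-owner failure count like this is the kind of signal an alert could be built on, though the real metric names and labels are still to be decided.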

Disaster recovery (i.e. a cluster unexpectedly vanishes)
An alert can be configured on the above metrics to notify an admin to manually remove a particular owner from the zone because its cluster has been unhealthy.

Related work

Future work
These failing cross-cluster checks can ultimately be exposed to the publishing strategy, allowing CEL-based logic to decide whether to remove another cluster's records.
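
As a rough illustration of the kind of decision such logic might make, the sketch below uses a plain Go predicate in place of a real CEL expression. The `OwnerHealth` type, the `ShouldUnpublish` function, and the threshold rule are hypothetical; the actual inputs exposed to the publishing strategy are not defined yet.

```go
package main

import "fmt"

// OwnerHealth is a hypothetical view of one zone owner as seen by
// cross-cluster probes.
type OwnerHealth struct {
	Owner               string
	ConsecutiveFailures int
}

// ShouldUnpublish stands in for a CEL rule along the lines of
// `owner != self && consecutiveFailures >= threshold` (an assumed rule).
// A cluster never removes its own records through this path.
func ShouldUnpublish(self string, h OwnerHealth, threshold int) bool {
	return h.Owner != self && h.ConsecutiveFailures >= threshold
}

func main() {
	remote := OwnerHealth{Owner: "cluster-b", ConsecutiveFailures: 3}
	fmt.Println(ShouldUnpublish("cluster-a", remote, 3)) // prints "true"
}
```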
