etcd_server_healthcheck not available until after readyz endpoint has been hit once #20741

@AndrewJackson2020

Bug report criteria

What happened?

The etcd_server_healthcheck metrics appear to be missing from the /metrics endpoint until the /readyz endpoint has been hit at least once.

What did you expect to happen?

I expect the etcd_server_healthcheck metrics to be exposed from startup with a zero count, even before /readyz has been hit. The expected behaviour is shown in the code block below.

> cd `mktemp -d`

> etcd  --log-level fatal &

> curl http://localhost:2379/metrics | grep 'etcd_server_healthcheck'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  156k    0  156k    0     0  27.0M      0 --:--:-- --:--:-- --:--:-- 30.6M
# HELP etcd_server_healthcheck The result of each kind of healthcheck.
# TYPE etcd_server_healthcheck gauge
etcd_server_healthcheck{name="data_corruption",type="readyz"} 0
etcd_server_healthcheck{name="linearizable_read",type="readyz"} 0
etcd_server_healthcheck{name="non_learner",type="readyz"} 0
etcd_server_healthcheck{name="serializable_read",type="readyz"} 0
# HELP etcd_server_healthchecks_total The total number of each kind of healthcheck.
# TYPE etcd_server_healthchecks_total counter
etcd_server_healthchecks_total{name="data_corruption",status="success",type="readyz"} 0
etcd_server_healthchecks_total{name="linearizable_read",status="success",type="readyz"} 0
etcd_server_healthchecks_total{name="non_learner",status="success",type="readyz"} 0
etcd_server_healthchecks_total{name="serializable_read",status="success",type="readyz"} 0

> curl http://localhost:2379/readyz
ok

> curl http://localhost:2379/metrics | grep 'etcd_server_healthcheck'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
# HELP etcd_server_healthcheck The result of each kind of healthcheck.
# TYPE etcd_server_healthcheck gauge
etcd_server_healthcheck{name="data_corruption",type="readyz"} 1
etcd_server_healthcheck{name="linearizable_read",type="readyz"} 1
etcd_server_healthcheck{name="non_learner",type="readyz"} 1
etcd_server_healthcheck{name="serializable_read",type="readyz"} 1
# HELP etcd_server_healthchecks_total The total number of each kind of healthcheck.
# TYPE etcd_server_healthchecks_total counter
etcd_server_healthchecks_total{name="data_corruption",status="success",type="readyz"} 1
etcd_server_healthchecks_total{name="linearizable_read",status="success",type="readyz"} 1
etcd_server_healthchecks_total{name="non_learner",status="success",type="readyz"} 1
etcd_server_healthchecks_total{name="serializable_read",status="success",type="readyz"} 1

> pkill etcd

> etcd  --log-level fatal &

> curl http://localhost:2379/metrics | grep 'etcd_server_healthcheck'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  156k    0  156k    0     0  27.0M      0 --:--:-- --:--:-- --:--:-- 30.6M
# HELP etcd_server_healthcheck The result of each kind of healthcheck.
# TYPE etcd_server_healthcheck gauge
etcd_server_healthcheck{name="data_corruption",type="readyz"} 0
etcd_server_healthcheck{name="linearizable_read",type="readyz"} 0
etcd_server_healthcheck{name="non_learner",type="readyz"} 0
etcd_server_healthcheck{name="serializable_read",type="readyz"} 0
# HELP etcd_server_healthchecks_total The total number of each kind of healthcheck.
# TYPE etcd_server_healthchecks_total counter
etcd_server_healthchecks_total{name="data_corruption",status="success",type="readyz"} 0
etcd_server_healthchecks_total{name="linearizable_read",status="success",type="readyz"} 0
etcd_server_healthchecks_total{name="non_learner",status="success",type="readyz"} 0
etcd_server_healthchecks_total{name="serializable_read",status="success",type="readyz"} 0
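The expected state above can be checked mechanically. The sketch below is only an illustration: `missing_healthchecks` is a hypothetical helper name, the four check names are copied from the output above, and it assumes the label order (`name`, then `type`) shown in this report.

```shell
# Read Prometheus text-format metrics on stdin and print the name of every
# readyz check that has no etcd_server_healthcheck gauge sample.
# Check names and label order are taken from the output in this report.
missing_healthchecks() {
    metrics=$(cat)
    for name in data_corruption linearizable_read non_learner serializable_read; do
        case "$metrics" in
            # Series present: nothing to report.
            *"etcd_server_healthcheck{name=\"$name\",type=\"readyz\"}"*) ;;
            # Series absent: print the check name.
            *) echo "$name" ;;
        esac
    done
}
```

Against a local etcd it could be run as `curl -s http://localhost:2379/metrics | missing_healthchecks`. With the expected behaviour it would print nothing immediately after startup.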

How can we reproduce it (as minimally and precisely as possible)?

The following script reproduces the issue.

> cd `mktemp -d`

> etcd  --log-level fatal &

> curl http://localhost:2379/metrics | grep 'etcd_server_healthcheck'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  156k    0  156k    0     0  27.0M      0 --:--:-- --:--:-- --:--:-- 30.6M

> curl http://localhost:2379/readyz
ok

> curl http://localhost:2379/metrics | grep 'etcd_server_healthcheck'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
# HELP etcd_server_healthcheck The result of each kind of healthcheck.
# TYPE etcd_server_healthcheck gauge
etcd_server_healthcheck{name="data_corruption",type="readyz"} 1
etcd_server_healthcheck{name="linearizable_read",type="readyz"} 1
etcd_server_healthcheck{name="non_learner",type="readyz"} 1
etcd_server_healthcheck{name="serializable_read",type="readyz"} 1
# HELP etcd_server_healthchecks_total The total number of each kind of healthcheck.
# TYPE etcd_server_healthchecks_total counter
etcd_server_healthchecks_total{name="data_corruption",status="success",type="readyz"} 1
etcd_server_healthchecks_total{name="linearizable_read",status="success",type="readyz"} 1
etcd_server_healthchecks_total{name="non_learner",status="success",type="readyz"} 1
etcd_server_healthchecks_total{name="serializable_read",status="success",type="readyz"} 1

> pkill etcd

> etcd  --log-level fatal &

> curl http://localhost:2379/metrics | grep 'etcd_server_healthcheck'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  156k    0  156k    0     0  27.0M      0 --:--:-- --:--:-- --:--:-- 30.6M
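For a non-interactive check, the same steps can be reduced to counting sample lines. This is a minimal sketch, not part of etcd: `count_healthcheck_series` is a hypothetical helper name, and the endpoint is the one used in the script above.

```shell
# Count etcd_server_healthcheck* sample lines in Prometheus text-format
# metrics read from stdin. "# HELP" and "# TYPE" comment lines are not
# counted because they do not start with the metric name.
count_healthcheck_series() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function usable under "set -e" while still printing 0.
    grep -c '^etcd_server_healthcheck' || true
}
```

Usage, with `-s` silencing curl's progress meter so that only the count is printed: `curl -s http://localhost:2379/metrics | count_healthcheck_series`. Based on the output above, this would print 0 on a freshly started etcd and 8 after `/readyz` has been hit once.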

Anything else we need to know?

No response

Etcd version (please run commands below)

 > etcd --version
etcd Version: 3.6.5
Git SHA: a061450
Go Version: go1.24.7
Go OS/Arch: linux/amd64

 > etcdctl version
etcdctl version: 3.6.5
API version: 3.6

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output
