[metrics] harbor_up() metric: Wrong / misleading container, pod, and service reported (exporter vs real component name) #22463

@iankko

Description

Steps to reproduce the problem:

  1. Deploy Harbor v2.14.0 with the harbor-helm v1.18.0 Helm chart and enable Harbor Prometheus metrics.
  2. Intentionally prevent some Harbor components from starting, for example:
  • For the portal component, add an invalid Nginx config directive to the harbor-portal ConfigMap (diff example below):
[Image: diff of the harbor-portal ConfigMap]
  • For the registry component, add an invalid option to the harbor-registry ConfigMap (diff example below):
[Image: diff of the harbor-registry ConfigMap]
  3. Restart both the harbor-portal and harbor-registry pods so that they load the new configs and end up in CrashLoopBackOff.
  4. Issue the harbor_up() Prometheus query (either in the Prometheus UI or by defining a new Prometheus alert that uses this metric).
  5. Check the reported container, pod, and service names.
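
The last two steps can also be done programmatically. The following is a minimal sketch, assuming the query result is fetched from the Prometheus HTTP API (`/api/v1/query`) and handed over as JSON text. The function name `failing_components` and the sample payload are hypothetical; the payload is trimmed to two of the series from this report and its timestamp is made up.

```python
import json


def failing_components(response_text):
    """Return the label sets of all series in the response whose value is 0."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    failing = []
    for series in body["data"]["result"]:
        # Each instant-vector sample is [unix_timestamp, "value as string"].
        _timestamp, value = series["value"]
        if float(value) == 0.0:
            failing.append(series["metric"])
    return failing


# Illustrative response shaped like an instant query for harbor_up.
# Only two series are included and the timestamp is invented.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "harbor_up", "component": "core",
                        "container": "exporter",
                        "pod": "harbor-exporter-5f6b484c6-rpp7h",
                        "service": "harbor-exporter"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "harbor_up", "component": "portal",
                        "container": "exporter",
                        "pod": "harbor-exporter-5f6b484c6-rpp7h",
                        "service": "harbor-exporter"},
             "value": [1700000000.0, "0"]},
        ],
    },
})

for labels in failing_components(SAMPLE_RESPONSE):
    # Print the fields this report is about for every failing component.
    print(labels["component"], labels["container"], labels["pod"], labels["service"])
```

Printing the container, pod, and service labels next to the component makes it easy to see whether they name the failing component or the exporter.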

Actual behaviour:
The portal, registry, and registryctl components are correctly reported as failing.

However, the container, pod, and service fields carry wrong / misleading information: for every component, "exporter" is reported as the container, "harbor-exporter-.*" as the pod, and "harbor-exporter" as the service name.

See output below (the instance IP & namespace were intentionally changed):

Load time: 307ms    Result series: 8

harbor_up{component="core", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"}   1
harbor_up{component="database", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"}       1
harbor_up{component="jobservice", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"}     1
harbor_up{component="portal", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"} 0
harbor_up{component="redis", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"}  1
harbor_up{component="registry", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"}       0
harbor_up{component="registryctl", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"}    0
harbor_up{component="trivy", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"}  1

Note: The harbor-exporter pod itself contains no container named 'portal', 'registry', or 'registryctl'.

Expected behavior:

  1. For the failing portal component, the actual failing pod (harbor-portal-7c49df4b68-qphzg) is reported in the pod field, portal in the container field, and harbor-portal in the service field. In other words, for every failing component, the real pod, container, and service names of that component are reported instead of the exporter "placeholder" used currently.
  2. Analogously, for the failing registry component, "registry" is reported as the container, the real registry pod name as the pod, and "harbor-registry" as the service name.

Adjusting the current metric output shown above to the proposed form, so that it better reflects the situation in the K8s namespace, would give the following (note the changed container, pod, and service field values):

Load time: 307ms    Result series: 8

harbor_up{component="core", container="core", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-core-5b4f678f4d-tgpt8", service="harbor-core"}      1
harbor_up{component="database", container="database", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-database-0", service="harbor-database"}     1
harbor_up{component="jobservice", container="jobservice", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-jobservice-65cbd58bbd-pnjq5", service="harbor-jobservice"}      1
harbor_up{component="portal", container="portal", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-portal-7c49df4b68-qphzg", service="harbor-portal"}      0
harbor_up{component="redis", container="redis", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-redis-0", service="harbor-redis"} 1
harbor_up{component="registry", container="registry", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-registry-5d479956bb-nq9jm", service="harbor-registry"}      0
harbor_up{component="registryctl", container="registryctl", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-registry-5d479956bb-nq9jm", service="harbor-registry"}        0
harbor_up{component="trivy", container="trivy", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-trivy-0", service="harbor-trivy"} 1
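
The relabeling implied by the proposed output can be sketched as follows. This is only an illustration of the mapping, not how the exporter is implemented: the function `proposed_labels`, the `SERVICE_OVERRIDES` table, and the `pod_by_component` lookup are hypothetical, and the `harbor-` service prefix is an assumption taken from the names shown in this report.

```python
# Hypothetical exceptions to the "harbor-<component>" service naming.
# In the proposed output, registryctl shares the registry pod and service.
SERVICE_OVERRIDES = {"registryctl": "harbor-registry"}


def proposed_labels(labels, pod_by_component):
    """Return a copy of the labels with container, pod, and service rewritten.

    pod_by_component is an assumed lookup from component name to the real pod
    name (for example gathered from the Kubernetes API); when a component is
    missing from it, the original pod label is kept.
    """
    component = labels["component"]
    fixed = dict(labels)
    fixed["container"] = component
    fixed["service"] = SERVICE_OVERRIDES.get(component, f"harbor-{component}")
    fixed["pod"] = pod_by_component.get(component, labels["pod"])
    return fixed


current = {
    "component": "registryctl",
    "container": "exporter",
    "pod": "harbor-exporter-5f6b484c6-rpp7h",
    "service": "harbor-exporter",
}
pods = {"registryctl": "harbor-registry-5d479956bb-nq9jm"}
print(proposed_labels(current, pods))
```

The pod name cannot be derived from the component name alone, which is why the sketch takes it from a separate lookup.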

Versions:
Please specify the versions of the following systems.

  • harbor version: [2.14.0]
  • harbor-helm version: [1.18.0]

Additional context:
Suppose you want to define Prometheus alerting rules for Harbor, and for each firing rule you want to provide as much detailed information as possible about the failing component.

When e.g. the portal or registry Harbor components are failing, the kube_pod_container_status_restarts_total() metric, for example, correctly reports "harbor-portal-7c49df4b68-qphzg" and "harbor-registry-5d479956bb-nq9jm" as the failing pods, so you can point users to those pods' logs to investigate the cause of the failure.

But in the very same scenario, the harbor_up() metric reports the exporter's own names in the container, pod, and service fields, so it is not possible to direct users to the actual pod, container, or service to investigate further.
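
Until the labels are fixed, one possible workaround is to correlate the two metrics on the consumer side. The sketch below assumes that the container label of kube_pod_container_status_restarts_total matches the component label of harbor_up, and that both metrics carry a namespace label. The function `real_pods_for_failing` and the in-memory sample data are hypothetical, and the namespace value "harbor" is a placeholder for the redacted one.

```python
def real_pods_for_failing(harbor_up_series, restart_series):
    """Map each failing component to the pods that run a container of that name.

    Both arguments are lists of (labels, value) pairs, where labels is a dict.
    Series are matched on (namespace, container == component).
    """
    pods_by_key = {}
    for labels, _restarts in restart_series:
        key = (labels["namespace"], labels["container"])
        pods_by_key.setdefault(key, []).append(labels["pod"])

    result = {}
    for labels, value in harbor_up_series:
        if value == 0:
            key = (labels["namespace"], labels["component"])
            # An empty list means no matching container was found.
            result[labels["component"]] = pods_by_key.get(key, [])
    return result


# Placeholder namespace; pod names are the ones quoted in this report.
harbor_up = [
    ({"component": "core", "namespace": "harbor",
      "pod": "harbor-exporter-5f6b484c6-rpp7h"}, 1),
    ({"component": "portal", "namespace": "harbor",
      "pod": "harbor-exporter-5f6b484c6-rpp7h"}, 0),
    ({"component": "registry", "namespace": "harbor",
      "pod": "harbor-exporter-5f6b484c6-rpp7h"}, 0),
]
restarts = [
    ({"container": "portal", "namespace": "harbor",
      "pod": "harbor-portal-7c49df4b68-qphzg"}, 12),
    ({"container": "registry", "namespace": "harbor",
      "pod": "harbor-registry-5d479956bb-nq9jm"}, 9),
]
print(real_pods_for_failing(harbor_up, restarts))
```

This only recovers the pod name; it does not replace having the correct container, pod, and service values on the harbor_up() series themselves.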

  • Harbor config files: Not needed; see the images above for sample harbor-portal & harbor-registry ConfigMap modifications.
  • Log files: Not relevant here either. The issue is in the reported metric output.
