Description
Steps to reproduce the problem:
- Deploy Harbor v2.14.0 via the harbor-helm v1.18.0 Helm chart & enable Harbor Prometheus metrics
- Intentionally make some of the Harbor components fail to start, for example:
- For the portal component, intentionally provide an invalid Nginx config directive in the harbor-portal ConfigMap; diff example below:
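The original diff was attached as a screenshot; a hypothetical equivalent change (the directive name here is made up purely to be invalid) might look like:

```diff
 data:
   nginx.conf: |+
     worker_processes auto;
+    not_a_real_directive on;   # invalid directive: nginx refuses to start
     pid /tmp/nginx.pid;
```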

- For the registry component intentionally provide some invalid option in the harbor-registry ConfigMap, diff example below:
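Likewise for the registry, the original diff was a screenshot; a hypothetical stand-in that sets a real option to an invalid value so the registry process fails on startup:

```diff
 data:
   config.yml: |+
     version: 0.1
     log:
-      level: info
+      level: not-a-log-level   # invalid log level: registry exits on startup
```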

- Restart both the harbor-portal and harbor-registry pods so that they load their new configs and end up in CrashLoopBackOff
- Now issue a harbor_up Prometheus query (either in the Prometheus UI, or via a new Prometheus alert that uses this metric)
- Check the reported container, pod, and service names
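The steps above can be driven by an alerting rule such as the following minimal sketch (the group and alert names are made up); note that with the current behavior, every label copied into the annotations would describe the exporter rather than the failing component:

```yaml
groups:
  - name: harbor.rules            # hypothetical rule group name
    rules:
      - alert: HarborComponentDown
        expr: harbor_up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Harbor component {{ $labels.component }} is down"
          # With the current behavior these three labels all point at the
          # exporter pod, not at the failing component's pod/container/service.
          description: "pod={{ $labels.pod }} container={{ $labels.container }} service={{ $labels.service }}"
```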
Actual behavior:
The portal, registry, and registryctl components are correctly reported as failing.
However, wrong/misleading information is reported in the container, pod, and service labels: "exporter" is reported as the container, "harbor-exporter-.*" as the pod, and "harbor-exporter" as the service name for all components.
See the output below (the instance IP & namespace were intentionally redacted):
Load time: 307ms · Result series: 8

```
harbor_up{component="core", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"} 1
harbor_up{component="database", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"} 1
harbor_up{component="jobservice", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"} 1
harbor_up{component="portal", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"} 0
harbor_up{component="redis", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"} 1
harbor_up{component="registry", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"} 0
harbor_up{component="registryctl", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"} 0
harbor_up{component="trivy", container="exporter", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-exporter-5f6b484c6-rpp7h", service="harbor-exporter"} 1
```
Note: when you look at the harbor-exporter pod, there is no container named 'portal', 'registry', or 'registryctl' in it.
Expected behavior:
- For the failing portal component, the failing harbor-portal-7c49df4b68-qphzg pod should be reported in the pod field, portal in the container field, and harbor-portal in the service field. In other words, for every failing component the corresponding real pod, container, and service names should be reported, instead of the exporter "placeholder" used currently.
- Analogously, for the failing registry component, "registry" should be reported as the container, the real registry pod name as the pod, and "harbor-registry" as the service name.
If I were to adjust the current metric output to the proposed one (so that it better reflects the situation in the K8s namespace), it would look as follows (note the changed container, pod, and service label values):
Load time: 307ms · Result series: 8

```
harbor_up{component="core", container="core", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-core-5b4f678f4d-tgpt8", service="harbor-core"} 1
harbor_up{component="database", container="database", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-database-0", service="harbor-database"} 1
harbor_up{component="jobservice", container="jobservice", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-jobservice-65cbd58bbd-pnjq5", service="harbor-jobservice"} 1
harbor_up{component="portal", container="portal", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-portal-7c49df4b68-qphzg", service="harbor-portal"} 0
harbor_up{component="redis", container="redis", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-redis-0", service="harbor-redis"} 1
harbor_up{component="registry", container="registry", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-registry-5d479956bb-nq9jm", service="harbor-registry"} 0
harbor_up{component="registryctl", container="registryctl", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-registry-5d479956bb-nq9jm", service="harbor-registry"} 0
harbor_up{component="trivy", container="trivy", endpoint="http-metrics", instance="<<redacted>>", job="harbor", namespace="<<redacted>>", pod="harbor-trivy-0", service="harbor-trivy"} 1
```
Versions:
Please specify the versions of the following systems.
- harbor version: [2.14.0]
- harbor-helm version: [1.18.0]
Additional context:
Suppose you want to define Prometheus alerting rules for Harbor, and for each firing alert you want to provide as much detailed information as possible about the failing component.
When e.g. the portal or registry Harbor components are failing, the kube_pod_container_status_restarts_total metric correctly reports "harbor-portal-7c49df4b68-qphzg" and "harbor-registry-5d479956bb-nq9jm" as the failing pods, so you can point users to those pods' logs to investigate the failure further.
But in the very same scenario, the harbor_up metric reports the aforementioned "exporter" values as the container, pod, and service names, so it is not possible to direct users to the actual pod, container, or service for further investigation.
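Until the labels are fixed, the closest workaround I can see is deriving a pod-name prefix from the component label and matching it against pod names from other metrics by regex; a rough sketch (the pod_prefix label name and the "harbor-" release prefix are assumptions):

```promql
# harbor_up's own pod label points at the exporter pod, so a direct join on
# pod is impossible; derive a pod-name prefix from the component label instead:
label_replace(harbor_up == 0, "pod_prefix", "harbor-$1", "component", "(.+)")
```

This is fragile (it only works if the release is named "harbor" and the component names map 1:1 to pod-name prefixes), which is why fixing the labels at the source is preferable.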
- Harbor config files: not needed; see the pictures above for the sample harbor-portal & harbor-registry ConfigMap modifications.
- Log files: not relevant here either; the issue is in the reported metric output.