linkerd-viz: Bring your own Prometheus #13313

Closed
jack1902 opened this issue Nov 12, 2024 · 2 comments

jack1902 commented Nov 12, 2024

What is the issue?

When bringing my own Prometheus to linkerd-viz (so that the data persists across restarts and for longer periods of time), I run into an issue where `linkerd viz routes` shows nothing.

How can it be reproduced?

Deploy linkerd-viz such that prometheus.enabled=false and prometheusUrl points to your own deployment of Prometheus (I have mine inside the linkerd-viz namespace).
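
For reference, a minimal sketch of the linkerd-viz values this amounts to (the prometheusUrl below assumes my Prometheus ends up behind a prometheus-server service on port 9090 in the linkerd-viz namespace):

# linkerd-viz Helm values (sketch): disable the bundled Prometheus and point
# the extension at an existing instance instead
prometheus:
  enabled: false
prometheusUrl: http://prometheus-server.linkerd-viz.svc.cluster.local:9090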

I have deployed Prometheus into the linkerd-viz namespace using the prometheus-community/prometheus Helm chart with the values below, most of which are sourced either from the documentation (https://linkerd.io/2-edge/tasks/external-prometheus/) or from the bundled Prometheus ConfigMap itself:

server:
  podAnnotations:
    linkerd.io/inject: enabled
  global:
    ## How frequently to scrape targets by default
    ##
    scrape_interval: 10s
    ## How long until a scrape request times out
    ##
    scrape_timeout: 10s
    ## How frequently to evaluate rules
    ##
    evaluation_interval: 10s

  service:
    # Keep the service port identical to the container port (9090); exposing the
    # service as `80` -> `9090` causes problems inside linkerd-proxy and traffic doesn't get through
    servicePort: 9090

  persistentVolume:
    size: 20Gi

# Originally had this under `server` but it should be under `serverFiles`
serverFiles:
  prometheus.yml:
    scrape_configs:
    - job_name: 'linkerd-controller'
      kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
          - 'linkerd'
          - 'linkerd-viz'
      relabel_configs:
      - source_labels:
        - __meta_kubernetes_pod_container_port_name
        action: keep
        regex: admin-http
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: component

    - job_name: 'linkerd-service-mirror'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels:
        - __meta_kubernetes_pod_label_linkerd_io_control_plane_component
        - __meta_kubernetes_pod_container_port_name
        action: keep
        regex: linkerd-service-mirror;admin-http$
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: component

    - job_name: 'linkerd-proxy'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels:
        - __meta_kubernetes_pod_container_name
        - __meta_kubernetes_pod_container_port_name
        - __meta_kubernetes_pod_label_linkerd_io_control_plane_ns
        action: keep
        regex: ^linkerd-proxy;linkerd-admin;linkerd$
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      # special case k8s' "job" label, to not interfere with prometheus' "job"
      # label
      # __meta_kubernetes_pod_label_linkerd_io_proxy_job=foo =>
      # k8s_job=foo
      - source_labels: [__meta_kubernetes_pod_label_linkerd_io_proxy_job]
        action: replace
        target_label: k8s_job
      # drop __meta_kubernetes_pod_label_linkerd_io_proxy_job
      - action: labeldrop
        regex: __meta_kubernetes_pod_label_linkerd_io_proxy_job
      # __meta_kubernetes_pod_label_linkerd_io_proxy_deployment=foo =>
      # deployment=foo
      - action: labelmap
        regex: __meta_kubernetes_pod_label_linkerd_io_proxy_(.+)
      # drop all labels that we just made copies of in the previous labelmap
      - action: labeldrop
        regex: __meta_kubernetes_pod_label_linkerd_io_proxy_(.+)
      # __meta_kubernetes_pod_label_linkerd_io_foo=bar =>
      # foo=bar
      - action: labelmap
        regex: __meta_kubernetes_pod_label_linkerd_io_(.+)
      # Copy all pod labels to tmp labels
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
        replacement: __tmp_pod_label_$1
      # Take `linkerd_io_` prefixed labels and copy them without the prefix
      - action: labelmap
        regex: __tmp_pod_label_linkerd_io_(.+)
        replacement:  __tmp_pod_label_$1
      # Drop the `linkerd_io_` originals
      - action: labeldrop
        regex: __tmp_pod_label_linkerd_io_(.+)
      # Copy tmp labels into real labels
      - action: labelmap
        regex: __tmp_pod_label_(.+)

# We purely want a better storage for linkerd-viz metrics only
prometheus-node-exporter:
  enabled: false
prometheus-pushgateway:
  enabled: false
kube-state-metrics:
  enabled: false
alertmanager:
  enabled: false

Logs, error output, etc

`linkerd viz routes` doesn't show anything other than the ServiceProfile routes themselves, with no metrics.

I have observed that the tap container in the linkerd-viz namespace emits:

2024/11/12 16:17:35 http: TLS handshake error from 10.1.221.219:49028: EOF

The IP here is that of the tap container itself

Output of `linkerd check -o short`:

linkerd-version
---------------
‼ cli is up-to-date
    is running version 24.10.5 but the latest edge version is 24.11.2
    see https://linkerd.io/2/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 24.10.5 but the latest edge version is 24.11.2
    see https://linkerd.io/2/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-5966dbc6fb-4ztlm (edge-24.10.5)
	* linkerd-destination-5966dbc6fb-8ssz2 (edge-24.10.5)
	* linkerd-destination-5966dbc6fb-v764g (edge-24.10.5)
	* linkerd-identity-6bc4bb4b95-2fdts (edge-24.10.5)
	* linkerd-identity-6bc4bb4b95-4g9xq (edge-24.10.5)
	* linkerd-identity-6bc4bb4b95-rb7t8 (edge-24.10.5)
	* linkerd-proxy-injector-6cdb97df95-84pbh (edge-24.10.5)
	* linkerd-proxy-injector-6cdb97df95-cscwl (edge-24.10.5)
	* linkerd-proxy-injector-6cdb97df95-k6nrx (edge-24.10.5)
    see https://linkerd.io/2/checks/#l5d-cp-proxy-version for hints

linkerd-viz
-----------
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
	* metrics-api-5b5c4c4868-g6ksg (edge-24.10.5)
	* prometheus-server-648958b888-dslxr (edge-24.10.5)
	* tap-68df957499-5gp58 (edge-24.10.5)
	* tap-injector-676fd96b96-fgcrp (edge-24.10.5)
	* web-74d74c5dc4-nnccp (edge-24.10.5)
    see https://linkerd.io/2/checks/#l5d-viz-proxy-cp-version for hints
‼ prometheus is installed and configured correctly
    missing ClusterRoles: linkerd-linkerd-viz-prometheus
    see https://linkerd.io/2/checks/#l5d-viz-prometheus for hints

Status check results are √

Environment

Kubernetes v1.29.8
Microk8s

Possible solution

The linkerd viz dashboard still appears to operate, but the linkerd viz routes command only works when I use the bundled Prometheus rather than the one I deployed myself.

Additional context

relates to: #12889
relates to: #10804

Would you like to work on fixing this bug?

None

jack1902 added the bug label Nov 12, 2024

jack1902 commented Nov 13, 2024

In addition to the above, I have also deployed:

---
apiVersion: policy.linkerd.io/v1beta3
kind: Server
metadata:
  name: prometheus-server-admin
  namespace: linkerd-viz
spec:
  accessPolicy: deny
  podSelector:
    matchLabels:
      app.kubernetes.io/component: server
      app.kubernetes.io/instance: prometheus
      app.kubernetes.io/name: prometheus
      helm.sh/chart: prometheus-25.8.2
  port: 9090
  proxyProtocol: HTTP/1
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: prometheus-server-admin
  namespace: linkerd-viz
spec:
  requiredAuthenticationRefs:
  - kind: ServiceAccount
    name: metrics-api
    namespace: linkerd-viz
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: prometheus-server-admin

and extended the MeshTLSAuthentication referenced by the allow-viz policy:

---
apiVersion: policy.linkerd.io/v1beta3
kind: Server
metadata:
  namespace: {{ .Release.Namespace }}
  name: linkerd-admin
spec:
  podSelector:
    matchLabels: {}
  port: linkerd-admin
  proxyProtocol: HTTP/2
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: allow-viz
spec:
  targetRef:
    kind: Namespace
    name: {{ .Release.Namespace }}
  requiredAuthenticationRefs:
    - name: linkerd-viz
      kind: MeshTLSAuthentication
      group: policy.linkerd.io
---
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: linkerd-viz
  namespace: {{ .Release.Namespace }}
spec:
  identities:
    - "tap.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"
    - "prometheus.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"
    - "prometheus-server.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"

The linkerd viz dashboard shows metrics for my app (it queries by service/, which appears to work), but when I use deploy/ it only works with the bundled Prometheus.

jack1902 commented

I've updated the original post, as I had mistakenly put the prometheus.yml under `server` when it should have been under `serverFiles`. Things appear to be working much better and linkerd viz routes deploy/ now works (because the data it needs is actually there!).
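
For anyone hitting the same thing, the difference boils down to where prometheus.yml sits in the chart values (sketch; the scrape_configs are the linkerd jobs from the original post):

# Did not work: nested under `server`, the scrape config never ends up in the
# rendered prometheus.yml (as far as I can tell the chart ignores it there)
server:
  prometheus.yml:
    scrape_configs: []   # linkerd-controller / linkerd-service-mirror / linkerd-proxy jobs

# Works: the prometheus-community/prometheus chart renders prometheus.yml from `serverFiles`
serverFiles:
  prometheus.yml:
    scrape_configs: []   # same jobs as above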
