[Tutorial] Write a complete tutorial on how to set up OpenSearch with the plugin in K8s and have Prometheus scrape it #240

Open
lukas-vlcek opened this issue Dec 5, 2023 · 11 comments

Comments

@lukas-vlcek
Collaborator

There is no complete tutorial on how to set up an OpenSearch cluster with the plugin in K8s and have Prometheus scraping the metrics endpoint.

See: https://forum.opensearch.org/t/prometheus-not-able-to-scrape-metrics-on-pod/16908/

Idea: This setup flow should be part of the plugin's release process, or even the CI (?)
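
For context, a minimal sketch of getting the plugin into the cluster via the official OpenSearch Helm chart (the plugins.enabled/installList options come from that chart; the release URL below is illustrative and must match the deployed OpenSearch version):

plugins:
  enabled: true
  installList:
    # Illustrative plugin release; pick the asset that matches your OpenSearch version
    - https://github.com/aiven/prometheus-exporter-plugin-for-opensearch/releases/download/2.11.1.0/prometheus-exporter-2.11.1.0.zip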

@layavadi

layavadi commented Apr 8, 2024

Is there any progress on this task? I would like to use Prometheus to scrape OpenSearch metrics and use Grafana dashboards for monitoring.

@smbambling

This tutorial is very much needed. I've been through several attempts to get Prometheus to scrape the endpoint on Kubernetes with no success.

@lukas-vlcek
Collaborator Author

Just for the record, the following is a Slack thread we had with @smbambling on this topic:
https://opensearch.slack.com/archives/C051JEH8MNU/p1715262647976709

@smbambling

I've attempted to configure a scrape endpoint for Prometheus to OpenSearch's _prometheus/metrics via two separate methods.

Notes:

  • kube-prometheus-stack is used to deploy Prometheus, Grafana, etc
  • OpenSearch Helm chart is used to deploy OpenSearch
  • Additional security configs (i.e. internal users, bindings, index management, etc.) are applied via a custom OpenSearch-Helper Helm chart

Method 1: Static Prometheus configs

In this method I've modified the kube-prometheus-stack Helm values override to apply additional configs.

In the values below I've tested multiple different combinations of configs:

  • only insecure_skip_verify: true, with no other tls_config options set
  • insecure_skip_verify: false with ca_file set
  • max_version: TLS12 both set and not set
  • cert_file + key_file both set and not set
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: opensearch-job
        metrics_path: /_prometheus/metrics
        scheme: https
        static_configs:
          - targets:
              - opensearch-localk3s-cl1-master.opensearch.svc.cluster.local:9200
        basic_auth:
          username: "admin"
          password: "myfakePW"
        tls_config:
          insecure_skip_verify: true
          max_version: TLS12
          ca_file: /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/ca.crt
          cert_file: /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/tls.crt
          key_file: /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/tls.key
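
Note: for the ca_file / cert_file / key_file paths above to exist inside the Prometheus pod, the referenced Secret has to be mounted by the Prometheus Operator. A minimal sketch, assuming kube-prometheus-stack values and the secret name implied by the paths above (secrets listed here are mounted at /etc/prometheus/secrets/<name>/):

prometheus:
  prometheusSpec:
    secrets:
      # Mounted at /etc/prometheus/secrets/my-internal-wildcard-my-tls-certs/
      - my-internal-wildcard-my-tls-certs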

From another pod within the monitoring namespace where Prometheus is running (there is no curl installed in the Prometheus container itself), I'm able to curl the internal service DNS name set above.

--- referencing the CA cert
$ curl -XGET --cacert /tmp/foo -u 'admin:myfakePW' 'https://opensearch-localk3s-cl1-master.opensearch.svc.cluster.local:9200/_prometheus/metrics' | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP opensearch_jvm_mem_pool_max_bytes Maximum usage of memory pool
# TYPE opensearch_jvm_mem_pool_max_bytes gauge
opensearch_jvm_mem_pool_max_bytes{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-master-2",nodeid="7eGuaMZwTcKZYLfPDnovDA",pool="survivor",} 0.0


AND

--- without referencing the CA cert
$ curl -k -u 'admin:myfakePW' 'https://opensearch-localk3s-cl1-master.opensearch.svc.cluster.local:9200/_prometheus/metrics' | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP opensearch_indices_get_count Count of get commands
# TYPE opensearch_indices_get_count gauge
opensearch_indices_get_count{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-master-2",nodeid="7eGuaMZwTcKZYLfPDnovDA",} 0.0
opensearch_indices_get_count{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-hot-data-0",nodeid="-Modhwt_TMiOd4f4rSSPhg",} 48.0

@smbambling

I've attempted to configure a scrape endpoint for Prometheus to OpenSearch's _prometheus/metrics via two separate methods.

Notes:

  • kube-prometheus-stack is used to deploy Prometheus, Grafana, etc
  • OpenSearch Helm chart is used to deploy OpenSearch
  • Additional security configs (i.e. internal users, bindings, index management, etc.) are applied via a custom OpenSearch-Helper Helm chart

Method 2: Using Prometheus Service Monitor

In this method I've created a ServiceMonitor for kube-prometheus-stack to read and generate scrape targets from.

Below is the output of my created ServiceMonitor (the basic-auth Secret it references is sketched after the manifest):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: opensearch-master
    meta.helm.sh/release-namespace: opensearch
  creationTimestamp: "2024-05-08T14:51:02Z"
  generation: 12
  labels:
    app.kubernetes.io/component: opensearch-localk3s-cl1-master
    app.kubernetes.io/instance: opensearch-master
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opensearch
    app.kubernetes.io/version: 2.11.1
    helm.sh/chart: opensearch-2.17.0
    release: kube-prometheus-stack
  name: opensearch-service-monitor
  namespace: monitoring
  resourceVersion: "141672"
  uid: cf1df5d5-a855-4eb1-8cb5-da2ddaad99f6
spec:
  endpoints:
  - basicAuth:
      password:
        key: password
        name: opensearch-service-monitor-basic-auth
      username:
        key: username
        name: opensearch-service-monitor-basic-auth
    interval: 10s
    path: /_prometheus/metrics
    port: http
    scheme: https
    tlsConfig:
      ca: {}
      insecureSkipVerify: true
  namespaceSelector:
    matchNames:
    - opensearch
  selector:
    matchLabels:
      app.kubernetes.io/component: opensearch-localk3s-cl1-master
      app.kubernetes.io/instance: opensearch-master
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: opensearch
      app.kubernetes.io/version: 2.11.1
      helm.sh/chart: opensearch-2.17.0
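
For reference, a minimal sketch of the Secret that the basicAuth block above points at (name and keys taken from the ServiceMonitor; the Prometheus Operator expects it in the same namespace as the ServiceMonitor, here monitoring; credentials are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: opensearch-service-monitor-basic-auth
  namespace: monitoring
type: Opaque
stringData:
  # Keys must match the "key" fields referenced by basicAuth above
  username: admin
  password: myfakePW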

Again, multiple different combinations of configs were tested within the ServiceMonitor, all of which provided the same end result: the scrape endpoints are created, but there is an SSL handshake issue for Prometheus.

Just as verification, I could also curl from the same pod as in Method 1 to the cluster IP endpoints generated via the ServiceMonitor:

$ curl -u 'admin:myfakePW' -k https://10.42.0.69:9200/_prometheus/metrics | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP opensearch_indices_refresh_total_time_seconds Time spent while refreshes
# TYPE opensearch_indices_refresh_total_time_seconds gauge
opensearch_indices_refresh_total_time_seconds{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-master-2",nodeid="7eGuaMZwTcKZYLfPDnovDA",} 0.0
opensearch_indices_refresh_total_time_seconds{cluster="opensearch-localk3s-cl1",node="opensearch-localk3s-cl1-hot-data-0",nodeid="-Modhwt_TMiOd4f4rSSPhg",} 174.781

In the end both methods produce the following errors in the Prometheus UI

[Screenshot: Prometheus UI showing the scrape errors described above (2024-05-09)]

@lukas-vlcek
Collaborator Author

Thanks @smbambling for putting in the effort to write it all down.

@smbambling

smbambling commented May 10, 2024

In our testing setup we had restricted the ciphers via plugins.security.ssl.transport.enabled_ciphers; commenting this out allowed Prometheus to scrape the endpoints and gather data.
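
For illustration, a sketch of the kind of restriction described above, as it might appear in an opensearch.yml override (the cipher list is assumed, not the actual one used); commenting it out falls back to the default, broader cipher set that Prometheus' TLS client can negotiate:

# Restrictive cipher list (illustrative) that broke the Prometheus handshake
plugins.security.ssl.transport.enabled_ciphers:
  - "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"
# Removing/commenting the setting restores the node's default cipher suites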

@rarifz

rarifz commented Jun 3, 2024

I want to ask something: does this mean that OpenSearch provides the metrics data to Prometheus, or that Prometheus provides the metrics data to OpenSearch?

@smbambling

@rarifz This installs an exporter that exposes metrics about OpenSearch, which Prometheus can be configured to scrape.

@PDCuong

PDCuong commented Oct 17, 2024

Hello @smbambling, have you found a workaround? I tried with curl and it worked, but Prometheus cannot scrape metrics from the /_prometheus/metrics path.
FYI, other people using Prometheus can scrape it if the cluster is set up using only the HTTP protocol.
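
For reference, a sketch of what "HTTP only" typically means here, assuming the security plugin's standard setting (disabling TLS on the REST layer is generally only advisable for testing):

# opensearch.yml fragment: serve the REST layer (and therefore /_prometheus/metrics)
# over plain HTTP; node-to-node transport TLS stays enabled separately.
plugins.security.ssl.http.enabled: false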

@aravindhkudiyarasan

Hello @smbambling, do we have any workaround for people using HTTPS with basic auth enabled? We see that it's working with curl, but Prometheus cannot scrape metrics from the /_prometheus/metrics path and the target shows as down.
