ACK timeout using Thanos  #7874

Closed
@antikilahdjs

Description

Thanos with Memcached enabled, plus MinIO as long-term storage

Thanos, Prometheus and Golang version used:

Object Storage Provider: S3 (MinIO)

What happened:
I have configured Thanos alongside Memcached, but I cannot get rid of an error that appears whenever a query covers more than 2 days. The error is:

receive series from Addr: 10.233.117.207:10901 LabelSets: {prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-0"},{prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-1"},{prometheus="kubesphere-monitoring-system/k8s"} MinTime: 1727308800000 MaxTime: 1730368800000: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
receive series from Addr: 10.233.116.94:10901 LabelSets: {prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-0"},{prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-1"},{prometheus="kubesphere-monitoring-system/k8s"} MinTime: 1727308800000 MaxTime: 1730368800000: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
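
For readers triaging this, the error text is really two per-store errors concatenated, and the MinTime/MaxTime fields are epoch milliseconds. The following is a minimal sketch, not part of Thanos, that splits such a message per store address and converts the timestamps to UTC. The helper names (`parse_store_errors`, `ms_to_utc`) are hypothetical, and the label sets in the sample string are shortened to `{...}` for readability. Reading MinTime/MaxTime as the range the store endpoint advertises, rather than the query window, is an assumption.

```python
import re
from datetime import datetime, timezone

# Shortened copy of the error from this issue (label sets replaced by {...}).
ERROR = (
    "receive series from Addr: 10.233.117.207:10901 LabelSets: {...} "
    "MinTime: 1727308800000 MaxTime: 1730368800000: rpc error: code = Unavailable "
    "desc = keepalive ping failed to receive ACK within timeout"
    "receive series from Addr: 10.233.116.94:10901 LabelSets: {...} "
    "MinTime: 1727308800000 MaxTime: 1730368800000: rpc error: code = Unavailable "
    "desc = keepalive ping failed to receive ACK within timeout"
)

# Captures the store address and the two millisecond timestamps of one entry.
ENTRY = re.compile(r"Addr: (?P<addr>\S+) .*?MinTime: (?P<min>\d+) MaxTime: (?P<max>\d+)")


def ms_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def parse_store_errors(text):
    """Return (address, min_time, max_time) for each concatenated store error."""
    results = []
    # Each per-store error starts with the same prefix, so split on it.
    for chunk in text.split("receive series from ")[1:]:
        match = ENTRY.search(chunk)
        if match:
            results.append(
                (match["addr"], ms_to_utc(int(match["min"])), ms_to_utc(int(match["max"])))
            )
    return results


for addr, lo, hi in parse_store_errors(ERROR):
    print(addr, lo.isoformat(), hi.isoformat(), "span:", hi - lo)
```

For the values in this issue, both entries decode to 2024-09-26T00:00:00 UTC through 2024-10-31T10:00:00 UTC, a span of 35 days and 10 hours, reported by two different store addresses.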

My Thanos Store:

args:
            - store
            - '--log.level=info'
            - '--log.format=logfmt'
            - '--data-dir=/var/thanos/store'
            - '--grpc-address=0.0.0.0:10901'
            - '--http-address=0.0.0.0:10902'
            - '--objstore.config=$(OBJSTORE_CONFIG)'
            - '--ignore-deletion-marks-delay=24h'
            - '--block-sync-concurrency=120'
            - '--sync-block-duration=60m'
            - '--index-cache-size=4096MB'
            - '--chunk-pool-size=4GB'
            - '--store.grpc.series-max-concurrency=300'
            - '--consistency-delay=30m'
            - |-
              --index-cache.config="config":
                "addresses":
                - "thanos-memcached-service.thanos:11211"
                "dns_provider_update_interval": "60s"
                "max_async_buffer_size": 0
                "max_async_concurrency": 1000
                "max_get_multi_batch_size": 0
                "max_get_multi_concurrency": 0
                "max_idle_connections": 400
                "max_item_size": 0
                "timeout": "180s"
              "type": "MEMCACHED"
            - |-
              --store.caching-bucket.config="blocks_iter_ttl": "720h"
              "chunk_object_attrs_ttl": "720h"
              "chunk_subrange_size": 128000
              "chunk_subrange_ttl": "720h"
              "config":
                "addresses":
                - "thanos-memcached-service.thanos:11211"
                "dns_provider_update_interval": "60s"
                "max_async_buffer_size": 0
                "max_async_concurrency": 1000
                "max_get_multi_batch_size": 0
                "max_get_multi_concurrency": 0
                "max_idle_connections": 400
                "max_item_size": 0
                "timeout": "180s"
              "max_chunks_get_range_requests": 3
              "metafile_content_ttl": "720h"
              "metafile_doesnt_exist_ttl": "1h"
              "metafile_exists_ttl": "720h"
              "metafile_max_size": "4MiB"
              "type": "MEMCACHED"
            - |-
              --tracing.config="config":
                "sampler_param": 2
                "sampler_type": "ratelimiting"
                "service_name": "thanos-store"
              "type": "JAEGER"

My Thanos Query Frontend:

args:
            - query-frontend
            - '--enable-auto-gomemlimit'
            - '--log.level=info'
            - '--log.format=logfmt'
            - '--query-frontend.compress-responses'
            - '--http-address=0.0.0.0:9090'
            - >-
              --query-frontend.downstream-url=http://thanos-query.thanos.svc.cluster.local.:9090
            - '--query-range.split-interval=24h'
            - '--labels.split-interval=12h'
            - '--query-range.max-retries-per-request=100'
            - '--labels.max-retries-per-request=25'
            - '--query-frontend.log-queries-longer-than=0'
            - '--query-range.max-query-parallelism=120'
            - '--query-frontend.vertical-shards=0'
            - '--cache-compression-type='
            - '--query-frontend.downstream-tripper-config={"response_header_timeout": "5m", "max_idle_conns_per_host": 100}'
            - |-
              --query-range.response-cache-config="config":
                "addresses":
                - "thanos-memcached-service.thanos:11211"
                "dns_provider_update_interval": "30s"
                "max_async_buffer_size": 0
                "max_async_concurrency": 1000
                "max_get_multi_batch_size": 0
                "max_get_multi_concurrency": 0
                "max_idle_connections": 400
                "timeout": "180s"
                "expiration": "720h"
              "type": "MEMCACHED"
            - |-
              --labels.response-cache-config="config":
                "addresses":
                - "thanos-memcached-service.thanos:11211"
                "dns_provider_update_interval": "30s"
                "max_async_buffer_size": 0
                "max_async_concurrency": 1000
                "max_get_multi_batch_size": 0
                "max_get_multi_concurrency": 0
                "max_idle_connections": 400
                "timeout": "180s"
                "expiration": "720h"
              "type": "MEMCACHED"
            - |-
              --tracing.config="config":
                "sampler_param": 2
                "sampler_type": "ratelimiting"
                "service_name": "thanos-query-frontend"
              "type": "JAEGER"
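
To make the effect of `--query-range.split-interval=24h` concrete, here is a rough sketch of how a multi-day range query could be cut into sub-queries. It assumes the frontend splits at boundaries aligned to multiples of the interval since the Unix epoch; the exact alignment Thanos uses may differ, and `split_by_interval` is a hypothetical helper, not a Thanos function. The 3-day example range is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_by_interval(start, end, interval):
    """Split [start, end) into pieces that end on interval-aligned boundaries."""
    pieces = []
    cursor = start
    while cursor < end:
        # First aligned boundary strictly after the cursor.
        boundary = EPOCH + ((cursor - EPOCH) // interval + 1) * interval
        nxt = min(boundary, end)
        pieces.append((cursor, nxt))
        cursor = nxt
    return pieces


# Hypothetical 3-day query window, split with the 24h interval from the args above.
start = datetime(2024, 10, 28, 10, 0, tzinfo=timezone.utc)
end = datetime(2024, 10, 31, 10, 0, tzinfo=timezone.utc)
pieces = split_by_interval(start, end, timedelta(hours=24))
for lo, hi in pieces:
    print(lo.isoformat(), "->", hi.isoformat())
print(len(pieces), "sub-queries")
```

Under these assumptions a 3-day window that does not start at midnight UTC yields 4 sub-queries, and with `--query-range.max-query-parallelism=120` all of them could be sent downstream at the same time.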

My Prometheus:

containers:
    - args:
        - '--web.console.templates=/etc/prometheus/consoles'
        - '--web.console.libraries=/etc/prometheus/console_libraries'
        - '--storage.tsdb.retention.time=12h'
        - '--config.file=/etc/prometheus/config_out/prometheus.env.yaml'
        - '--storage.tsdb.path=/prometheus'
        - '--web.enable-lifecycle'
        - '--web.enable-admin-api'
        - '--web.route-prefix=/'
        - '--web.config.file=/etc/prometheus/web_config/web-config.yaml'
        - '--storage.tsdb.max-block-duration=2h'
        - '--storage.tsdb.min-block-duration=2h'
        - '--web.max-connections=8096'
        - '--query.max-concurrency=60'
      image: 'prom/prometheus:v2.49.1'

What you expected to happen:

My Prometheus has 6h of retention, but as soon as I query a range longer than that I get the error mentioned above.

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:

receive series from Addr: 10.233.117.207:10901 LabelSets: {prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-0"},{prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-1"},{prometheus="kubesphere-monitoring-system/k8s"} MinTime: 1727308800000 MaxTime: 1730368800000: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
receive series from Addr: 10.233.116.94:10901 LabelSets: {prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-0"},{prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-1"},{prometheus="kubesphere-monitoring-system/k8s"} MinTime: 1727308800000 MaxTime: 1730368800000: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout

Anything else we need to know:

ts=2024-08-22T04:15:02.506236929Z caller=memcached_client.go:438 level=warn name=index-cache msg="failed to fetch items from memcached" numKeys=1 firstKey=EP:01J5TQ7GTAK7JFP1SDHAZQABMB:NskVASoO0H1CJRIx74k3hIBPzIM6wCRkKvWOjc9V3Dg:dss err="write tcp 10.233.66.17:47668->10.233.31.160:11211: write: connection timed out"
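
Since the warning above is a TCP write timeout towards Memcached on port 11211, one basic check is whether the service answers at all from inside the cluster. The sketch below sends the memcached text-protocol `version` command over a plain socket. `memcached_version` is a hypothetical helper, and the 3-second timeout is an arbitrary choice. It only tests basic reachability and says nothing about behaviour under load.

```python
import socket


def memcached_version(host, port=11211, timeout=3.0):
    """Send the memcached `version` command and return the single reply line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"version\r\n")
        reply = b""
        # The reply is one line terminated by CRLF, e.g. "VERSION <x.y.z>".
        while not reply.endswith(b"\r\n"):
            chunk = sock.recv(1024)
            if not chunk:
                break  # peer closed the connection early
            reply += chunk
    return reply.decode("ascii", errors="replace").strip()
```

Run from a pod that can resolve the service name, `memcached_version("thanos-memcached-service.thanos")` should return a line starting with `VERSION`. A `socket.timeout` or other `OSError` would point to a network or service problem rather than a Thanos setting.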

Environment:

  • OS (e.g. from /etc/os-release): RedHat 8.5
  • Kernel (e.g. uname -a): 4.8
  • Others: Kubernetes

Could you please help me to understand what I did wrong?
