ACK timeout using Thanos #7874

Thanos with Memcached enabled, plus MinIO as long-term storage


**Thanos, Prometheus and Golang version used**:



**Object Storage Provider**: S3 (MinIO)

**What happened**:
I have configured Thanos alongside Memcached, but I cannot get rid of an error that appears whenever a query covers more than 2 days. I am getting the error below:

receive series from Addr: 10.233.117.207:10901 LabelSets: {prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-0"},{prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-1"},{prometheus="kubesphere-monitoring-system/k8s"} MinTime: 1727308800000 MaxTime: 1730368800000: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout

receive series from Addr: 10.233.116.94:10901 LabelSets: {prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-0"},{prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-1"},{prometheus="kubesphere-monitoring-system/k8s"} MinTime: 1727308800000 MaxTime: 1730368800000: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout
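
For reference, the `MinTime` and `MaxTime` values in the error look like Unix timestamps in milliseconds, i.e. the time range each store advertises. A minimal Python sketch for decoding them, assuming that interpretation (the helper name `ms_to_utc` is made up for illustration and is not part of Thanos):

```python
from datetime import datetime, timezone


def ms_to_utc(ms: int) -> str:
    """Convert a Unix timestamp in milliseconds to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


# Values copied from the error message above.
min_time = 1727308800000
max_time = 1730368800000

print(ms_to_utc(min_time))  # 2024-09-26T00:00:00+00:00
print(ms_to_utc(max_time))  # 2024-10-31T10:00:00+00:00

# Length of the advertised range in days (roughly 35.4).
print((max_time - min_time) / (24 * 3_600_000))
```

Under that assumption, both stores advertise roughly 35 days of blocks, so a query over more than 2 days falls well inside the range they are expected to serve.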

My Thanos Store:

```yaml
args:
            - store
            - '--log.level=info'
            - '--log.format=logfmt'
            - '--data-dir=/var/thanos/store'
            - '--grpc-address=0.0.0.0:10901'
            - '--http-address=0.0.0.0:10902'
            - '--objstore.config=$(OBJSTORE_CONFIG)'
            - '--ignore-deletion-marks-delay=24h'
            - '--block-sync-concurrency=120'
            - '--sync-block-duration=60m'
            - '--index-cache-size=4096MB'
            - '--chunk-pool-size=4GB'
            - '--store.grpc.series-max-concurrency=300'
            - '--consistency-delay=30m'
            - |-
              --index-cache.config="config":
                "addresses":
                - "thanos-memcached-service.thanos:11211"
                "dns_provider_update_interval": "60s"
                "max_async_buffer_size": 0
                "max_async_concurrency": 1000
                "max_get_multi_batch_size": 0
                "max_get_multi_concurrency": 0
                "max_idle_connections": 400
                "max_item_size": 0
                "timeout": "180s"
              "type": "MEMCACHED"
            - |-
              --store.caching-bucket.config="blocks_iter_ttl": "720h"
              "chunk_object_attrs_ttl": "720h"
              "chunk_subrange_size": 128000
              "chunk_subrange_ttl": "720h"
              "config":
                "addresses":
                - "thanos-memcached-service.thanos:11211"
                "dns_provider_update_interval": "60s"
                "max_async_buffer_size": 0
                "max_async_concurrency": 1000
                "max_get_multi_batch_size": 0
                "max_get_multi_concurrency": 0
                "max_idle_connections": 400
                "max_item_size": 0
                "timeout": "180s"
              "max_chunks_get_range_requests": 3
              "metafile_content_ttl": "720h"
              "metafile_doesnt_exist_ttl": "1h"
              "metafile_exists_ttl": "720h"
              "metafile_max_size": "4MiB"
              "type": "MEMCACHED"
            - |-
              --tracing.config="config":
                "sampler_param": 2
                "sampler_type": "ratelimiting"
                "service_name": "thanos-store"
              "type": "JAEGER"
```

My Thanos Query Frontend:

```yaml
args:
            - query-frontend
            - '--enable-auto-gomemlimit'
            - '--log.level=info'
            - '--log.format=logfmt'
            - '--query-frontend.compress-responses'
            - '--http-address=0.0.0.0:9090'
            - >-
              --query-frontend.downstream-url=http://thanos-query.thanos.svc.cluster.local.:9090
            - '--query-range.split-interval=24h'
            - '--labels.split-interval=12h'
            - '--query-range.max-retries-per-request=100'
            - '--labels.max-retries-per-request=25'
            - '--query-frontend.log-queries-longer-than=0'
            - '--query-range.max-query-parallelism=120'
            - '--query-frontend.vertical-shards=0'
            - '--cache-compression-type='
            - '--query-frontend.downstream-tripper-config={"response_header_timeout": "5m", "max_idle_conns_per_host": 100}'
            - |-
              --query-range.response-cache-config="config":
                "addresses":
                - "thanos-memcached-service.thanos:11211"
                "dns_provider_update_interval": "30s"
                "max_async_buffer_size": 0
                "max_async_concurrency": 1000
                "max_get_multi_batch_size": 0
                "max_get_multi_concurrency": 0
                "max_idle_connections": 400
                "timeout": "180s"
                "expiration": "720h"
              "type": "MEMCACHED"
            - |-
              --labels.response-cache-config="config":
                "addresses":
                - "thanos-memcached-service.thanos:11211"
                "dns_provider_update_interval": "30s"
                "max_async_buffer_size": 0
                "max_async_concurrency": 1000
                "max_get_multi_batch_size": 0
                "max_get_multi_concurrency": 0
                "max_idle_connections": 400
                "timeout": "180s"
                "expiration": "720h"
              "type": "MEMCACHED"
            - |-
              --tracing.config="config":
                "sampler_param": 2
                "sampler_type": "ratelimiting"
                "service_name": "thanos-query-frontend"
              "type": "JAEGER"
```


My Prometheus:
```yaml 
containers:
    - args:
        - '--web.console.templates=/etc/prometheus/consoles'
        - '--web.console.libraries=/etc/prometheus/console_libraries'
        - '--storage.tsdb.retention.time=12h'
        - '--config.file=/etc/prometheus/config_out/prometheus.env.yaml'
        - '--storage.tsdb.path=/prometheus'
        - '--web.enable-lifecycle'
        - '--web.enable-admin-api'
        - '--web.route-prefix=/'
        - '--web.config.file=/etc/prometheus/web_config/web-config.yaml'
        - '--storage.tsdb.max-block-duration=2h'
        - '--storage.tsdb.min-block-duration=2h'
        - '--web.max-connections=8096'
        - '--query.max-concurrency=60'
      image: 'prom/prometheus:v2.49.1'
```

**What you expected to happen**:

My Prometheus has 6h of retention, but when I query a longer range than that I get the error mentioned above.

**How to reproduce it (as minimally and precisely as possible)**:

**Full logs to relevant components**:


receive series from Addr: 10.233.117.207:10901 LabelSets: {prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-0"},{prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-1"},{prometheus="kubesphere-monitoring-system/k8s"} MinTime: 1727308800000 MaxTime: 1730368800000: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout

receive series from Addr: 10.233.116.94:10901 LabelSets: {prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-0"},{prometheus="kubesphere-monitoring-system/k8s", prometheus_replica="prometheus-k8s-1"},{prometheus="kubesphere-monitoring-system/k8s"} MinTime: 1727308800000 MaxTime: 1730368800000: rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout

**Anything else we need to know**:

ts=2024-08-22T04:15:02.506236929Z caller=memcached_client.go:438 level=warn name=index-cache msg="failed to fetch items from memcached" numKeys=1 firstKey=EP:01J5TQ7GTAK7JFP1SDHAZQABMB:NskVASoO0H1CJRIx74k3hIBPzIM6wCRkKvWOjc9V3Dg:dss err="write tcp 10.233.66.17:47668->10.233.31.160:11211: write: connection timed out"

**Environment**:
- OS (e.g. from /etc/os-release): RedHat 8.5
- Kernel (e.g. `uname -a`): 4.8
- Others: Kubernetes


Could you please help me to understand what I did wrong?
