
The difference between the raw metrics and the downsampled metrics #7800

Open
@anarcher

Description


Thanos, Prometheus and Golang version used:
thanos:0.36.1

Object Storage Provider: S3

What happened:
There is a difference between the raw metrics and the downsampled metrics, as shown below. (I couldn't see any particular issues with compaction.) What could cause this difference, and is there any specific area I should check?
[Screenshots: the same query plotted against the raw data and against the downsampled data, showing clearly different trends.]
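To make the comparison reproducible outside of a dashboard, both resolutions can be queried side by side through the Thanos Query HTTP API, which accepts a max_source_resolution parameter on query_range. A minimal sketch, assuming a Querier reachable at thanos-query:10902; the query, time range, and step below are placeholders, not values taken from this issue:

QUERY='count(kube_pod_info{cluster="prod-kr-a"})'
START='2024-10-01T00:00:00Z'
END='2024-10-06T00:00:00Z'

# Raw blocks only (max_source_resolution=0s excludes downsampled blocks)
curl -sG 'http://thanos-query:10902/api/v1/query_range' \
  --data-urlencode "query=${QUERY}" \
  --data-urlencode "start=${START}" \
  --data-urlencode "end=${END}" \
  --data-urlencode "step=300" \
  --data-urlencode "max_source_resolution=0s"

# Same query and range, but allowing the 5m downsampled blocks
curl -sG 'http://thanos-query:10902/api/v1/query_range' \
  --data-urlencode "query=${QUERY}" \
  --data-urlencode "start=${START}" \
  --data-urlencode "end=${END}" \
  --data-urlencode "step=300" \
  --data-urlencode "max_source_resolution=5m"

If the two responses differ mainly in series that exist only at raw resolution, the gap is likely in the downsampled blocks themselves rather than in query-time deduplication.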

kube_pod_info had the following skip-series warning in the logs:

ts=2024-10-06T11:38:49.185869258Z caller=streamed_block_writer.go:116 level=warn msg="empty chunks happened, skip series" series="{__cluster__='prod-kr-a-k8s', __name__='kube_pod_info', __replica__='prometheus-agent-k8s-thanos-0', cluster='prod-kr-a', container='kube-rbac-proxy-main', created_by_kind='Workflow', created_by_name='sync-ehr-1727999700', env='prod', host_ip='10.128.91.30', host_network='false', instance='10.128.72.3:8443', job='kube-state-metrics', namespace='katalog', node='ip-10-128-91-30.ap-northeast-2.compute.internal', pod='sync-ehr-1727999700-hook-621784931', pod_ip='10.128.91.196', priority_class='default', prometheus='addon-monitoring/agent-k8s-thanos', region='kr', role='service', uid='839e9db9-035d-4c4b-854a-e6862a7ece28'}"
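This warning appears to come from the downsampling writer: when downsampling produces no chunks for a series, the series is skipped and is therefore absent from the resulting 5m block. Short-lived series such as the Workflow pod above seem like plausible candidates, which would match the missing data in the downsampled view. A quick way to confirm is to check the series at both resolutions; a hedged sketch, again assuming a Querier at thanos-query:10902, with the timestamp as a placeholder inside the affected block's time range (and assuming the raw blocks for that period are still retained):

SEL='kube_pod_info{namespace="katalog",pod="sync-ehr-1727999700-hook-621784931"}'
TIME='2024-10-06T00:00:00Z'

# Raw blocks only
curl -sG 'http://thanos-query:10902/api/v1/query' \
  --data-urlencode "query=count(count_over_time(${SEL}[1d]))" \
  --data-urlencode "time=${TIME}" \
  --data-urlencode "max_source_resolution=0s"

# Allowing the 5m downsampled blocks
curl -sG 'http://thanos-query:10902/api/v1/query' \
  --data-urlencode "query=count(count_over_time(${SEL}[1d]))" \
  --data-urlencode "time=${TIME}" \
  --data-urlencode "max_source_resolution=5m"

A non-empty result for the raw query and an empty one for the 5m query would suggest that this particular series was dropped during downsampling.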

Running thanos tools bucket verify does not report any issues for the downsampled block:

thanos tools bucket verify --objstore.config-file=./cfg/thanos-p01.yaml --id=01J9B6AC49SWMBZRE5G4Q333EK --issues=index_known_issues
ts=2024-10-06T12:45:59.363563Z caller=factory.go:53 level=info msg="loading bucket configuration"
ts=2024-10-06T12:45:59.36678Z caller=verify.go:138 level=info verifiers=index_known_issues msg="Starting verify task"
ts=2024-10-06T12:45:59.366804Z caller=index_issue.go:33 level=info verifiers=index_known_issues verifier=index_known_issues msg="started verifying issue" with-repair=false
ts=2024-10-06T12:46:03.995076Z caller=fetcher.go:623 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=4.628019583s duration_ms=4628 cached=1438 returned=244 partial=0
ts=2024-10-06T13:18:32.289597Z caller=index_issue.go:75 level=info verifiers=index_known_issues verifier=index_known_issues msg="verified issue" with-repair=false
ts=2024-10-06T13:18:32.295277Z caller=verify.go:157 level=info verifiers=index_known_issues msg="verify task completed"
ts=2024-10-06T13:18:32.377922Z caller=main.go:174 level=info msg=exiting
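Beyond verify, it may also be worth comparing block-level statistics between the downsampled block and its raw parent (the parent ULIDs are listed in the downsampled block's meta.json). A sketch using the same objstore config as above; I believe thanos tools bucket inspect prints per-block series, sample, and chunk counts together with the resolution, so a noticeably lower series count on the 5m block would line up with the skipped-series warnings:

# List all blocks with their resolution, compaction level, and series/sample counts
thanos tools bucket inspect --objstore.config-file=./cfg/thanos-p01.yaml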

What you expected to happen:
The trend in the raw data and the downsampled data should be similar.

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:

Anything else we need to know:
