Is it expected that Thanos Compactor has consistently high Read IOPS/Throughput? #8177

irgama · 2025-03-26T20:31:18Z

irgama
Mar 26, 2025

Right now we are running into a problem where the Compactor is maxing out our configured EFS throughput on read operations. As far as we can tell, this is happening essentially 24/7.

Part of me thinks this is due to the Compactor regularly crashing silently, but I have not found any evidence of that. I do not get any of the expected error or warn logs, nor do I see the halt metric ever be 1 in Grafana. The only indication that something is wrong is that, according to Grafana/Prometheus, the Compactor keeps going "down".

Is the above normal? If so, then our throughput limits must be set too low. Is there any documentation on the expected minimum EFS/disk throughput?

Answered by irgama

Mar 28, 2025


@irgama · 2025-03-26T20:33:57Z

dosubot[bot]
bot Mar 26, 2025

Hey @irgama! I'm here to help you with any bugs, questions, or contributions. Let's see what's going on with Thanos Compactor together.

There isn't specific documentation on the expected minimum EFS or disk throughput for Thanos Compactor. However, consistently high read IOPS and throughput could be influenced by several factors, including the configuration of the compactor and the performance of the underlying storage system.

To diagnose silent crashes or high read operations, you can monitor specific metrics and logs provided by Thanos Compactor. Key metrics include GarbageCollectedBlocks, GarbageCollections, compactions, compactionRunsStarted, compactionFailures, and verticalCompactions. These metrics can help identify potential issues in the compaction process [1].

Additionally, Thanos Compactor interacts with several dependencies that could affect its performance, such as object storage and metadata fetchers. Any latency or issues with these services can impact the compactor's performance [1].

If the compactor is going "down" intermittently without logging errors, it might be worth checking the network and context management settings, as well as the concurrency settings, which can influence performance [1].

Given the high utilization you're experiencing, it might be necessary to review and potentially increase your EFS throughput limits to accommodate the compactor's demands.


irgama · 2025-03-28T12:44:31Z

irgama
Mar 28, 2025
Author

For anyone reading this in the future: my problem was that the Thanos container was repeatedly restarting and appeared to fail silently. There were no error logs in either Kubernetes or the container.
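Restarts like these can leave nothing in the application log, but Kubernetes still records the last termination of each container in the pod status. The following is a minimal sketch, assuming the output of `kubectl get pod <pod> -o json` is available as a string. The function name and the sample pod document are illustrative and do not come from this thread.

```python
import json


def summarize_restarts(pod_json: str) -> list[dict]:
    """Return restart info for each container in a `kubectl get pod -o json` document."""
    pod = json.loads(pod_json)
    summary = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState.terminated is only present once a container has exited at least once.
        terminated = status.get("lastState", {}).get("terminated") or {}
        summary.append({
            "container": status["name"],
            "restarts": status.get("restartCount", 0),
            "reason": terminated.get("reason"),      # e.g. "OOMKilled" or "Error"
            "exit_code": terminated.get("exitCode"),
        })
    return summary


# Hypothetical pod status, trimmed to the fields the function reads.
SAMPLE = json.dumps({
    "status": {"containerStatuses": [{
        "name": "thanos-compact",
        "restartCount": 37,
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
    }]}
})

for row in summarize_restarts(SAMPLE):
    print(row)
```

A high restart count together with a termination reason such as OOMKilled would point at the container's resource limits rather than at the storage backend.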

I have since moved the Thanos container to EBS storage and raised its resource limits from the defaults of 100m CPU and 100MiB of memory to 1 CPU and 2Gi. With those changes the container works.

Also, if you want to check whether the Thanos Compactor is working: watch "thanos_compact_todo_compaction_blocks" in Grafana. If it never goes down, the Compactor is not working.
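The same check can be scripted instead of eyeballed on a dashboard. This is a rough sketch, assuming the metric is scraped by a Prometheus server reachable over its standard HTTP API (`/api/v1/query_range`). The function names, the `http://prometheus:9090` address in the usage comment, and the rule "dropped at least once in the window" are assumptions chosen for illustration.

```python
import json
import urllib.parse
import urllib.request


def fetch_range(base_url: str, query: str, start: float, end: float, step: str = "5m") -> list[float]:
    """Fetch the first series returned by Prometheus /api/v1/query_range as a list of values."""
    params = urllib.parse.urlencode({"query": query, "start": start, "end": end, "step": step})
    with urllib.request.urlopen(f"{base_url}/api/v1/query_range?{params}", timeout=30) as resp:
        body = json.load(resp)
    result = body["data"]["result"]
    # Each sample is a [timestamp, "value"] pair; values arrive as strings.
    return [float(value) for _, value in result[0]["values"]] if result else []


def backlog_is_draining(values: list[float]) -> bool:
    """True if the backlog is empty at the end of the window or dropped at least once."""
    if not values:
        return False  # no samples at all: the compactor is probably not being scraped
    if values[-1] == 0:
        return True
    return any(later < earlier for earlier, later in zip(values, values[1:]))


# Offline demonstration with made-up sample values.
print(backlog_is_draining([12, 12, 12, 12]))  # False
print(backlog_is_draining([12, 9, 10, 4]))    # True

# Usage against a live server (hypothetical address), e.g. over the last six hours:
#   import time
#   now = time.time()
#   values = fetch_range("http://prometheus:9090",
#                        "sum(thanos_compact_todo_compaction_blocks)",
#                        now - 6 * 3600, now)
#   print(backlog_is_draining(values))
```

The window length matters: a compactor working through a large backlog may hold the value flat for a while, so a short window can give a false alarm.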
