Description
Bug description
We are facing an issue where our metric dashboards don't show all data on charts even though the data is present in the database.
We use Signoz in a Kubernetes setup with Clickhouse set up with 2 shards and 3 replicas.
Signoz was installed using helm.
We started with 2 shards and 1 replica in Clickhouse; it was later upgraded to 2 shards and 2 replicas, and it now runs with 2 shards and 3 replicas.
This issue did not occur until we added replicas.
Currently, we use Signoz chart v0.92.0, which corresponds to Signoz v0.94.0, but this issue is old: we first noticed it around v0.64.0 (chart 0.62.3).
We only send logs and metrics to Signoz; we don't send traces.
We only experience this issue with metrics; logs seem fine.
The problematic dashboard/chart is built with PromQL.
Our hosts run Windows Server and send logs via fluentd and metrics via Telegraf. Since the data is in the database, the problem is not with our hosts or the software we use to send data to Signoz.
Each chart refresh shows different data: the curve changes every time, and the change affects the whole chart, not just the newest data.
We tried:
- setting up a new Signoz install that has shards and replicas configured from the start (assuming something went wrong when adding shards or replicas)
- setting the Clickhouse internal_replication setting to false (on a separate Signoz install)
Both attempts gave the same result: each time, the chart curve is different and some data is missing.
It looks like Signoz connects to a different replica each time and shows only that replica's data rather than data from the whole Clickhouse cluster, i.e. it uses
SELECT ... FROM signoz_metrics.<table>
instead of
SELECT ... FROM clusterAllReplicas('<cluster name>', signoz_metrics, <table>)
When we connect to the Clickhouse pods and run both queries directly, the full data set is returned only by
SELECT ... FROM clusterAllReplicas('<cluster name>', signoz_metrics, <table>)
(which is expected, as a default SELECT query reads only the local pod's data, not the whole cluster's).
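To check the replica theory, one option is to count rows per replica and compare the counts within each shard. Below is a minimal Python sketch of that idea, not Signoz code. It assumes that replicas of the same shard should hold the same number of rows once replication has caught up. The cluster, table, and host names are made-up placeholders, and the host-to-shard mapping is something you would fill in by hand for your own cluster.

```python
from collections import defaultdict


def per_replica_count_query(cluster, database, table):
    """Build a query that counts rows separately on every replica.

    hostName() is evaluated on each remote server, so grouping by it
    gives one row count per Clickhouse pod.
    """
    return (
        "SELECT hostName() AS host, count() AS row_count "
        f"FROM clusterAllReplicas('{cluster}', {database}, {table}) "
        "GROUP BY host ORDER BY host"
    )


def lagging_replicas(rows, shard_of):
    """Return {host: missing_rows} for replicas behind their shard's fullest replica.

    rows     -- (host, row_count) pairs, e.g. the result of the query above
    shard_of -- host -> shard id mapping (assumed to be supplied by hand)
    """
    rows = list(rows)
    fullest = defaultdict(int)
    for host, count in rows:
        shard = shard_of[host]
        fullest[shard] = max(fullest[shard], count)
    return {
        host: fullest[shard_of[host]] - count
        for host, count in rows
        if count < fullest[shard_of[host]]
    }


if __name__ == "__main__":
    # Placeholder names: substitute your own cluster and metrics table.
    print(per_replica_count_query("my_cluster", "signoz_metrics", "my_table"))

    # Invented counts for a 2-shard, 3-replica layout (not real measurements).
    shard_of = {"s0-r0": 0, "s0-r1": 0, "s0-r2": 0,
                "s1-r0": 1, "s1-r1": 1, "s1-r2": 1}
    rows = [("s0-r0", 100), ("s0-r1", 100), ("s0-r2", 60),
            ("s1-r0", 80), ("s1-r1", 80), ("s1-r2", 80)]
    print(lagging_replicas(rows, shard_of))
```

If the counts differ between replicas of the same shard, the replicas themselves are out of sync, which would explain why the chart changes depending on which replica answers. If the counts match, the cause is more likely in how the query is routed.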
I previously opened issue #7645 , but the two issues seem different because #7465 breaks the chart and shows nothing ("API responded with 500 - Something went wrong status: error in prom queries"), whereas this issue shows some data but not all of it.
Is there any way to fix this issue, or any possible workaround?
Expected behavior
All data present in the database is shown in the dashboard.
How to reproduce
Not sure. It seems that adding a replica to the Clickhouse pods in values.yaml:
layout:
  replicasCount: 2
is the starting point of this issue.
Version information
- Signoz version: 0.94.0 / chart 0.92.0
- Browser version: each one
- Your OS and version: Windows
- Your CPU Architecture(ARM/Intel): Intel