Skip to content

Plugin scans entire table to retrieve JSON paths, causing high ClickHouse CPU usage #1461

@Rajin9601

Description

@Rajin9601

What happened:

I connected clickhouse that contains Terrabytes of logs which has JSON column.
When i used grafana, clickhouse cpu was heavily used.

So when i looked at the clickhouse's slow query log, i noticed grafana is requesting this query.

SELECT arrayJoin(distinctJSONPathsAndTypes(LogAttributes)) FROM "otel"."otel_logs" WHERE SETTINGS max_execution_time=10

This query scans all rows of otel_logs table. (for my case, it timed out because of 10s of max_execution_time)

This is problem because it uses cpu of clickhouse heavily.

What you expected to happen:

The following code determines the available JSON paths:

async fetchPathsForJSONColumns(
database: string | undefined,
table: string,
jsonColumnName: string
): Promise<TableColumn[]> {
const prefix = Boolean(database) ? `"${database}".` : '';
const rawSql = `SELECT arrayJoin(distinctJSONPathsAndTypes(${jsonColumnName})) FROM ${prefix}"${table}" SETTINGS max_execution_time=10`;
const frame = await this.runQuery({ rawSql });
if (frame.fields?.length === 0) {
return [];

I think the plugin should provide a setting that allows users to configure how JSON paths are retrieved. We need a discussion about the best way to make this behavior configurable.

Here are some possible options I’ve considered:

  1. Add a configurable LIMIT
SELECT arrayJoin(distinctJSONPathsAndTypes(LogAttributes)) FROM (
    SELECT LogAttributes from "otel"."otel_logs" limit 
        1000 -- configurable limit
) SETTINGS max_execution_time=10
  1. Add a configurable WHERE condition
SELECT arrayJoin(distinctJSONPathsAndTypes(LogAttributes)) FROM (
    SELECT LogAttributes from "otel"."otel_logs" where 
        Timestamp > now() - INTERVAL 1 hour -- configurable where statement
) SETTINGS max_execution_time=10

This is more flexible, but more complicated since the timestamp column name may differ between tables.

  1. Simple on/off

If disabled, the plugin won’t query for JSON paths at all.

Environment:

  • Grafana version: 12.2.1
  • Plugin version: 4.11.2
  • OS Grafana is installed on:
  • User OS & Browser:
  • Others:

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Status

    Incoming

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions