
@borismattijssen

borismattijssen commented Jan 7, 2026

Summary

Add support for logging HTTP headers as JSON in ClickHouse query_log via the log_comment field. This enables better observability by correlating queries with their originating Grafana context (dashboard, panel, alert rule, etc.).

Changes

Frontend

  • Added "Log Headers as Comment" toggle
  • Added a "Header Whitelist Regex" input to control which headers are logged (default: (?i)^(x-dashboard|x-panel|x-rule))

Backend

  • Serialize whitelisted headers as JSON and pass via ClickHouse's log_comment setting in pkg/plugin/driver.go
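A minimal sketch of the backend step, assuming headers arrive as a map[string]string. headersToLogComment is a hypothetical helper, not the actual code in pkg/plugin/driver.go. It keeps the headers whose names match the whitelist regex and serializes them as a JSON object:

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// defaultHeaderWhitelist mirrors the default pattern from the PR description.
const defaultHeaderWhitelist = `(?i)^(x-dashboard|x-panel|x-rule)`

// headersToLogComment (hypothetical) keeps only headers whose names match the
// whitelist pattern and serializes them as a JSON object.
func headersToLogComment(headers map[string]string, pattern string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", fmt.Errorf("invalid header whitelist: %w", err)
	}
	kept := make(map[string]string)
	for name, value := range headers {
		if re.MatchString(name) {
			kept[name] = value
		}
	}
	// json.Marshal writes map keys in sorted order, so output is deterministic.
	out, err := json.Marshal(kept)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	headers := map[string]string{
		"X-Dashboard-Uid": "abc123",
		"X-Panel-Id":      "4",
		"X-Grafana-Id":    "eyJhbGciOi...", // user JWT: must not be logged
	}
	comment, err := headersToLogComment(headers, defaultHeaderWhitelist)
	if err != nil {
		panic(err)
	}
	fmt.Println(comment) // {"X-Dashboard-Uid":"abc123","X-Panel-Id":"4"}
}
```

Because encoding/json sorts map keys, the same headers always produce the same log_comment string. Attaching that string to the query as the log_comment setting depends on the driver and is left out of this sketch.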

Security Considerations

Headers are filtered by regex to prevent logging sensitive data. For example, X-Grafana-Id contains user JWT tokens and should be excluded from the whitelist pattern. The default pattern only captures dashboard/panel/rule context headers.
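Because the pattern is user-configurable, a broad value such as (?i)^x- would also match X-Grafana-Id. One possible safeguard is to reject such patterns at configuration time. A minimal sketch, assuming a hypothetical validateWhitelist helper and an illustrative denylist (only X-Grafana-Id comes from this PR):

```go
package main

import (
	"fmt"
	"regexp"
)

// sensitiveHeaders is an illustrative, non-exhaustive denylist. Only
// X-Grafana-Id is named in the PR; the others are common credential headers.
var sensitiveHeaders = []string{"X-Grafana-Id", "Authorization", "Cookie"}

// validateWhitelist (hypothetical) rejects a whitelist pattern that would
// match any known-sensitive header name, so a misconfigured regex fails
// loudly instead of leaking credentials into system.query_log.
func validateWhitelist(pattern string) error {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return fmt.Errorf("invalid header whitelist: %w", err)
	}
	for _, h := range sensitiveHeaders {
		if re.MatchString(h) {
			return fmt.Errorf("header whitelist %q matches sensitive header %q", pattern, h)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateWhitelist(`(?i)^(x-dashboard|x-panel|x-rule)`)) // <nil>
	fmt.Println(validateWhitelist(`(?i)^x-`))                         // rejected: matches X-Grafana-Id
}
```

A denylist can never be complete, so this complements a narrow default pattern rather than replacing it.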

Implementation Evolution

  1. Commit 1 (9a565f4): Added FE and BE logic for logging headers as log_comment (header whitelist is hard-coded)
  2. Commit 2 (dd7d891): Made the header whitelist user-configurable from the FE

Test Plan

  1. Create a new data source with the "Log Headers as Comment" setting enabled.
  2. Create a dashboard with the following query:

       SELECT
           log_comment AS lc,
           query AS q,
           *
       FROM system.query_log
       ORDER BY event_time DESC
       LIMIT 20

  3. Observe that the log comment contains the filtered headers as JSON.
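The last test step can also be checked programmatically by decoding the lc column. A minimal sketch, assuming the comment is a flat JSON object of string values; parseLogComment is a hypothetical helper:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseLogComment (hypothetical) decodes a log_comment value written by this
// feature back into a header map. An empty comment (e.g. a query sent with the
// toggle off) yields an empty map; any other non-JSON value is an error.
func parseLogComment(lc string) (map[string]string, error) {
	headers := map[string]string{}
	if lc == "" {
		return headers, nil
	}
	if err := json.Unmarshal([]byte(lc), &headers); err != nil {
		return nil, fmt.Errorf("log_comment is not a JSON header object: %w", err)
	}
	return headers, nil
}

func main() {
	h, err := parseLogComment(`{"X-Dashboard-Uid":"abc123","X-Panel-Id":"4"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(h["X-Dashboard-Uid"], h["X-Panel-Id"]) // abc123 4
}
```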

@CLAassistant

CLAassistant commented Jan 7, 2026

CLA assistant check
All committers have signed the CLA.

@SpencerTorres
Collaborator

Hey, thanks for submitting this. It looks good so far, but I have some thoughts I'd like to discuss.

Related PR (let me know if you've seen it): #1433

Another related PR, for attaching extra information: #1470

I have asked the ClickHouse core team about expanding system.query_log with a metadata Map(String, String) column for more complex use cases like this, but they're hesitant to add a new column to an already large table. This is why my PR (#1470) uses the client info (a semi-structured string, similar to your usage of log_comment).

My problem with log_comment is that it should stay available to users when they write their queries, usually by adding it as a SETTINGS value. Overriding it on the backend like this could break dashboards where users already rely on it. With the client info, we can instead attach the data as a user-agent style block of text.

I frequently see requests like this from users where they would like to log extra information with their queries for observability into their plugin usage. Some of these requests (like trying to log headers) exceed what we SHOULD be doing with a single text field. It's hacky, hard to parse, and I really think we should find a better way to do it.

Maybe it's time the plugin creates its own table as a complementary addition to system.query_log. This would require more user configuration, since Grafana's ClickHouse credentials typically should not have write access. If we had something like a grafana.query_log table where we can add whatever plugin data or user info we want, we could easily iterate on these features without depending on the system table. I'm not sure what columns this table would have, but we could start with Timestamp and QueryID, where QueryID matches the query ID in system.query_log for easy JOINs. Each row could even include the panel configuration. Debugging would become so much easier if we had this info.

I'll forward this to some others to see what they think.
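A sketch of how such a table could be correlated with the system log, assuming the plugin generates its own query ID per request and stores it alongside its metadata. The schema, column names other than Timestamp and QueryID, and the newQueryID helper are all hypothetical. Only system.query_log and its query_id, event_time, query and type columns are ClickHouse's own:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// createTableDDL is a hypothetical schema for the proposed grafana.query_log
// table. Only Timestamp and QueryID come from the discussion.
const createTableDDL = `CREATE TABLE IF NOT EXISTS grafana.query_log
(
    Timestamp    DateTime64(3),
    QueryID      String,
    DashboardUID String,
    PanelID      String
)
ENGINE = MergeTree
ORDER BY (Timestamp, QueryID)`

// joinQuery correlates plugin rows with ClickHouse's own log via the query ID.
// Filtering on QueryFinish keeps one system.query_log row per finished query.
const joinQuery = `SELECT q.event_time, q.query, g.DashboardUID, g.PanelID
FROM system.query_log AS q
INNER JOIN grafana.query_log AS g ON q.query_id = g.QueryID
WHERE q.type = 'QueryFinish'
ORDER BY q.event_time DESC
LIMIT 20`

// newQueryID returns a random RFC 4122 version 4 UUID. The plugin would send
// it as the query ID of the SELECT and store the same value as QueryID.
func newQueryID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newQueryID()
	if err != nil {
		panic(err)
	}
	fmt.Println("query ID:", id)
	fmt.Println(createTableDDL)
	fmt.Println(joinQuery)
}
```

With default settings ClickHouse logs both a QueryStart and a QueryFinish row per query, which is why the join filters on type. The write path would still need separate credentials, per the concern above.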

@borismattijssen
Author

borismattijssen commented Jan 8, 2026

Hi @SpencerTorres,

Thanks for your thoughtful comment. I only saw #1433 when I was already midway through 🙈, so I decided to finish and submit this anyway. I think this PR provides a more complete implementation: FE controls, a user-definable selection of headers (e.g. we also need alerting metadata), and JSON instead of a custom string format.

About overwriting log_comment, I think you make a good point. Alternatively, we could send this metadata as custom query settings (see the feature request that I wrote here). I had no strong reason to choose log_comment over custom query settings in this PR, so I'd be happy to change this.

adamyeats moved this from Incoming to Needs Review in Partner Datasources Jan 8, 2026
@SpencerTorres
Collaborator

@borismattijssen Another issue I have with this feature is that logging headers can be risky in general. I think keeping track of headers is better handled by Grafana server telemetry than by the plugin. There's already an ongoing effort across plugins to prevent secrets from being logged in error messages. Fitting headers into log_comment (or any other field, including a metadata column) goes beyond the scope of the query itself. However, I think manually extracting a few specific headers, like user ID or dashboard ID, is reasonable.

I can ask the Grafana team what they think, but I don't think this PR will be merged as it is now. I appreciate the effort and the workarounds you suggested, though.

@borismattijssen
Author

Hi Spencer, thanks for your comments. Let me address all your points here (including your comment in this thread):

  1. "…is that logging headers can be risky in general."

    I agree; that's why I included the whitelist regex, so we can exclude sensitive headers. I'm also fine with hardcoding the list. For us, we'd need Dashboard-Uid, Dashboard-Title, Panel-Id, Panel-Title, Rule-Uid, and Rule-Title.

  2. "The problem with the custom settings is that they're configurable to have different prefixes."

    This is a good point. We could add a configurable prefix field in the FE and default it to ClickHouse's default prefix?

  3. "I think keeping track of headers is better handled by the Grafana server telemetry than the plugin."

    In general I'm fine with a completely different implementation; I'm not particularly tied to this one, it was just the simplest angle I could find ;) For us, the most important thing is being able to link dashboards and rules to queries in query_log. Let me know if there's anything we can do if you decide to go another route.

@aangelisc
Contributor

Hi @borismattijssen - thank you for the contribution here 😊

"Maybe it's time the plugin creates its own table as a complementary addition to the system.query_log."

I'm not entirely anti this approach but we strongly discourage users from configuring data sources with write access to their databases. The potential risk of a malicious user modifying data is unavoidable if the data source has write access. That being said, if there is no appetite to expand the query_log table with a metadata column then perhaps we can explore this option instead. However, we'd have to be considerate of any performance impacts this may have. If we have a substantial number of users issuing queries in parallel, then we'd effectively double the number of queries we're issuing to a downstream DB (reads and writes).

Alternatively, we could extend @SpencerTorres's approach of attaching user information to the client info so that it also includes the dashboard and panel. Thoughts on this?

@borismattijssen
Author

"Alternatively, we could extend @SpencerTorres's approach with the user information to also include dashboard & panel in the client info section. Thoughts on this?"

This would also work for us! The most important thing for us is including the IDs I mentioned earlier (the titles would be nice to have, too).

adamyeats moved this from Needs Review to Waiting in Partner Datasources Jan 16, 2026
adamyeats force-pushed the bmattijssen/headers-as-log-comment branch from 3553f9e to 10597ad January 16, 2026 17:02
@borismattijssen
Author

borismattijssen commented Jan 19, 2026

Hi @SpencerTorres, I was wondering what your current thoughts are on this topic. Do you think we can work towards an implementation, either through this PR, the client_info field, or #1433?

@SpencerTorres
Collaborator

Internally, I was able to persuade the team to add a metadata Map(String, String) column in principle, but there's no telling when it will be available.

For now, I suppose it could go into the client info, as long as the syntax doesn't break the current query syntax. These client info strings are hard to parse, so I'd be mindful of which tokens end up in there. And of course, let's be careful not to include any sensitive data.

I would prefer the client info approach over the comment for now though. Thanks!
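To keep the client info easy to parse, one option is to restrict the characters allowed in each value. A minimal sketch, assuming a hypothetical "key:value; key:value" format (not necessarily the format used in #1470) and hypothetical helper names:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// unsafeToken matches characters outside a conservative [A-Za-z0-9._-] set.
// The allowed set is an assumption, chosen so titles containing spaces, ';',
// ':', '/' or parentheses cannot break a user-agent style string.
var unsafeToken = regexp.MustCompile(`[^A-Za-z0-9._-]+`)

// sanitize replaces every run of unsafe characters with a single '_'.
func sanitize(s string) string {
	return unsafeToken.ReplaceAllString(s, "_")
}

// Field is one key/value pair to attach; slice order is preserved.
type Field struct{ Key, Value string }

// clientInfoComment (hypothetical) renders fields as "key:value; key:value",
// skipping empty values.
func clientInfoComment(fields []Field) string {
	parts := make([]string, 0, len(fields))
	for _, f := range fields {
		if f.Value == "" {
			continue
		}
		parts = append(parts, sanitize(f.Key)+":"+sanitize(f.Value))
	}
	return strings.Join(parts, "; ")
}

func main() {
	fmt.Println(clientInfoComment([]Field{
		{"dashboard_uid", "abc123"},
		{"panel_id", "4"},
		{"panel_title", "CPU (p99); prod/eu"},
	}))
	// dashboard_uid:abc123; panel_id:4; panel_title:CPU_p99_prod_eu
}
```

The sanitization is lossy for titles (including any non-ASCII characters), but it guarantees that no value can introduce the ';' or ':' separators, or the '/' and parentheses used elsewhere in user-agent style strings. IDs, which matter most here, pass through unchanged.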


Projects

Status: Waiting


6 participants