Description
What happened: We are using Grafana to visualise data from ClickHouse (HTTPS connection), and these dashboards are shown 24/7 on many screens. After updating to Grafana 11.6.0 and the latest version of the plugin, we started seeing a lot of OOM kills of our Grafana pods.
In general, the memory pattern looks like this (the container memory limit is 14 GB) and has caused multiple OOM kills:

Within the container, the memory distribution looks like this:

Using profiling for Grafana plugins, we collected a heap dump:
https://github.com/user-attachments/assets/5d5a16ca-b4b6-4da1-ad97-0411594324ff
As you can see, most of the memory is allocated within the connection pool.
We also noticed that increasing the connection lifespan (in the datasource settings) improves the situation and slows the memory growth, but occasional OOM kills still happen.
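For reference, here is a minimal sketch of what the connection-lifespan setting corresponds to at the pool level, assuming the plugin wires the clickhouse-go driver through `database/sql`; the DSN and the exact values are hypothetical:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/ClickHouse/clickhouse-go/v2" // assumption: plugin uses this driver, which registers "clickhouse"
)

func main() {
	// Hypothetical DSN; in our setup the connection goes over HTTPS/TLS.
	db, err := sql.Open("clickhouse", "clickhouse://example-host:9440?secure=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Pool settings roughly equivalent to the datasource "connection lifespan" knobs.
	// A longer ConnMaxLifetime means fewer connection rebuilds, which in our case
	// slowed the memory growth but did not stop it.
	db.SetConnMaxLifetime(5 * time.Minute)
	db.SetConnMaxIdleTime(1 * time.Minute)
	db.SetMaxIdleConns(5)
	db.SetMaxOpenConns(25)
}
```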
We also tried setting GOMEMLIMIT to make the GC more aggressive, but it didn't help much.
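For context, GOMEMLIMIT (or its programmatic equivalent, `runtime/debug.SetMemoryLimit`) is only a soft limit: it makes the GC run more often as the limit is approached, but it cannot free memory that is still reachable, e.g. buffers held by live pooled connections, which would be consistent with the heap dump above. A minimal sketch, with a hypothetical 12 GiB limit:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Equivalent to GOMEMLIMIT=12GiB: a *soft* limit that only makes the GC
	// more aggressive; it cannot reclaim memory that is still referenced,
	// e.g. by connections held in the pool.
	prev := debug.SetMemoryLimit(12 << 30)
	fmt.Printf("previous memory limit: %d bytes\n", prev)

	// GOGC can stay at its default; with a memory limit set, the GC already
	// triggers based on total memory use as it nears the limit.
	debug.SetGCPercent(100)
}
```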
What you expected to happen: Memory is reclaimed by the GC before an OOM kill happens.
How to reproduce it (as minimally and precisely as possible):
We don't know, since we don't know the root cause; hopefully the heap dump can help.
Environment:
- Grafana version: 11.6.0
- Plugin version: 4.8.2
- OS Grafana is installed on: k8s, official image
- User OS & Browser: Chrome
- Others: