
High memory consumption and OOMs on latest version #1233

Open
@tr3mor

Description


What happened: We are using Grafana to visualise data from ClickHouse (over an HTTPS connection), and these dashboards are shown 24/7 on many screens. After updating to Grafana 11.6.0 and the latest version of the plugin, we started to see a lot of OOM kills of our Grafana pods.
In general, the memory pattern looks like this, with a container memory limit of 14 GB, and it has caused multiple OOM kills:

[Image: Grafana pod memory usage over time]

Within the container, the memory distribution looks like this:

[Image: memory distribution within the container]

Using Grafana's profiling for plugins, we collected a heap dump:

https://github.com/user-attachments/assets/5d5a16ca-b4b6-4da1-ad97-0411594324ff

As you can see, most memory is allocated within the connection pool.
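For anyone who wants to take a comparable measurement outside Grafana's built-in plugin profiling, the following is a minimal sketch using only the Go standard library (`runtime` and `runtime/pprof`). The function name `heapSnapshot` is hypothetical and is not part of Grafana or the ClickHouse plugin. It forces a GC first so the profile reflects live objects rather than garbage awaiting collection, which is what matters when deciding whether memory is actually being retained (for example by a connection pool).

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// heapSnapshot (hypothetical helper) forces a GC so the numbers reflect live
// objects, then returns the in-use heap size and a heap profile in pprof format.
func heapSnapshot() (heapInuse uint64, profile []byte, err error) {
	runtime.GC()

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	var buf bytes.Buffer
	// debug=0 writes the gzipped protobuf format that `go tool pprof` reads.
	if err := pprof.Lookup("heap").WriteTo(&buf, 0); err != nil {
		return 0, nil, err
	}
	return ms.HeapInuse, buf.Bytes(), nil
}

func main() {
	inuse, prof, err := heapSnapshot()
	if err != nil {
		panic(err)
	}
	fmt.Printf("heap in use: %d bytes, profile: %d bytes\n", inuse, len(prof))
}
```

The profile bytes can be saved to a file and opened with `go tool pprof`; comparing two snapshots taken some time apart shows which allocation sites keep growing.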
We also noticed that increasing the connection lifetime (in the data source settings) improves the situation: memory grows more slowly, but occasional OOM kills still occur.

We also tried setting GOMEMLIMIT to make the garbage collector more aggressive, but it did not help much.

What you expected to happen: Memory is reclaimed before an OOM kill happens.

How to reproduce it (as minimally and precisely as possible):

I don't know, since I don't know the root cause; hopefully the heap dump can help.

Environment:

  • Grafana version: 11.6.0
  • Plugin version: 4.8.2
  • OS Grafana is installed on: k8s, official image
  • User OS & Browser: Chrome
  • Others:
