Description
Describe the bug
I have an x86_64 system with 128 cores. The otel collector adds about 5 MiB to its working memory every time it scrapes a metrics endpoint. Eventually it runs up against the memory_limiter, but garbage collection never seems to make real headway and eventually fails to reclaim enough memory.
My identically configured systems with 8 or 16 x86_64 cores do not appear to leak in this manner.
My aarch64 system, with a similar config and 64 cores, also appears to leak in this manner.
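Because the growth appears to scale with core count, it may be worth confirming what the Go runtime actually sees inside the container. The Quadlet file below sets `GOMAXPROCS=4` and `GOMEMLIMIT=384MiB`, so a Go program started with the same environment should report those values, while `runtime.NumCPU` should still reflect the host's logical CPUs. A minimal standalone sketch (not part of the collector) that queries these settings without changing them:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// effectiveSettings reports what the Go runtime will use for parallelism and
// its soft memory limit. GOMAXPROCS and GOMEMLIMIT set in the environment
// override the runtime defaults.
func effectiveSettings() (numCPU, maxProcs int, memLimit int64) {
	numCPU = runtime.NumCPU()           // logical CPUs usable by this process
	maxProcs = runtime.GOMAXPROCS(0)    // an argument of 0 queries without changing it
	memLimit = debug.SetMemoryLimit(-1) // a negative argument queries without changing it
	return
}

func main() {
	numCPU, maxProcs, memLimit := effectiveSettings()
	// With no GOMEMLIMIT set, memLimit is math.MaxInt64 (effectively unlimited).
	fmt.Printf("NumCPU=%d GOMAXPROCS=%d GOMEMLIMIT=%d MiB\n", numCPU, maxProcs, memLimit>>20)
}
```

Running it in the same container (for example with `podman run` and the same `Environment=` lines) would show whether the limits are actually being applied on the 128-core host.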
Steps to reproduce
Run the otel-collector on a system with many CPU cores.
What did you expect to see?
Memory usage to eventually stabilize.
What did you see instead?
Memory usage grows to fill the space allotted; tested up to 4 GiB (which took 6 days).
What version did you use?
otelcol-contrib version 0.114.0 (the memory code is probably in the base collector)
What config did you use?
---
processors:
  batch: {}
  transform/hostname:
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["nodename"], "host.fnal.gov")
          - set(resource.attributes["nodename"], "host.fnal.gov")
  memory_limiter:
    check_interval: 30s
    limit_mib: 384
exporters:
  prometheus:
    endpoint: "[::]:9299"
    enable_open_metrics: true
    metric_expiration: 2m
service:
  telemetry:
    metrics:
      level: none
  pipelines:
    metrics:
      receivers:
        - prometheus
      processors:
        - memory_limiter
        - transform/hostname
        - batch
      exporters:
        - prometheus
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: node-exporter
          scrape_interval: 45s
          static_configs:
            - targets:
                - localhost:9100
              labels:
                instance: host.fnal.gov:9100
        - job_name: systemd-exporter
          scrape_interval: 45s
          static_configs:
            - targets:
                - localhost:9558
              labels:
                instance: host.fnal.gov:9558
Environment
OS: AlmaLinux 9
Platform: Podman
Podman Quadlet file: /etc/containers/systemd/otel-collector.container
# THIS FILE IS MANAGED BY PUPPET
[Service]
TimeoutStartSec=900
TimeoutStopSec=30
TasksMax=4096
CPUWeight=30
MemoryMax=512M
IOSchedulingClass=best-effort
IOSchedulingPriority=7
IOWeight=30
Restart=always
[Container]
AutoUpdate=registry
DropCapability=ALL
User=5219
Group=8247
HostName=%H
LogDriver=journald
NoNewPrivileges=true
Pull=missing
ReadOnly=true
PodmanArgs=--stop-signal=SIGKILL
Volume=/etc/otel-collector:/etc/otel-collector:ro,rslave,z
Environment=GOMAXPROCS=4
Environment=GOMEMLIMIT=384MiB
Exec=--config /etc/otel-collector/otel-config.yaml
Image=ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:latest
Network=host
PublishPort=[::]:9299:9299
[Install]
WantedBy=default.target
Additional context
Size of the scraped endpoints (line counts):
[root@host ~]# curl -s localhost:9558/metrics |wc -l
5380
[root@host ~]# curl -s localhost:9100/metrics |wc -l
6317
Logs
Nov 26 09:21:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:21:44.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 327}
Nov 26 09:21:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:21:44.643Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:22:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:22:44.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 327}
Nov 26 09:22:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:22:44.635Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:23:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:23:14.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:23:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:23:14.638Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:24:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:24:14.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 327}
Nov 26 09:24:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:24:14.618Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:24:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:24:44.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 327}
Nov 26 09:24:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:24:44.625Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:25:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:25:44.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:25:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:25:44.624Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:26:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:26:14.557Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 329}
Nov 26 09:26:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:26:14.628Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:27:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:27:14.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:27:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:27:14.634Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 287}
Nov 26 09:27:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:27:44.557Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 326}
Nov 26 09:27:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:27:44.634Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 287}
Nov 26 09:28:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:28:44.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 326}
Nov 26 09:28:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:28:44.636Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:29:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:29:14.557Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:29:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:29:14.637Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:30:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:30:14.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 328}
Nov 26 09:30:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:30:14.639Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:30:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:30:44.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 329}
Nov 26 09:30:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:30:44.630Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 290}
Nov 26 09:31:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:31:44.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 329}
Nov 26 09:31:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:31:44.640Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 289}
Nov 26 09:32:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:32:14.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 329}
Nov 26 09:32:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:32:14.640Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 292}
Nov 26 09:33:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:33:14.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 332}
Nov 26 09:33:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:33:14.641Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 294}
Nov 26 09:33:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:33:44.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 334}
Nov 26 09:33:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:33:44.642Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 296}
Nov 26 09:34:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:34:44.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 336}
Nov 26 09:34:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:34:44.642Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 299}
Nov 26 09:35:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:35:14.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 338}
Nov 26 09:35:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:35:14.637Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 301}
Nov 26 09:36:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:14.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 340}
Nov 26 09:36:14 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:14.636Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 306}
Nov 26 09:36:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:44.556Z info [email protected]/memorylimiter.go:203 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 345}
Nov 26 09:36:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:44.645Z info [email protected]/memorylimiter.go:173 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 308}
Nov 26 09:36:44 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:36:44.645Z warn [email protected]/memorylimiter.go:210 Memory usage is above soft limit. Refusing data. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 308}
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:37:18.677Z error scrape/scrape.go:1298 Scrape commit failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "node-exporter", "target": "http://localhost:9100/metrics", "error": "data refused due to high>
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/[email protected]/scrape/scrape.go:1298
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/[email protected]/scrape/scrape.go:1376
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
Nov 26 09:37:18 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/[email protected]/scrape/scrape.go:1253
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:37:23.687Z error scrape/scrape.go:1298 Scrape commit failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "systemd-exporter", "target": "http://localhost:9558/metrics", "error": "data refused due to h>
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/[email protected]/scrape/scrape.go:1298
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/[email protected]/scrape/scrape.go:1376
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
Nov 26 09:37:23 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/[email protected]/scrape/scrape.go:1253
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:38:03.673Z error scrape/scrape.go:1298 Scrape commit failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "node-exporter", "target": "http://localhost:9100/metrics", "error": "data refused due to high>
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/[email protected]/scrape/scrape.go:1298
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/[email protected]/scrape/scrape.go:1376
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
Nov 26 09:38:03 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/[email protected]/scrape/scrape.go:1253
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: 2024-11-26T15:38:08.670Z error scrape/scrape.go:1298 Scrape commit failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "systemd-exporter", "target": "http://localhost:9558/metrics", "error": "data refused due to h>
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/[email protected]/scrape/scrape.go:1298
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/[email protected]/scrape/scrape.go:1376
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
Nov 26 09:38:08 host.fnal.gov systemd-otel-collector[1202537]: github.com/prometheus/[email protected]/scrape/scrape.go:1253
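In the logs above, the post-GC value climbs from 289 MiB at 09:21:44 to 308 MiB at 09:36:44, which is when the processor starts refusing data. That cutoff is consistent with the memory_limiter's soft limit if `spike_limit_mib` is left at its documented default of 20% of `limit_mib`: 384 - 76.8 = 307.2 MiB. A minimal standalone Go sketch (not collector code) that extracts the "Memory usage after GC" values from journal output and flags those above that computed soft limit; the 20% default and the regular expression matching this log layout are assumptions:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strconv"
	"strings"
)

// softLimitMiB returns limit_mib - spike_limit_mib. When spikeMiB is 0 it
// assumes the memory_limiter's documented default of 20% of limit_mib.
func softLimitMiB(limitMiB, spikeMiB float64) float64 {
	if spikeMiB == 0 {
		spikeMiB = limitMiB * 0.2
	}
	return limitMiB - spikeMiB
}

// curMem matches the cur_mem_mib field as it appears in the log lines above.
var curMem = regexp.MustCompile(`"cur_mem_mib": (\d+)`)

// afterGC extracts cur_mem_mib from every "Memory usage after GC." line.
func afterGC(lines []string) []int {
	var out []int
	for _, l := range lines {
		if !strings.Contains(l, "Memory usage after GC.") {
			continue
		}
		if m := curMem.FindStringSubmatch(l); m != nil {
			if v, err := strconv.Atoi(m[1]); err == nil {
				out = append(out, v)
			}
		}
	}
	return out
}

func main() {
	// Pipe the collector's journal lines into stdin.
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}

	soft := softLimitMiB(384, 0) // limit_mib from the config above
	vals := afterGC(lines)
	for i, v := range vals {
		flag := ""
		if float64(v) > soft {
			flag = " (above soft limit; memory_limiter refuses data)"
		}
		fmt.Printf("#%d after GC: %d MiB%s\n", i+1, v, flag)
	}
	if len(vals) > 1 {
		fmt.Printf("net change: %+d MiB over %d GCs\n", vals[len(vals)-1]-vals[0], len(vals)-1)
	}
}
```

Tracking the post-GC floor rather than the pre-GC peak separates retained memory from transient per-scrape allocations, which is the distinction that matters for a suspected leak.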