Component(s)
- servicegraph connector
- prometheus exporter
What happened?
Description
When we use the servicegraph connector to gather application topology information and export the metrics using the prometheus exporter, the otel-collector slowly consumes more and more memory (about 16 GiB over 2 weeks). The picture below shows the node memory usage:
Steps to Reproduce
1. Enable the servicegraph connector.
2. Export metrics using the prometheus exporter.
3. Do NOT scrape the metrics endpoint.
4. The otel-collector eventually OOMs (a minimal configuration sketch follows this list).
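For reference, a minimal configuration sketch that we believe is enough to reproduce the problem, stripped down from the full configuration further below (keeping only the servicegraph → prometheus path is our assumption):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  prometheus/servicegraph:
    endpoint: 0.0.0.0:18073   # metrics endpoint, intentionally left unscraped
connectors:
  servicegraph:
    store:
      ttl: 2s
      max_items: 50000
    metrics_flush_interval: 100ms
service:
  pipelines:
    traces/servicegraph:
      receivers: [otlp]
      exporters: [servicegraph]
    metrics/servicegraph:
      receivers: [servicegraph]
      exporters: [prometheus/servicegraph]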
Expected Result
The otel-collector's memory usage stays steady.
Actual Result
The otel-collector leaks memory.
Collector version
v0.124.1
Environment information
Environment
OS: CentOS Linux release 7.9.2009 (Core) KVM
Binary: official release v0.124.1
OpenTelemetry Collector configuration
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  memory_limiter:
    check_interval: 5s
    limit_percentage: 85
    spike_limit_percentage: 15
  batch:
    timeout: 100ms
    send_batch_size: 4096
    send_batch_max_size: 5000
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    expected_new_traces_per_sec: 1000
    policies:
      [
        {
          name: prob-policy,
          type: probabilistic,
          probabilistic: {sampling_percentage: 10}
        },
        {
          name: error-policy,
          type: status_code,
          status_code: {status_codes: [ERROR]}
        },
        {
          name: latency-policy,
          type: latency,
          latency: {threshold_ms: 3000}
        },
        {
          name: force-sample, # always sample if the force_sample attribute is set to true
          type: boolean_attribute,
          boolean_attribute: { key: agent.force.sample, value: true }
        }
      ]
exporters:
  debug:
    verbosity: detailed
  kafka/trace:
    brokers:
      - log.gateway.collector:9092
    protocol_version: 2.1.0
    encoding: zipkin_proto
    retry_on_failure:
      enabled: true
    timeout: 2s
    sending_queue:
      enabled: true
      num_consumers: 30
      queue_size: 500000
    producer:
      max_message_bytes: 31457280
      required_acks: 0
      compression: lz4
    topic: mop-trace
  prometheus/servicegraph:
    endpoint: 0.0.0.0:18073
connectors:
  servicegraph:
    store:
      ttl: 2s
      max_items: 50000
    dimensions: [region.code, service.namespace]
    latency_histogram_buckets: [10ms, 50ms, 100ms, 500ms, 1s, 3s, 5s, 10s]
    cache_loop: 2m # the time to clean the cache periodically
    store_expiration_loop: 30s
    virtual_node_peer_attributes: [service.name, db.name, net.sock.peer.addr, net.peer.name, rpc.service, net.sock.peer.name, net.peer.name, http.url, http.target]
    metrics_flush_interval: 100ms
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679
  pprof:
    endpoint: :1777
service:
  extensions: [health_check, zpages, pprof]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [kafka/trace]
    traces/servicegraph:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [servicegraph]
    metrics/servicegraph:
      receivers: [servicegraph]
      exporters: [prometheus/servicegraph]
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                host: '0.0.0.0'
                port: 8888
    logs:
      level: "info"
Log output
2025-06-28T17:04:41.463+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6230}
2025-06-28T17:05:27.071+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11197}
2025-06-28T17:05:32.338+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6263}
2025-06-28T17:25:37.072+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11151}
2025-06-28T17:25:42.094+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6312}
2025-06-28T17:30:47.071+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11150}
2025-06-28T17:30:51.978+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6208}
2025-06-28T17:35:32.071+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11196}
2025-06-28T17:35:36.884+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6308}
2025-06-28T17:40:42.072+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11147}
2025-06-28T17:40:47.768+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6264}
2025-06-28T17:43:07.072+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11129}
2025-06-28T17:43:11.755+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6210}
2025-06-28T17:45:32.072+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11182}
2025-06-28T17:45:36.624+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6315}
2025-06-28T17:53:07.072+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11151}
2025-06-28T17:53:11.817+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6284}
2025-06-28T17:55:32.072+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11237}
2025-06-28T17:55:36.491+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6306}
2025-06-28T18:00:17.071+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11276}
2025-06-28T18:00:22.166+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6670}
2025-06-28T18:00:42.072+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11618}
2025-06-28T18:00:48.222+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6559}
2025-06-28T18:02:37.071+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11327}
2025-06-28T18:02:42.522+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6452}
2025-06-28T18:03:47.071+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11314}
2025-06-28T18:03:52.618+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6459}
2025-06-28T18:04:37.071+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11581}
2025-06-28T18:04:41.960+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6352}
2025-06-28T18:05:47.073+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11187}
2025-06-28T18:05:52.748+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6284}
2025-06-28T18:06:37.071+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11156}
2025-06-28T18:06:43.403+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6371}
2025-06-28T18:07:07.072+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11226}
2025-06-28T18:07:11.586+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6340}
2025-06-28T18:10:37.072+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11233}
2025-06-28T18:10:43.423+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6491}
2025-06-28T18:11:27.071+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11327}
2025-06-28T18:11:32.503+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6413}
2025-06-28T18:18:07.071+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11132}
2025-06-28T18:18:11.679+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6340}
2025-06-28T18:25:37.072+0800 info [email protected]/memorylimiter.go:205 Memory usage is above soft limit. Forcing a GC. {"cur_mem_mib": 11251}
2025-06-28T18:25:41.711+0800 info [email protected]/memorylimiter.go:171 Memory usage after GC. {"cur_mem_mib": 6325}
....
Additional context
The v0.93.0 official build release does not have this problem, or perhaps the problem is just not as obvious there.
v0.124.1 memory pprof(inuse_space)
v0.93 memory pprof(inuse_space)
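As a possible mitigation to evaluate (an assumption on our side, not verified against this leak): the prometheus exporter has a metric_expiration setting that controls how long metrics remain exposed without updates, which might bound the growth of stale servicegraph series even when the endpoint is never scraped. A sketch:

exporters:
  prometheus/servicegraph:
    endpoint: 0.0.0.0:18073
    # metric_expiration: how long metrics stay exposed without updates
    # (the exporter's documented default is 5m); the 1m value and its
    # effect on this leak are assumptions, not verified
    metric_expiration: 1m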
