Collector memory increases by about ~20 MB after v0.125.0 release #13014

Closed
sfc-gh-bnandibhatla opened this issue May 10, 2025 · 3 comments · Fixed by #13107
Labels
bug Something isn't working priority:p1 High

Comments

@sfc-gh-bnandibhatla
sfc-gh-bnandibhatla commented May 10, 2025

Component(s)

service

What happened?

Describe the bug

We noticed that the collector's memory usage increased by about 20 MB after the latest release. Here's a graph (plotted using the otelcol_process_memory_rss metric) comparing the collector running an older version (v0.120.0) with the newer version:

Image

Steps to reproduce

Upgrade the collector to the latest version

What did you expect to see?

Very minimal increase in memory used by the collector.

What did you see instead?

Memory increased by about 20 MB.

Collector version

v0.125.0

Environment information

Environment

OS: Centos7
Compiler(if manually compiled): go 1.23.7

Additional context

We collected heap profiles from both old and new versions of the collector. Here's the heap profile from old version:

Image

Here's the heap profile from the new version of the collector:

Image

It looks like the memory increase is because of a new usage of zap somewhere (go.uber.org/zap/zapcore.newCounters). Drilling down using the flamegraph shows the call stack:

Image

The call to zapcore.NewSamplerWithOptions seems to have been added here: #12617. The ~20 MB increase seems excessive just to maintain some logging-related counters.
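The issue does not say how the heap profiles were captured. For readers who want to run a similar before/after comparison on their own Go process, here is a minimal sketch using only the standard library's `runtime/pprof` package. The helper name `captureHeapProfile` is made up for illustration; the resulting bytes can be written to a file and opened with `go tool pprof`.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// captureHeapProfile (hypothetical helper) returns a heap profile of the
// current process in the format that `go tool pprof` reads.
func captureHeapProfile() ([]byte, error) {
	runtime.GC() // collect first so the profile reflects live objects
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	profile, err := captureHeapProfile()
	if err != nil {
		fmt.Println("heap profile failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of heap profile\n", len(profile))
}
```

Capturing one profile per version under the same load and comparing the top allocation sites is one way to arrive at a finding like the `newCounters` one above.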

@sfc-gh-bnandibhatla sfc-gh-bnandibhatla added the bug Something isn't working label May 10, 2025
@mx-psi mx-psi added priority:p1 High release:blocker The issue must be resolved before cutting the next release and removed release:blocker The issue must be resolved before cutting the next release labels May 12, 2025
@bogdandrutu
Member

#13015 only fixes the case where OTLP is not used.

github-merge-queue bot pushed a commit that referenced this issue May 12, 2025
…13015)

#### Description

Avoid re-creating sampler counters every time we wrap with attributes.

#### Link to tracking issue
Updates #13014 

#### Testing

#### Documentation

---------

Signed-off-by: Bogdan Drutu <[email protected]>
Co-authored-by: Jade Guiton <[email protected]>
@jade-guiton-dd
Contributor

I've filed an issue in the Zap repo to ask if it would be possible to share a sampler core between multiple Zap pipelines (one for each OTel Logger we create), which would be one way to eliminate this issue: uber-go/zap#1498

TimoBehrendt pushed a commit to TimoBehrendt/tracebasedlogsampler that referenced this issue May 20, 2025
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [go.opentelemetry.io/collector/component](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v1.31.0` -> `v1.32.0` |
| [go.opentelemetry.io/collector/component/componenttest](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v0.125.0` -> `v0.126.0` |
| [go.opentelemetry.io/collector/confmap](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v1.31.0` -> `v1.32.0` |
| [go.opentelemetry.io/collector/consumer](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v1.31.0` -> `v1.32.0` |
| [go.opentelemetry.io/collector/consumer/consumertest](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v0.125.0` -> `v0.126.0` |
| [go.opentelemetry.io/collector/pdata](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v1.31.0` -> `v1.32.0` |
| [go.opentelemetry.io/collector/processor](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v1.31.0` -> `v1.32.0` |
| [go.opentelemetry.io/collector/processor/processortest](https://github.com/open-telemetry/opentelemetry-collector) | require | minor | `v0.125.0` -> `v0.126.0` |

---

### Release Notes

<details>
<summary>open-telemetry/opentelemetry-collector (go.opentelemetry.io/collector/component)</summary>

### [`v1.32.0`](https://github.com/open-telemetry/opentelemetry-collector/blob/HEAD/CHANGELOG.md#v1320v01260)

##### 🛑 Breaking changes 🛑

-   `configauth`: Removes deprecated `configauth.Authentication` and `extensionauthtest.NewErrorClient` ([#12992](open-telemetry/opentelemetry-collector#12992))
    The following have been removed:
    -   `configauth.Authentication` use `configauth.Config` instead
    -   `extensionauthtest.NewErrorClient` use `extensionauthtest.NewErr` instead

##### 💡 Enhancements 💡

-   `service`: Replace `go.opentelemetry.io/collector/semconv` usage with `go.opentelemetry.io/otel/semconv` ([#12991](open-telemetry/opentelemetry-collector#12991))
-   `confmap`: Update the behavior of the confmap.enableMergeAppendOption feature gate to merge only component lists. ([#12926](open-telemetry/opentelemetry-collector#12926))
-   `service`: Add item count metrics defined in Pipeline Component Telemetry RFC ([#12812](open-telemetry/opentelemetry-collector#12812))
    See [Pipeline Component Telemetry RFC](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md) for more details:
    -   `otelcol.receiver.produced.items`
    -   `otelcol.processor.consumed.items`
    -   `otelcol.processor.produced.items`
    -   `otelcol.connector.consumed.items`
    -   `otelcol.connector.produced.items`
    -   `otelcol.exporter.consumed.items`
-   `tls`: Add trusted platform module (TPM) support to TLS authentication. ([#12801](open-telemetry/opentelemetry-collector#12801))
    Now the TLS allows the use of TPM for loading private keys (e.g. in TSS2 format).

##### 🧰 Bug fixes 🧰

-   `exporterhelper`: Add validation error for batch config if min_size is greater than queue_size. ([#12948](open-telemetry/opentelemetry-collector#12948))

-   `telemetry`: Allocate less memory per component when OTLP exporting of logs is disabled ([#13014](open-telemetry/opentelemetry-collector#13014))

-   `confmap`: Use reflect.DeepEqual to avoid panic when confmap.enableMergeAppendOption feature gate is enabled. ([#12932](open-telemetry/opentelemetry-collector#12932))

-   `internal telemetry`: Add resource attributes from telemetry.resource to the logger ([#12582](open-telemetry/opentelemetry-collector#12582))
    Resource attributes from telemetry.resource were not added to the internal
    console logs.

    Now, they are added to the logger as part of the "resource" field.

-   `confighttp and configcompression`: Fix handling of `snappy` content-encoding in a backwards-compatible way ([#10584](open-telemetry/opentelemetry-collector#10584), [#12825](open-telemetry/opentelemetry-collector#12825))
    The collector used the Snappy compression type of "framed" to handle the HTTP
    content-encoding "snappy".  However, this encoding is typically used to indicate
    the "block" compression variant of "snappy".  This change allows the collector to:
    -   When receiving a request with encoding 'snappy', the server endpoints will peek
        at the first bytes of the payload to determine if it is "framed" or "block" snappy,
        and will decompress accordingly.  This is a backwards-compatible change.
    If the feature-gate "confighttp.framedSnappy" is enabled, you'll see new behavior for both client and server:
    -   Client compression type "snappy" will now compress to the "block" variant of snappy
        instead of "framed". Client compression type "x-snappy-framed" will now compress to the "framed" variant of snappy.
    -   Servers will accept both "snappy" and "x-snappy-framed" as valid content-encodings.

-   `tlsconfig`: Disable TPM tests on MacOS/Darwin ([#12964](open-telemetry/opentelemetry-collector#12964))

<!-- previous-version -->

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

👻 **Immortal**: This PR will be recreated if closed unmerged. Get [config help](https://github.com/renovatebot/renovate/discussions) if that's undesired.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://gitea.t000-n.de/t.behrendt/tracebasedlogsampler/pulls/13
Co-authored-by: Renovate Bot <[email protected]>
Co-committed-by: Renovate Bot <[email protected]>
@jade-guiton-dd
Contributor

jade-guiton-dd commented May 28, 2025

There has been no response to the issue on the Zap repo so far, but I had an idea on how to share sampling counters using reflect; see this draft PR: #13107. It's not super elegant and could break if a new Zap version changes the sampler internals, but it would solve the memory issue in cases where logs are exported through OTLP. If you have context, please take a look and tell me if you think the hack is worth it and I should submit it for review.

github-merge-queue bot pushed a commit that referenced this issue Jun 11, 2025
#### Context

PR #12617, which implemented the injection of component-identifying
attributes into the `zap.Logger` provided to components, introduced
significant additional memory use when the Collector's pipelines contain
many components (#13014). This was because we would call
`zapcore.NewSamplerWithOptions` to wrap the specialized logger core of
each Collector component, which allocates half a megabyte's worth of
sampling counters.
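The "half a megabyte" figure can be sanity-checked with some quick arithmetic. The sketch below assumes a counter table shaped like the one in zap's sampler: 7 log levels, 4096 counters per level, and two 8-byte atomics per counter. Those three numbers are assumptions based on a reading of zap's source, not something stated in this issue, and the type and constant names are illustrative only.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// Assumed dimensions of one sampler counter table (not taken from this issue).
const (
	numLevels        = 7    // assumption: Debug through Fatal
	countersPerLevel = 4096 // assumption: hash buckets per level
)

// counter is a stand-in for one sampling slot: a reset timestamp and a hit count.
type counter struct {
	resetAt atomic.Int64
	count   atomic.Uint64
}

// counters is a stand-in for the table allocated per sampler.
type counters [numLevels][countersPerLevel]counter

// tableBytes is the size of a single counter table.
const tableBytes = unsafe.Sizeof(counters{})

// tablesIn reports how many whole tables fit in the given number of bytes.
func tablesIn(bytes uintptr) uintptr { return bytes / tableBytes }

func main() {
	fmt.Printf("one table: %d bytes (%.2f MiB)\n", tableBytes, float64(tableBytes)/(1<<20))
	fmt.Printf("tables that fit in 20 MiB: %d\n", tablesIn(20<<20))
}
```

Under these assumptions a table is a little under half a mebibyte, so a 20 MB increase would correspond to a few dozen per-component samplers, which is plausible for a Collector with many components.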

This problem was mitigated in #13015 by moving the sampling layer to a
different location in the logger core hierarchy. This meant that
Collector users that do not export their logs through OTLP and only use
stdout-based logs no longer saw the memory increase.
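To illustrate why the position of the sampling layer matters, here is a toy model of a core hierarchy. It does not use zap or the Collector's actual types; `core`, `sampled`, `withAttrs`, and both builder functions are hypothetical stand-ins. It only shows that a sampler wrapped around each component's core costs one counter table per component, while a single sampler placed beneath the per-component attribute layer costs one table in total.

```go
package main

import "fmt"

// core is a hypothetical stand-in for a logger core.
type core interface{ describe() string }

type console struct{}

func (console) describe() string { return "console" }

// sampled models a sampling layer that owns a counter table.
type sampled struct {
	inner  core
	counts *[4096]uint64 // stand-in for the real counter table
}

func (s sampled) describe() string { return "sampled(" + s.inner.describe() + ")" }

// withAttrs models the layer that adds component-identifying attributes.
type withAttrs struct {
	inner     core
	component string
}

func (w withAttrs) describe() string { return w.component + "/" + w.inner.describe() }

// samplerPerComponent wraps every component's core in a fresh sampler and
// returns the cores plus the number of counter tables it allocated.
func samplerPerComponent(components []string) ([]core, int) {
	cores := make([]core, 0, len(components))
	tables := 0
	for _, c := range components {
		attrs := withAttrs{inner: console{}, component: c}
		cores = append(cores, sampled{inner: attrs, counts: new([4096]uint64)})
		tables++
	}
	return cores, tables
}

// sharedSampler builds one sampler and layers per-component attributes on top.
func sharedSampler(components []string) ([]core, int) {
	shared := sampled{inner: console{}, counts: new([4096]uint64)}
	cores := make([]core, 0, len(components))
	for _, c := range components {
		cores = append(cores, withAttrs{inner: shared, component: c})
	}
	return cores, 1
}

func main() {
	components := []string{"otlp", "batch", "debug"}
	_, perComponent := samplerPerComponent(components)
	_, shared := sharedSampler(components)
	fmt.Printf("tables allocated: per-component=%d shared=%d\n", perComponent, shared)
}
```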

#### Description

This PR aims to provide a better solution to this issue, by using the
`reflect` library to clone zap's sampler core and set a new inner core,
while reusing the counter allocation.

(This may also be "more correct" from a sampling point of view, i.e. we
only have one global instance of the counters instead of one for console
logs and one for each component's OTLP-exported logs, but I'm not sure
if anyone noticed the difference anyway).
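The general technique of cloning a struct with unexported fields via `reflect`, swapping one field, and sharing the rest can be sketched as follows. This is not the code from the PR: `sampler`, `counters`, `namedCore`, and `cloneWithInner` are hypothetical stand-ins defined locally, and the field names are assumptions. With a real unexported type from another package, the clone could only be returned through an interface rather than as a concrete pointer.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// core is a hypothetical stand-in for the wrapped logger core.
type core interface{ Name() string }

type namedCore string

func (n namedCore) Name() string { return string(n) }

// counters is a stand-in for the sampler's counter table.
type counters [4]uint64

// sampler mimics a struct whose fields are unexported.
type sampler struct {
	inner  core
	counts *counters
}

// cloneWithInner shallow-copies s and replaces its "inner" field, so the
// clone keeps pointing at the same counters as the original.
func cloneWithInner(s *sampler, inner core) *sampler {
	// Shallow copy: pointer fields such as counts are shared, not duplicated.
	clone := reflect.New(reflect.TypeOf(*s)).Elem()
	clone.Set(reflect.ValueOf(*s))

	// reflect refuses to Set an unexported field directly, so build a
	// settable Value from the field's address instead.
	field := clone.FieldByName("inner")
	settable := reflect.NewAt(field.Type(), unsafe.Pointer(field.UnsafeAddr())).Elem()
	settable.Set(reflect.ValueOf(&inner).Elem())

	return clone.Addr().Interface().(*sampler)
}

func main() {
	original := &sampler{inner: namedCore("console"), counts: &counters{}}
	clone := cloneWithInner(original, namedCore("otlp"))
	fmt.Println("clone inner:", clone.inner.Name())
	fmt.Println("counters shared:", clone.counts == original.counts)
}
```

The trade-off is the one noted in the discussion above: the approach relies on the internal layout of another package's type, so it can break if a new version renames or restructures those fields.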

#### Link to tracking issue
Fixes #13014

#### Testing
A new test was added which checks that the log counters are shared
between two sampler cores with different attributes.