[exporter/otlphttp] 400 responses should not be retried even when retry_on_failure is enabled #14174

@thewillyhuman

Component(s)

exporter/otlphttp

What happened?

Describe the bug

According to the OpenTelemetry specification, only a specific set of response codes should be considered retryable (see: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#retryable-response-codes). However, when the retry_on_failure configuration option is enabled (enabled=true, which is the default), the collector appears to retry requests that fail with a 400 response as well.
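For reference, the retryable set named in the linked section of the specification is 429, 502, 503 and 504. The following is a minimal sketch of that classification; `isRetryableStatus` is a hypothetical helper written for illustration and is not taken from the collector's source.

```go
package main

import (
	"fmt"
	"net/http"
)

// isRetryableStatus reports whether an OTLP/HTTP response status is listed as
// retryable in the OTLP specification. Hypothetical helper, not collector code.
func isRetryableStatus(code int) bool {
	switch code {
	case http.StatusTooManyRequests, // 429
		http.StatusBadGateway,         // 502
		http.StatusServiceUnavailable, // 503
		http.StatusGatewayTimeout:     // 504
		return true
	default:
		// Everything else, including 400, is treated as non-retryable.
		return false
	}
}

func main() {
	for _, code := range []int{400, 429, 500, 503} {
		fmt.Printf("%d retryable=%t\n", code, isRetryableStatus(code))
	}
}
```

Under this classification, the 400 that Mimir returns falls into the non-retryable branch.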

We observe this behavior on collectors that write to our Mimir via exporter/otlphttp: Mimir responds with 400 errors due to “metric too old,” and the requests that failed with these non-retryable errors are kept in the local queues, which should not happen. This leads to queue growth and, probably, unnecessary retries.

I would expect the collector to automatically discard requests that fail with a non-retryable error, regardless of the retry_on_failure setting.

Steps to reproduce

  1. Configure an exporter with retry_on_failure.enabled = true (default).
  2. Send metrics that Mimir rejects with a 400 “metric too old” error.
  3. Observe that the collector enqueues and retries these requests instead of discarding them.

What did you expect to see?

Non-retryable errors (e.g., 400 for invalid or too-old data) should be dropped immediately and not queued or retried.

What did you see instead?

The collector appears to treat 400 errors as retryable: it keeps the failed requests in the local queue and retries them indefinitely, contrary to the OpenTelemetry specification.

Collector version

v0.137.0

Environment information

OS: Alma Linux 9

OpenTelemetry Collector configuration

Log output

Additional context

No response

Metadata

Labels

bug
