Skip to content

OTel emitted metrics do not match behavior described in the docs #1929

@kevinawoo

Description

@kevinawoo

The behavior described via SDK docs and how the the OTel package was written is different. Community Slack Discussion

According to these comments, the behavior is that the these metrics are monotone increasing.

However the current OTel implementation supports explicitly supports negative numbers with the use of UpDownCounter

The Bug

The ask is to either fix the docs, or change OTel's emitted metric to be a Counter so that it matches how it's more commonly used.

The UpDownCounter type doesn't make contextual sense for the metrics emitted by the Temporal SDK.

See the OTel-Contrib-Packages section below on how other SDKs use UpDownCounters vs Counters.

Specifications

  • Version: v1.34.0

References

Temporal Go SDK

The introduction of OTel was introduced in #1336 (comment), with this comment left ambiguous about which counter type to use, however a test case was introduced to support negative numbers.

The summary of an internal discussion is that the package was built for general usage, without any limitations on the value.

Currently, all usages of the counter calls .Inc(1), however this may change, especially numbers > 1.

The behavior of the metrics was copied from the behavior of Tally.

Tally

uber-go/tally is ambiguous in which values is allowed, but it's implementation supports negative numbers:

Prometheus

Prometheus does not support negative numbers for counters and will panic if a negative number is provided. https://github.com/prometheus/client_golang/blob/9b83d994624f3cab82ec593133a598b3a27d0841/prometheus/counter.go#L126-L129

OTel Contrib Packages

OTel's Prometheus package implements counters as monotonic sums, which follows the spirit of Prometheus's implementation.

[A Counter] is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart.

-OTel's Prometheus Design Doc #Counters

There's limited usage of UpDownCounters in all of the receiver packages. This metric is mostly commonly used to show the number of things in flight/active.

Note, I think Fluent's usage is actually a bug in how the code is auto generated.

23 occurrences (expand to see usage patterns)

Image

Where the Counter type is used as a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart.

Temporal SDK usage of `counter` metric type

Image

Other SDK's use of `counter` type (164 occurrences)

Image

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions