Skip to content

[connector/spanmetrics] improvements to config and schema of output RED metrics #42103

@obs-gh-mattcotter

Description

@obs-gh-mattcotter

Component(s)

connector/spanmetrics

Is your feature request related to a problem? Please describe.

Hello! I have been implementing the spanmetrics connector for Observe's RED metrics use case, and I have accumulated some feedback on this component. First of all, this connector is extremely useful and I appreciate all of your efforts to write and maintain it! During implementation, I hit a few hiccups with the attributes and I wanted to share some ideas:

  1. The definition of the dimensions accepts both resource attribute names and span attribute names in the same list. This means that if we see Span1 with attribute test=123 in a resource with no test attribute, and Span2 with no attributes in a resource with test=123, then Span1 and Span2 would be aggregated together. This is admittedly a rare occurrence in my use case, but I think it contributes to my second challenge.

  2. The output schema of the metrics is such that every dimension is added to the datapoint attributes, and the resource attributes to which the datapoints belong is a full copy of an arbitrary input span's resource attributes. For example, if we aggregate on the dimension k8s.namespace.name, assuming the spans' resource also has k8s.pod.name, then the output metrics' resource will have an arbitrary k8s.pod.name. Since this connector performs an aggregation, I think adding resource attributes outside of the configured dimensions is inherently arbitrary and can lead to confusion. This is further complicated by my point above since it's not clear which of the dimensions correspond to resource attributes.

  3. The status.code attribute is added to all output datapoints when the span status code is set. However, I think this fits the semantic convention for otel.status_code (https://opentelemetry.io/docs/specs/semconv/registry/attributes/otel/#otel-status-code) and it would be more consistent to use that naming.

Here is a full example pipeline to illustrate my points above:

connectors:
  spanmetrics:
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    # This connector implicitly adds: service.name, span.name, span.kind, and status.code (which we rename to otel.status_code)
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/connector/spanmetricsconnector/connector.go#L528-L540
    # we additional break down by dimensions below
    dimensions:
      - name: service.namespace
      - name: service.version
      - name: deployment.environment
      - name: k8s.pod.name
      - name: k8s.namespace.name
      - name: otel.status_description
    histogram:
      exponential:
        max_size: 100

processors:
  transform/add_description_attribute:
    error_mode: ignore
    trace_statements:
      # Needed because `spanmetrics` connector can only operate on attributes or resource attributes.
      - set(span.attributes["otel.status_description"], span.status.message) where span.status.message != ""

  # The spanmetrics connector puts all dimensions as attributes on the datapoint, and copies the resource attributes from an arbitrary span's resource. This cleans that up as well as handling any other renaming.
  transform/fix_red_metrics_resource_attributes:
    error_mode: ignore
    metric_statements:
      - keep_matching_keys(resource.attributes, "^(service.name|service.namespace|service.version|deployment.environment|k8s.pod.name|k8s.namespace.name)")
      - delete_matching_keys(datapoint.attributes, "^(service.name|service.namespace|service.version|deployment.environment|k8s.pod.name|k8s.namespace.name)")
      - set(datapoint.attributes["otel.status_code"], "OK") where datapoint.attributes["status.code"] == "STATUS_CODE_OK"
      - set(datapoint.attributes["otel.status_code"], "ERROR") where datapoint.attributes["status.code"] == "STATUS_CODE_ERROR"
      - delete_key(datapoint.attributes, "status.code")

service:
  pipelines:
    traces/spanmetrics:
      receivers: [otlp]
      processors: [transform/add_description_attribute]
      exporters: [spanmetrics]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      processors: [transform/fix_red_metrics_resource_attributes]
      exporters: [otlphttp]

My proposal is to update the connector so the config would instead look like this:

connectors:
  spanmetrics:
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    dimensions:
      - name: service.namespace
        from: resource
      - name: service.version
        from: resource
      - name: deployment.environment
        from: resource
      - name: k8s.pod.name
        from: resource
      - name: k8s.namespace.name
        from: resource
      - name: otel.status_description
        from: span
    histogram:
      exponential:
        max_size: 100

and the processors would no longer be necessary.

I am happy to make this change and send a pull request, but I wanted to open a discussion first to see how likely it is to get such a PR accepted before spending the time. I look forward to hearing any feedback, thank you!

Describe the solution you'd like

See above; I'm happy to contribute this change if the maintainers agree with the proposed direction.

Describe alternatives you've considered

No response

Additional context

No response

Tip

React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1 or me too, to help us triage it. Learn more here.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions