-
Notifications
You must be signed in to change notification settings - Fork 3k
Description
Component(s)
connector/spanmetrics
Is your feature request related to a problem? Please describe.
Hello! I have been implementing the spanmetrics connector for Observe's RED metrics use case, and I have accumulated some feedback on this component. First of all, this connector is extremely useful and I appreciate all of your efforts to write and maintain it! During implementation, I hit a few hiccups with the attributes and I wanted to share some ideas:
-
The definition of the dimensions accepts both resource attribute names and span attribute names in the same list. This means that if we see Span1 with attribute
test=123
in a resource with notest
attribute, and Span2 with no attributes in a resource withtest=123
, then Span1 and Span2 would be aggregated together. This is admittedly a rare occurrence in my use case, but I think it contributes to my second challenge. -
The output schema of the metrics is such that every dimension is added to the datapoint attributes, and the resource attributes to which the datapoints belong is a full copy of an arbitrary input span's resource attributes. For example, if we aggregate on the dimension
k8s.namespace.name
, assuming the spans' resource also hask8s.pod.name
, then the output metrics' resource will have an arbitraryk8s.pod.name
. Since this connector performs an aggregation, I think adding resource attributes outside of the configured dimensions is inherently arbitrary and can lead to confusion. This is further complicated by my point above since it's not clear which of the dimensions correspond to resource attributes. -
The
status.code
attribute is added to all output datapoints when the span status code is set. However, I think this fits the semantic convention forotel.status_code
(https://opentelemetry.io/docs/specs/semconv/registry/attributes/otel/#otel-status-code) and it would be more consistent to use that naming.
Here is a full example pipeline to illustrate my points above:
connectors:
spanmetrics:
aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
# This connector implicitly adds: service.name, span.name, span.kind, and status.code (which we rename to otel.status_code)
# https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/connector/spanmetricsconnector/connector.go#L528-L540
# we additional break down by dimensions below
dimensions:
- name: service.namespace
- name: service.version
- name: deployment.environment
- name: k8s.pod.name
- name: k8s.namespace.name
- name: otel.status_description
histogram:
exponential:
max_size: 100
processors:
transform/add_description_attribute:
error_mode: ignore
trace_statements:
# Needed because `spanmetrics` connector can only operate on attributes or resource attributes.
- set(span.attributes["otel.status_description"], span.status.message) where span.status.message != ""
# The spanmetrics connector puts all dimensions as attributes on the datapoint, and copies the resource attributes from an arbitrary span's resource. This cleans that up as well as handling any other renaming.
transform/fix_red_metrics_resource_attributes:
error_mode: ignore
metric_statements:
- keep_matching_keys(resource.attributes, "^(service.name|service.namespace|service.version|deployment.environment|k8s.pod.name|k8s.namespace.name)")
- delete_matching_keys(datapoint.attributes, "^(service.name|service.namespace|service.version|deployment.environment|k8s.pod.name|k8s.namespace.name)")
- set(datapoint.attributes["otel.status_code"], "OK") where datapoint.attributes["status.code"] == "STATUS_CODE_OK"
- set(datapoint.attributes["otel.status_code"], "ERROR") where datapoint.attributes["status.code"] == "STATUS_CODE_ERROR"
- delete_key(datapoint.attributes, "status.code")
service:
pipelines:
traces/spanmetrics:
receivers: [otlp]
processors: [transform/add_description_attribute]
exporters: [spanmetrics]
metrics/spanmetrics:
receivers: [spanmetrics]
processors: [transform/fix_red_metrics_resource_attributes]
exporters: [otlphttp]
My proposal is to update the connector so the config would instead look like this:
connectors:
spanmetrics:
aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
dimensions:
- name: service.namespace
from: resource
- name: service.version
from: resource
- name: deployment.environment
from: resource
- name: k8s.pod.name
from: resource
- name: k8s.namespace.name
from: resource
- name: otel.status_description
from: span
histogram:
exponential:
max_size: 100
and the processors would no longer be necessary.
I am happy to make this change and send a pull request, but I wanted to open a discussion first to see how likely it is to get such a PR accepted before spending the time. I look forward to hearing any feedback, thank you!
Describe the solution you'd like
See above; I'm happy to contribute this change if the maintainers agree with the proposed direction.
Describe alternatives you've considered
No response
Additional context
No response
Tip
React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1
or me too
, to help us triage it. Learn more here.