-
Notifications
You must be signed in to change notification settings - Fork 47
Description
I recently noticed that the reporter throws away the measurement
field of Counter metrics. This can lead to some surprising outcomes.
As an illustration, consider the following metric definitions:
# Metrics to be passed to the TelemetryMetricsStatsd reporter.
metrics = [
Telemetry.Metrics.counter([:foo, :bar, :baz], tags: [:a, :b]),
Telemetry.Metrics.counter([:foo, :bar, :bat], tags: [:b, :c]),
]
Based on my understanding of the current implementation of the handle_event/4
function, the following code would trigger a metric to be published for both of the above metrics (one of which would have the incorrect tags and, I think, may actually cause the handler to detach):
:telemetry.execute([:foo, :bar, :baz], %{}, tags: %{a: 1, b: 2})
This can be avoid if you explicitly pass the event_name
option, or if you ensure that the default event_name
of your metric is unique (ie using the count
suffix on everything would do the trick).
I am struggling to quickly reproduce this in the tests for this project (without explicitly specifying event name), but in that case it's simply not publishing anything so it may be a separate problem. I can revisit and try to break things properly over the weekend.
But the offending code seems to be: https://github.com/arkgil/telemetry_metrics_statsd/blob/0f79e2af612caa834ac522da2f6209cd930f5967/lib/telemetry_metrics_statsd/event_handler.ex#L88-L91
The counter here will loop through every metric with the shared prefix and publish a payload without ensuring that the metric in question matches the measurement of the fired event. Contrast this with the Telemetry.Metrics.ConsoleReporter
module which will return nil
in its fetch_measurement/2
equivalent and not publish if the event_name
is the same but the measurement
is different: https://github.com/beam-telemetry/telemetry_metrics/blob/v0.3.0/lib/telemetry_metrics/console_reporter.ex#L98
This behavior can be mitigated by post-fixing all counter metrics with.count
, but that is not explicitly mentioned in the documentation (that I have been able to find).
If there's something going on that I'm just completely missing, I would be very happy to be wrong here 😄
PS. If you know how to get tests to work without explicitly specifying the event_name
option to given_counter
, it would help me reproduce this with a test faster. My initial discovery was from a test reporter module I wrote by copy-pasting the bones of this reporter. So while I am not certain this bug exists, I'm very suspicious.