Library to support Telemetry.Metrics as OpenTelemetry metrics #303
|
Forgot I needed to make So may just add a warning in the docs that Scope is going to be wrong for all your |
josevalim
left a comment
😍
|
Can I help you guys somehow? Have to submit metrics to Google Metrics for alerting and, since our metrics are not mission critical, we can try to pilot this PR in production. |
|
@AndrewDryga yes, definitely! Are you on slack? I hit a bug in metrics that has kept me from having time to spend on docs but could help you on slack to get going. |
|
WHOOPS. I started a job shortly after this and forgot about this PR, wow. Trying to resurrect it. |
…etrics.ex Co-authored-by: José Valim <[email protected]>
52c07c4 to
748990f
Compare
|
Marked as ready for review tho I don't remember if it really is. But based on the tests it has I guess it works, hehe. |
bryannaegele
left a comment
Need to update the deps but let's get this in the hands of folks
|
I'm very interested in getting this merged. How can I help? |
Co-authored-by: Bryan Naegele <[email protected]>
|
@ethangunderson tried it by chance? That'd make it easier to feel confident merging. |
|
@tsloughter I haven't yet, but I have the perfect app for it. I'll make time for it this week and report back. |
|
Alright, a bit late, but I tried this out yesterday. I have an existing application that uses the OTLP collectors from a Datadog agent. Traces work correctly, but I can't get metrics working. If I swap out the reader for If the application does not have |
|
@ethangunderson are you sending the metrics to the collector, which then sends them to DataDog? Can you verify whether they are getting to the cluster by adding the debug exporter to the collector metrics pipeline? |
|
@tsloughter I work with @ethangunderson and just got a little test using this working. We're actually sending metrics directly to a Datadog Agent over grpc on port 4317 and I got it to work fine, just a few things that tripped me up along the way. I'll provide the three main things I ran into (I hacked a couple of them into working just to prove this out as a PoC). First, here's the config we're using for reference:

config :opentelemetry_experimental,
  readers: [
    %{
      module: :otel_metric_reader,
      config: %{
        exporter:
          {:otel_exporter_metrics_otlp,
           %{
             endpoint: System.get_env("OTLP_ENDPOINT"),
             protocol: :grpc,
             compression: :gzip
           }}
      }
    }
  ]

We set up a little test with a single

defmodule Vantage.Otel.Metrics do
  @moduledoc """
  Experimental usage of OtelTelemetryMetrics for a PoC
  """
  use Supervisor
  import Telemetry.Metrics

  def start_link(opts) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl Supervisor
  def init(opts) do
    children = [
      {OtelTelemetryMetrics, metrics: Keyword.get(opts, :metrics, metrics())}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  defp metrics do
    [
      counter("vantage.test.metric.count",
        # unit: :count, ====> Needed to set this explicitly to avoid an error in `otel_otlp_metrics.erl`
        event_name: [:vantage, :test, :metric],
        description: "just a test",
        measurement: 1
      )
    ]
  end
end

1) ArgumentError on
|
|
I was able to get this working, though I had to make some changes 😅 Firstly, I'm currently using the experimental code from GitHub because not everything is released to hex:

{:opentelemetry_experimental,
 git: "https://github.com/cheerfulstoic/opentelemetry-erlang.git",
 sparse: "apps/opentelemetry_experimental",
 override: true},
{:opentelemetry_api_experimental,
 git: "https://github.com/cheerfulstoic/opentelemetry-erlang.git",
 sparse: "apps/opentelemetry_api_experimental",
 override: true},

But I also needed to get the "metric" differently than how the code in this PR works. Instead of calling

defmodule OtelTelemetryMetrics do
  @moduledoc """
  BASED ON THIS PR:

  https://github.com/open-telemetry/opentelemetry-erlang-contrib/pull/303

  If we can get this generally working, it would be nice to contribute the changes back.

  `OtelTelemetryMetrics.start_link/1` creates OpenTelemetry Instruments for
  `Telemetry.Metrics` metrics and records to them when their corresponding
  events are triggered.

      metrics = [
        last_value("vm.memory.binary", unit: :byte),
        counter("vm.memory.total"),
        counter("db.query.duration", tags: [:table, :operation]),
        summary("http.request.response_time",
          tag_values: fn
            %{foo: :bar} -> %{bar: :baz}
          end,
          tags: [:bar],
          drop: fn metadata ->
            metadata[:boom] == :pow
          end
        ),
        sum("telemetry.event_size.metadata",
          measurement: &__MODULE__.metadata_measurement/2
        ),
        distribution("phoenix.endpoint.stop.duration",
          measurement: &__MODULE__.measurement/1
        )
      ]

      {:ok, _} = OtelTelemetryMetrics.start_link([metrics: metrics])

  Then either in your Application code or a dependency execute `telemetry`
  events containing the measurements. For example, an event that will result
  in the metrics `vm.memory.total` and `vm.memory.binary` being recorded to:

      :telemetry.execute([:vm, :memory], %{binary: 100, total: 200}, %{})

  OpenTelemetry does not support a `summary` type metric, the `summary`
  `http.request.response_time` is recorded as a single bucket histogram.

  In `Telemetry.Metrics` the `counter` type refers to counting the number of
  times an event is triggered, this is represented as a `sum` in OpenTelemetry
  and when recording the value is sent as a `1` every time.

  Metrics of type `last_value` are ignored because `last_value` is not yet an
  aggregation supported on synchronous instruments in Erlang/Elixir
  OpenTelemetry. When it is added to the SDK this library will be updated to
  no longer ignore metrics of this type.
  """

  require Logger
  use GenServer

  @doc """
  """
  def start_link(options) do
    GenServer.start_link(__MODULE__, options, name: __MODULE__)
  end

  @impl true
  def init(options) do
    Process.flag(:trap_exit, true)

    meter = options[:meter] || get_meter()
    metrics = options[:metrics] || []

    handler_ids = create_instruments_and_attach(meter, metrics)

    {:ok, %{handler_ids: handler_ids}}
  end

  @impl true
  def terminate(_, %{handler_ids: handler_ids}) do
    Enum.each(handler_ids, fn id -> :telemetry.detach(id) end)
  end

  defp create_instruments_and_attach(meter, metrics) do
    metrics
    |> Enum.group_by(& &1.event_name)
    |> Enum.map(fn {event_name, metrics} ->
      metrics_by_measurement = Enum.group_by(metrics, &List.last(&1.name))

      for metric <- metrics do
        create_instrument(metric, meter, %{
          unit: unit(metric.unit),
          description: format_description(metric)
        })
      end

      handler_id = {__MODULE__, event_name, self()}

      :ok =
        :telemetry.attach(
          handler_id,
          event_name,
          &__MODULE__.handle_event/4,
          %{metrics_by_measurement: metrics_by_measurement}
        )

      handler_id
    end)
  end

  defp create_instrument(%Telemetry.Metrics.Counter{} = metric, meter, opts) do
    :otel_counter.create(meter, format_name(metric), opts)
  end

  # a summary is represented as an explicit histogram with a single bucket
  defp create_instrument(%Telemetry.Metrics.Summary{} = metric, meter, opts) do
    :otel_histogram.create(
      meter,
      format_name(metric),
      Map.put(opts, :advisory_params, %{explicit_bucket_boundaries: []})
    )
  end

  defp create_instrument(%Telemetry.Metrics.Distribution{} = metric, meter, opts) do
    :otel_histogram.create(meter, format_name(metric), opts)
  end

  defp create_instrument(%Telemetry.Metrics.Sum{} = metric, meter, opts) do
    :otel_counter.create(meter, format_name(metric), opts)
  end

  # waiting on
  defp create_instrument(%Telemetry.Metrics.LastValue{} = metric, _meter, _) do
    Logger.info(
      "Ignoring metric #{inspect(metric.name)} because LastValue aggregation is not supported in this version of OpenTelemetry Elixir"
    )

    nil
  end

  defp unit(:unit), do: "1"
  defp unit(unit), do: "#{unit}"

  defp format_description(metric) do
    metric.description || "#{format_name(metric)}"
  end

  defp format_name(metric) do
    metric.name
    |> Enum.join(".")
    |> String.to_atom()
  end

  def handle_event(event_name, measurements, metadata, %{
        metrics_by_measurement: metrics_by_measurement
      }) do
    for {measurement, metrics} <- metrics_by_measurement,
        metric <- metrics do
      if value = keep?(metric, metadata) && extract_measurement(metric, measurements, metadata) do
        ctx = OpenTelemetry.Ctx.get_current()
        tags = extract_tags(metric, metadata)
        meter = get_meter()

        name =
          (event_name ++ [measurement])
          |> Enum.map_join(".", &to_string/1)
          |> String.to_atom()

        :ok = :otel_meter.record(ctx, meter, name, value, tags)
      end
    end
  end

  defp get_meter do
    :opentelemetry_experimental.get_meter(:opentelemetry.get_application_scope(__MODULE__))
  end

  defp keep?(%{keep: nil}, _metadata), do: true
  defp keep?(%{keep: keep}, metadata), do: keep.(metadata)

  defp extract_measurement(%Telemetry.Metrics.Counter{}, _measurements, _metadata) do
    1
  end

  defp extract_measurement(metric, measurements, metadata) do
    case metric.measurement do
      nil ->
        nil

      fun when is_function(fun, 1) ->
        fun.(measurements)

      fun when is_function(fun, 2) ->
        fun.(measurements, metadata)

      key ->
        measurements[key] || 1
    end
  end

  defp extract_tags(metric, metadata) do
    tag_values = metric.tag_values.(metadata)
    Map.take(tag_values, metric.tags)
  end
end
|
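For anyone reading the module above, here is a minimal sketch of how it derives names, as far as I can tell from the posted code. Instruments are created under the full metric name joined with dots, while the handler records under the event name plus the last segment of the metric name. `NameSketch` and its two functions are hypothetical helpers written only for this illustration. They are not part of the PR or of any OpenTelemetry package, and they return strings where the module converts to atoms.

```elixir
# Hypothetical helper, for illustration only.
defmodule NameSketch do
  # Name used when the instrument is created: all segments of the
  # metric name joined with dots (mirrors format_name/1 above).
  def instrument_name(metric_name) when is_list(metric_name) do
    Enum.join(metric_name, ".")
  end

  # Name rebuilt inside the telemetry handler: the event name plus the
  # measurement key, joined with dots (mirrors handle_event/4 above).
  def recorded_name(event_name, measurement) when is_list(event_name) do
    Enum.map_join(event_name ++ [measurement], ".", &to_string/1)
  end
end

# Default case: the event name is the metric name minus its last segment.
metric_name = [:vm, :memory, :total]
event_name = [:vm, :memory]
measurement = List.last(metric_name)

IO.puts(NameSketch.instrument_name(metric_name))
IO.puts(NameSketch.recorded_name(event_name, measurement))

# With an explicit event name that is not a prefix of the metric name,
# the two derived names differ.
IO.puts(NameSketch.instrument_name([:http, :request, :duration]))
IO.puts(NameSketch.recorded_name([:phoenix, :endpoint, :stop], :duration))
```

In the default case both calls give `vm.memory.total`. In the second case they give `http.request.duration` and `phoenix.endpoint.stop.duration`, so it may be worth checking metrics that set `event_name:` explicitly.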
|
Also important, here is my configuration! It took me quite a while to figure it out, so hopefully it's useful! The "cumulative" settings under

config :opentelemetry,
  span_processor: :batch,
  traces_exporter: :otlp

config :opentelemetry_exporter,
  otlp_protocol: :http_protobuf,
  otlp_endpoint: opentelemetry_endpoint

config :opentelemetry_experimental,
  readers: [
    %{
      module: :otel_metric_reader,
      config: %{
        export_interval_ms: 1_000,
        exporter: {
          :otel_exporter_metrics_otlp,
          %{endpoints: [opentelemetry_endpoint], protocol: :http_protobuf, ssl_options: []}
        },
        # For Prometheus
        default_temporality_mapping: %{
          counter: :temporality_cumulative,
          observable_counter: :temporality_cumulative,
          updown_counter: :temporality_cumulative,
          observable_updowncounter: :temporality_cumulative,
          histogram: :temporality_cumulative,
          observable_gauge: :temporality_cumulative
        }
      }
    }
  ]
|
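The `# For Prometheus` comment in the configuration above marks the block that sets every instrument kind to cumulative temporality. As a small sketch, the same map can be built from a list of kinds so that all six entries stay consistent. The variable names `instrument_kinds`, `cumulative_mapping`, and `reader` are my own. The instrument kinds and the `:temporality_cumulative` value are copied from the posted config, and the exporter entry is left out here, so this shows only the shape of the term and is not a complete reader definition.

```elixir
# Instrument kinds copied from the configuration posted above.
instrument_kinds = [
  :counter,
  :observable_counter,
  :updown_counter,
  :observable_updowncounter,
  :histogram,
  :observable_gauge
]

# Map every instrument kind to cumulative temporality.
cumulative_mapping =
  Map.new(instrument_kinds, fn kind -> {kind, :temporality_cumulative} end)

# Reader term in the same shape as the posted config, minus the exporter.
reader = %{
  module: :otel_metric_reader,
  config: %{
    export_interval_ms: 1_000,
    default_temporality_mapping: cumulative_mapping
  }
}

IO.inspect(reader)
```

This produces the same `default_temporality_mapping` map as the hand-written one, so it should only be a convenience if you keep the reader definition in a runtime config file.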
|
@cheerfulstoic thanks! The The configuration can certainly be improved and hopefully will as we adopt https://github.com/open-telemetry/opentelemetry-configuration/ -- I plan to support the otel config format as Erlang/Elixir terms first (in our usual configuration files) and then it'll support converting from json to internal configuration. Verifying configuration will be done through the tool that is in |
|
👍 Cool Yeah, I was using the |
|
I was putting these changes into a couple of different projects and I didn't want to copy/paste, so I extracted them into a library. I don't want the library to be permanent, and I expect that whatever changes work should eventually end up here again. Just doing this in the meantime! |
Not ready because still using local changes in the deps but wanted to open it up for review while those changes are being merged and released.