otel: Improve OpenTelemetry trace instrumentation#7224
otel: Improve OpenTelemetry trace instrumentation#7224mohammed90 wants to merge 24 commits intomasterfrom
Conversation
Co-authored-by: Mohammed Al Sahaf <[email protected]>
Signed-off-by: Mohammed Al Sahaf <[email protected]>
Signed-off-by: Mohammed Al Sahaf <[email protected]>
Signed-off-by: Mohammed Al Sahaf <[email protected]>
Signed-off-by: Mohammed Al Sahaf <[email protected]>
Signed-off-by: Mohammed Al Sahaf <[email protected]>
fahrradflucht
left a comment
There was a problem hiding this comment.
Re the changes in reverseproxy:
I tested this, and it indeed fixes #7223 . I wasn't fully able to test the h2c / h3 case - probably I'm just holding Caddy wrong, which is unfortunate because I had doubts it is correctly working with the current implementation.
The default transport case is easy to try by doing something like:
docker run --name jaeger \
-e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
-e COLLECTOR_OTLP_ENABLED=true \
-p 6831:6831/udp -p 6832:6832/udp -p 5778:5778 -p 16686:16686 -p 4317:4317 -p 4318:4318 -p 14250:14250 -p 14268:14268 -p 14269:14269 -p 9411:9411 \
jaegertracing/all-in-one:1.37and then running Caddy with a Caddyfile like this:
:8080 {
tracing {
span server-span
}
handle_path /* {
reverse_proxy {
to https://example.com
header_up Host {upstream_hostport}
}
tracing {
span handle-path
}
}
}After making a request, you'll see traces at http://localhost:16686/ and can observe that an HTTP GET span is created and most importantly that it has a span.kind set to client - which is important to fix #7223 - this span is what is missing on master.
Re the rest of the changes: I'm less familiar with the problem this is solving. It looks like it addresses #7139 and duplicates #7220 - maybe @arpansaha13 can take a look as well and chime in if this approach looks sound?
Signed-off-by: Mohammed Al Sahaf <[email protected]>
| return err | ||
| switch d.Val() { | ||
| case "server_timing": | ||
| if d.NextArg() { |
There was a problem hiding this comment.
After reading up a bit on testing the feature myself, I think this condition isn't right. With this if the documented Caddyfile syntax doesn't work, because no argument is present after server_timing. The header is currently only set if there is an arbitrary garbage arg.
| if module, ok := h.Transport.(caddy.Module); ok && module.CaddyModule().ID.Name() == "fastcgi" && h.RequestBuffers == 0 { | ||
| h.RequestBuffers = 4096 | ||
| } | ||
| h.Transport = otelhttp.NewTransport(h.Transport) |
There was a problem hiding this comment.
This actually broke the instrumentation in my example, because it now only works if h.TransportRaw != nil which isn't the case when using the default transport.
|
|
||
| // Add the server-timing header so clients can make the connection | ||
| if ot.injectServerTimingHeader { | ||
| w.Header().Set("server-timing", fmt.Sprintf("traceparent;desc=\"00-%s-%s-%s\"", traceID, spanID, spanCtx.TraceFlags().String())) |
There was a problem hiding this comment.
I think this should be Add instead of Set. Otherwise, we'll clobber unrelated metrics in server-timings.
| if module, ok := h.Transport.(caddy.Module); ok && module.CaddyModule().ID.Name() == "fastcgi" && h.RequestBuffers == 0 { | ||
| h.RequestBuffers = 4096 | ||
| } | ||
| h.Transport = otelhttp.NewTransport(h.Transport) |
There was a problem hiding this comment.
I think if we reliably instrument all transports, we can stop mutating the inbound headers using ot.propagators.Inject in tracing.go as the transport will inject into outbound requests. - However as per my understanding leaving it will probably not hurt either.
Continuation of #6417 w/ update atop master and fix lint.
Sorry, @cedricziel, I screwed up your PR when rebasing.
Original comment from on PR 6417:
CC/ @fahrradflucht
Assistance Disclosure
"No AI was used."
Closes #6417