Add retry dropped item metrics and an exhausted retry error marker for exporter helper retries #13957
base: main
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13957      +/-   ##
==========================================
+ Coverage   92.13%   92.15%   +0.01%
==========================================
  Files         666      666
  Lines       41438    41515      +77
==========================================
+ Hits        38180    38257      +77
  Misses       2218     2218
  Partials     1040     1040
869d3f8 to ddfd2b6
@open-telemetry/collector-approvers can you take a look?
jmacd left a comment
Non-blocking feedback (cc @jade-guiton-dd @axw).
Question 1: The universal telemetry RFC describes the use of an attribute otelcol.component.outcome=failure to indicate when an export fails. Why would we need a separate counter to indicate when retry fails?
Question 2: If the exporterhelper is configured with wait_for_result=true then it's difficult to call these failures "drops". Wouldn't the same sort of "drop" happen if the queue is configured (without wait_for_result=true) but also without the retry processor?
I guess these questions lead me to suspect that it's the queue (not the retry sender) that should count drops which are requests that fail and have no upstream response returned because wait_for_result=false. Otherwise, failures are failures, I see no reason to count them in a new way.
Thanks for your always valuable feedback @jmacd :D
The RFC attribute only tells you whether a single export span ended in success or failure. It doesn’t say why it failed or how many items were lost. Before this change, the obsreport sender only knew that By having the retry sender wrap the terminal error with
The queue already accounts for the situations it is responsible for ( In the configuration you mentioned (queue enabled, So the queue doesn’t have enough context to produce a “retry exhausted” metric, while the retry sender does. That’s why the new counters live alongside the retry logic instead of inside the queue.
(For the record, the type of failure that occurred is already visible in logs. Of course, that doesn't mean we can't also surface it as metrics.)
3ca8666 to fbd3281
…r exporter helper retries Signed-off-by: Israel Blancas <[email protected]>
…etry#13957 Signed-off-by: Jayson Cena <[email protected]>
Description
Link to tracking issue
Fixes #13956