Skip to content

[consumererror] Add OTLP-centric error type #13042

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

evan-bradley
Copy link
Contributor

Description

Continuation of #11085.

Link to tracking issue

Fixes #7047

@evan-bradley evan-bradley requested a review from a team as a code owner May 15, 2025 21:32
Copy link

codecov bot commented May 15, 2025

Codecov Report

Attention: Patch coverage is 91.39785% with 8 lines in your changes missing coverage. Please review.

Project coverage is 91.27%. Comparing base (e1a9093) to head (15e3d3b).

Files with missing lines Patch % Lines
...sumererror/internal/statusconversion/conversion.go 80.00% 8 Missing ⚠️

❌ Your patch check has failed because the patch coverage (91.39%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13042      +/-   ##
==========================================
- Coverage   91.27%   91.27%   -0.01%     
==========================================
  Files         510      512       +2     
  Lines       28758    28848      +90     
==========================================
+ Hits        26250    26332      +82     
- Misses       1993     2001       +8     
  Partials      515      515              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@evan-bradley
Copy link
Contributor Author

I'll look at improving the code coverage tomorrow. In the meantime, this should be in a pretty good state.

@evan-bradley
Copy link
Contributor Author

The remaining functions missing test coverage are the status code conversion functions, which are pretty direct. I don't think tests are very helpful since the functions are pretty direct mappings. The only thing I can think of that would meaningfully improve coverage is to store the mappings in a map object as opposed to in a switch statement, but feels like a slightly worse implementation.

Copy link
Member

@TylerHelmuth TylerHelmuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so excited to see this revived

Copy link
Member

@mx-psi mx-psi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seemed to be consensus on the last iteration on this implementation, I think what we need now is to test this in real life, thus I am approving this so we can move forward

@mx-psi
Copy link
Member

mx-psi commented May 21, 2025

Since this was specially controversial last time, I suggest we wait either until we have more approvals (I suggest 4) or some time has passed (I would suggest Friday next week).

cc @open-telemetry/collector-approvers

Copy link
Member

@songy23 songy23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, liked the idea

Copy link
Contributor

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good to see this moving forward. This looks the way I would expect it to look after reviewing earlier feedback from @bogdandrutu.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a big problem we identified in the past, which is that the default behavior of the errors in the collector pipelines is that they are retryable. It seems that this PR changes that, which I 1000% support, but we need to make sure we document this change and analyze the impact of that.

@evan-bradley
Copy link
Contributor Author

There is a big problem we identified in the past, which is that the default behavior of the errors in the collector pipelines is that they are retryable. It seems that this PR changes that, which I 1000% support, but we need to make sure we document this change and analyze the impact of that.

Agreed. That will come in a follow-up once we start to use this, I will make sure we proceed discerningly here.

// See https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md for more details.
//
// If a gRPC code cannot be derived from these three sources then INTERNAL is returned.
func (e *Error) OTLPGRPCStatus() *status.Status {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually change the signature to something like:

func ToGRPCStatus(e error) *status.Status;

This way:

  • Can handle the "error.As" part.
  • Can handle extra details like retry-after you proposed with the constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me. Receivers will find a function like this very ergonomic if they're only concerned about transmitting the original status code, and we can always duplicate the functionality in a method later. Added.


// NewOTLPGRPCError records a gRPC status code that was received from a server
// during data submission.
func NewOTLPGRPCError(origErr error, status *status.Status) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an "original" error in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is, you can see an example from the OTLP exporter here.

I'd like to continue to wrap downstream errors even if calls to NewOTLPGRPCError will generally be made in exporters. We could call status.FromError in this function, but then we end up in the awkward case where the code in the *status.Status object is okay and there's no easy way in the exporter to tell it wasn't an error. As a result I think the best we can do is just accept both, even if it makes the function signature slightly awkward.

There's some symmetry with the function signature for NewOTLPHTTPError, which I think makes up for it a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate how to expose exporterhelper.NewThrottleRetry in the consumererror
6 participants