-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Handle ping failures more gracefully #7422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I just finished debugging a situation where in AWS an OpenSearch cluster returned a HTTP 403 response with {"Message":"User: anonymous is not authorized to perform: es:ESHttpGet because no resource-based policy allows the es:ESHttpGet action"} in the body. Jaeger reacted to that with: panic: runtime error: index out of range [0] with length 0 goroutine 1 [running]: github.com/jaegertracing/jaeger/internal/storage/elasticsearch/config.NewClient({0x2d430f8, 0x4816cc0}, 0x40003f3008, 0x40002ee000, {0x2d56ba0, 0x40003e0080}) github.com/jaegertracing/jaeger/internal/storage/elasticsearch/config/config.go:353 +0xcf8 github.com/jaegertracing/jaeger/internal/storage/v1/elasticsearch.NewFactoryBase({_, _}, {{0x40005b0f10, 0x1, 0x1}, {0x4816cc0, 0x0, 0x0}, {{{{...}, {...}, ...}, ...}, ...}, ...}, ...) github.com/jaegertracing/jaeger/internal/storage/v1/elasticsearch/factory.go:79 +0x1fc github.com/jaegertracing/jaeger/internal/storage/v2/elasticsearch.NewFactory({_, _}, {{0x40005b0f10, 0x1, 0x1}, {0x4816cc0, 0x0, 0x0}, {{{{...}, {...}, ...}, ...}, ...}, ...}, ...) github.com/jaegertracing/jaeger/internal/storage/v2/elasticsearch/factory.go:42 +0x194 github.com/jaegertracing/jaeger/cmd/jaeger/internal/extension/jaegerstorage.(*storageExt).Start(0x4000034ae0, {0x2d430f8, 0x4816cc0}, {0x2d1d060?, 0x4000294dc0?}) github.com/jaegertracing/jaeger/cmd/jaeger/internal/extension/jaegerstorage/extension.go:193 +0x428 go.opentelemetry.io/collector/service/extensions.(*Extensions).Start(0x40001bbe80, {0x2d430f8, 0x4816cc0}, {0x2d1d060, 0x4000294dc0}) go.opentelemetry.io/collector/[email protected]/extensions/extensions.go:52 +0x244 go.opentelemetry.io/collector/service.(*Service).Start(0x400012c3c0, {0x2d430f8, 0x4816cc0}) go.opentelemetry.io/collector/[email protected]/service.go:281 +0x1b0 go.opentelemetry.io/collector/otelcol.(*Collector).setupConfigurationComponents(0x4000156840, {0x2d430f8, 0x4816cc0}) go.opentelemetry.io/collector/[email protected]/collector.go:242 +0x898 go.opentelemetry.io/collector/otelcol.(*Collector).Run(0x4000156840, {0x2d430f8, 0x4816cc0}) go.opentelemetry.io/collector/[email protected]/collector.go:312 +0x3c go.opentelemetry.io/collector/otelcol.NewCommand.func1(0x40001f6f08, {0x263b5de?, 0x6?, 0x0?}) go.opentelemetry.io/collector/[email protected]/command.go:39 +0x80 github.com/jaegertracing/jaeger/cmd/jaeger/internal.checkConfigAndRun(0x40001f6f08, {0x40002c40d0, 0x0, 0x1}, 0x40004d1c28, 0x4000149068) github.com/jaegertracing/jaeger/cmd/jaeger/internal/command.go:84 +0x1a8 github.com/jaegertracing/jaeger/cmd/jaeger/internal.Command.func1(0x400015ef00?, {0x40002c40d0?, 0x7?, 0x26370cb?}) github.com/jaegertracing/jaeger/cmd/jaeger/internal/command.go:59 +0x44 github.com/spf13/cobra.(*Command).execute(0x40001f6f08, {0x400004c1d0, 0x1, 0x1}) github.com/spf13/[email protected]/command.go:1015 +0x844 github.com/spf13/cobra.(*Command).ExecuteC(0x40001f6f08) github.com/spf13/[email protected]/command.go:1148 +0x384 github.com/spf13/cobra.(*Command).Execute(0x400014d200?) github.com/spf13/[email protected]/command.go:1071 +0x1c main.main() github.com/jaegertracing/jaeger/cmd/jaeger/main.go:29 +0xa0 because of a combination of problems: * The ping method doesn't report non-2xx status codes as error * It also doesn't report response deserialization problems as errors * This code didn't check the status code * The code also accessed the version number assuming it wasn't empty Signed-off-by: Jakub Stasiak <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a test that illustrates the issue
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7422 +/- ##
==========================================
- Coverage 96.49% 96.45% -0.04%
==========================================
Files 377 377
Lines 23071 23077 +6
==========================================
- Hits 22263 22260 -3
- Misses 611 617 +6
- Partials 197 200 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add tests
I just finished debugging a situation where in AWS an OpenSearch cluster returned a HTTP 403 response with
in the body.
Jaeger reacted to that with:
because of a combination of problems:
Which problem is this PR solving?
N/A
How was this change tested?
Manually
Checklist
jaeger
:make lint test
kind of: linting passed, some tests failed for port-binding reasons, I don't think related to my changesjaeger-ui
:npm run lint
andnpm run test