SIGSEGV in reply_append #221

Open
patrobinson opened this issue Dec 6, 2024 · 4 comments

patrobinson commented Dec 6, 2024

We've experienced numerous segfaults in production that point to this specific line of code.
We can reliably reproduce this by simply triggering Sidekiq to pause and unpause a queue, which causes it to receive a message on a Pub/Sub channel:

queue = Sidekiq::Queue.new(ApplicationWorker::Queue::QUEUE_NAME)
queue.pause!
queue.unpause!

This happens on a handful of the dozens of containers we run.

I was able to get a coredump from one of the containers and here's the backtrace:

(gdb) bt full
#0  0x00007f8c6d51cebc in ?? () from buildkite-hiredis-segfault/eloquent_ride/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x00007f8c6d4cdfb2 in raise () from buildkite-hiredis-segfault/eloquent_ride/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#2  0x00007f8c6da9c8bf in ruby_default_signal (sig=<optimized out>) at signal.c:422
No locals.
#3  0x00007f8c6d8864b8 in rb_bug_for_fatal_signal (default_sighandler=0x0, sig=sig@entry=11, ctx=ctx@entry=0x7f8c11fd9880, fmt=fmt@entry=0x7f8c6dcac4d5 "Segmentation fault at %p") at error.c:1069
        file = <optimized out>
        line = 92
#4  0x00007f8c6da9b84b in sigsegv (sig=11, info=0x7f8c11fd99b0, ctx=0x7f8c11fd9880) at signal.c:926
No locals.
#5  <signal handler called>
No symbol table info available.
#6  0x00007f8c48a792bd in reply_append (value=<REDACTED>, task=0x7f8c12223690) at hiredis_connection.c:143
        state = 0x7f8c0fb5d300
        task_index = <optimized out>
        state = <optimized out>
        task_index = <optimized out>
        parent = <optimized out>
        key = <optimized out>
        rb_gc_guarded_ptr = <optimized out>
        rb_gc_guarded_ptr = <optimized out>
        rb_gc_guarded_ptr = <optimized out>
#7  reply_create_array (task=0x7f8c12223690, elements=<optimized out>) at hiredis_connection.c:212
        value = <REDACTED>
(gdb) p *((hiredis_reader_state_t *)(0x7f8c0fb5d300))
$1 = {stack = <REDACTED>, task_index = 0x0}

Somehow task_index is a null pointer, which shouldn't be possible given the code path. Any idea how this could happen?

byroot (Member) commented Dec 6, 2024

> We can reliably reproduce this

Any chance you could create a repro script using bundler/inline, or even a Dockerfile? That would be very useful for me to investigate.

patrobinson (Author) commented

@byroot I don't think I'll be able to: Sidekiq Pro is commercially licensed, and there are multiple threads that seem to interact in some unknown way to trigger the bug.

I'm trying to reproduce the issue under ASAN (AddressSanitizer), as we did in #208.

mperham commented Dec 9, 2024

I can give @byroot Sidekiq Pro access; just have your Gemfile point at a local :path by using “gem unpack”.

byroot (Member) commented Dec 12, 2024

@patrobinson any news?
