SIGSEGV in `reply_append` #221

patrobinson · 2024-12-06T01:18:15Z

We've experienced numerous segfaults in production that point to this specific line of code.
We can reliably reproduce this by having Sidekiq pause and unpause a queue, which causes it to receive a message on a pubsub channel.

queue = Sidekiq::Queue.new(ApplicationWorker::Queue::QUEUE_NAME)
queue.pause!
queue.unpause!

This happens on a handful of the dozens of containers we run.

I was able to get a coredump from one of the containers and here's the backtrace:

(gdb) bt full
#0  0x00007f8c6d51cebc in ?? () from buildkite-hiredis-segfault/eloquent_ride/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x00007f8c6d4cdfb2 in raise () from buildkite-hiredis-segfault/eloquent_ride/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#2  0x00007f8c6da9c8bf in ruby_default_signal (sig=<optimized out>) at signal.c:422
No locals.
#3  0x00007f8c6d8864b8 in rb_bug_for_fatal_signal (default_sighandler=0x0, sig=sig@entry=11, ctx=ctx@entry=0x7f8c11fd9880, fmt=fmt@entry=0x7f8c6dcac4d5 "Segmentation fault at %p") at error.c:1069
        file = <optimized out>
        line = 92
#4  0x00007f8c6da9b84b in sigsegv (sig=11, info=0x7f8c11fd99b0, ctx=0x7f8c11fd9880) at signal.c:926
No locals.
#5  <signal handler called>
No symbol table info available.
#6  0x00007f8c48a792bd in reply_append (value=<REDACTED>, task=0x7f8c12223690) at hiredis_connection.c:143
        state = 0x7f8c0fb5d300
        task_index = <optimized out>
        state = <optimized out>
        task_index = <optimized out>
        parent = <optimized out>
        key = <optimized out>
        rb_gc_guarded_ptr = <optimized out>
        rb_gc_guarded_ptr = <optimized out>
        rb_gc_guarded_ptr = <optimized out>
#7  reply_create_array (task=0x7f8c12223690, elements=<optimized out>) at hiredis_connection.c:212
        value = <REDACTED>

(gdb) p *((hiredis_reader_state_t *)(0x7f8c0fb5d300))
$1 = {stack = <REDACTED>, task_index = 0x0}

Somehow `task_index` is a null pointer, which shouldn't be possible given the code path.
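
To illustrate why a null `task_index` is fatal at this point, here is a minimal sketch. It is not the actual `hiredis_connection.c` code: the struct name `reader_state_sketch_t`, the field types, and the helper `read_task_index` are hypothetical, and only the two field names come from the gdb output above. Dereferencing `state->task_index` without a check when it is `0x0` would raise SIGSEGV, while a guarded read can report the bad state instead.

```c
#include <stddef.h>

/* Hypothetical stand-in for the struct printed by gdb. Only the field
 * names (stack, task_index) come from the dump; the types are assumed. */
typedef struct {
    void *stack;      /* opaque here; shown as <REDACTED> in the dump */
    int  *task_index; /* 0x0 in the core dump */
} reader_state_sketch_t;

/* Guarded read: returns 0 and stores the index in *out on success,
 * or -1 if the state or its task_index pointer is NULL. */
static int read_task_index(const reader_state_sketch_t *state, int *out) {
    if (state == NULL || state->task_index == NULL) {
        return -1; /* an unguarded *state->task_index would crash here */
    }
    *out = *state->task_index;
    return 0;
}
```

A guard like this would only mask the symptom; it does not explain how the pointer became null in the first place.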


byroot · 2024-12-06T08:23:03Z

> We can reliably reproduce this

Any chance you could create a repro script using bundler/inline, or even a Dockerfile? That would be very useful for me to investigate.

patrobinson · 2024-12-09T01:06:00Z

@byroot I don't think I will be able to. Sidekiq Pro is commercially licensed, and there are multiple threads that seem to interact in some unknown way to trigger the bug.

I'm trying to replicate the issue with ASAN, as we did in #208.

mperham · 2024-12-09T05:02:43Z

I can give @byroot Sidekiq Pro access; just have your Gemfile use a local `:path` after running `gem unpack`.

byroot · 2024-12-12T08:24:19Z

@patrobinson any news?
