Skip to content

Process crash when handling traps #139

@HoneyryderChuck

Description

@HoneyryderChuck

I'm currently experiencing this issue, which can't be consistently reproduced, but consistently happens in the same code path.

I'm using the pattern of using a pipe to control the lifecycle of the process/loop. This is the simplified version of the trigger:

Signal.trap("TERM") do
  @writer.print("TERM")
  @writer.flush
end

The reader in the main thread will deal with the TERM signal, and write to another pipe, which reader is registered in a NIO loop. The registered handler should evaluate the message, and stop the loop. This is the intended behaviour, and it does happen most of the time.

Two types of errors happen from time to time, however:

  • the writer from the loop-registered pipe raises IOError: stream closed on write (although the reader successfully received and handled the message; simply rescuing the exception "patches" this behaviour, doesn't fix it however)
  • More rarely, the event above triggers a VM level crash.

This happens usually under heavy load like when I run tests which start/stop many loop instances. When running them sequentially, this happens rarely.
Now I've added minitest/hell, and I'm seeing it way more often. This leads me to believe that there is some race condition somewhere, and would greatly appreciate some input on how to debug this.

This is the relevant information I can gather:

  • Ruby 2.3.1p112
  • Mac OS Sierra 10.12.4
  • pooler API is the select syscall

Here's the relevant coredump (I'll ignore the non-relevant ruby-platform threads until someone asks otherwise):

-- Control frame information -----------------------------------------------
c:0004 p:---- s:0013 e:000012 CFUNC  :select
c:0003 p:0120 s:0009 e:000008 METHOD 


-- Machine register context ------------------------------------------------
 rax: 0x0000000024ce02b0 rbx: 0x00007ff326154da0 rcx: 0x0000000024ce02b0
 rdx: 0x00007ff3b94d58cc rdi: 0x00007ff32632d130 rsi: 0x000000010bcf1080
 rbp: 0x000070000d1524f0 rsp: 0x000070000d1524e0  r8: 0x0000000000000000
  r9: 0x0000000000000000 r10: 0x0000000000000000 r11: 0x0000000000000000
 r12: 0x000000010b3a9c20 r13: 0x0000000000000000 r14: 0x00007ff326154da0
 r15: 0x00007ff326154dc8 rip: 0x000000010b914e17 rfl: 0x0000000000010202

-- C level backtrace information -------------------------------------------
0   ruby                                0x000000010b2feb94 rb_vm_bugreport + 388
1   ruby                                0x000000010b19ac8a rb_bug_context + 490
2   ruby                                0x000000010b26f184 sigsegv + 68
3   libsystem_platform.dylib            0x00007fffb2260b3a _sigtramp + 26
4   nio4r_ext.bundle                    0x000000010b914e17 ev_invoke_pending + 119
5   ???                                 0x41d6392e4e1cb1b8 0x0 + 4744042128523178424



...
Thread 4:: server.rb:18
0   libsystem_kernel.dylib        	0x00007fffb217feb6 __select + 10
1   nio4r_ext.bundle              	0x000000010b919432 select_poll + 162 (ev_select.c:170)
2   ruby                          	0x000000010b30724d rb_thread_call_without_gvl + 93
3   nio4r_ext.bundle              	0x000000010b915119 ev_run + 745 (ev.c:3755)
4   nio4r_ext.bundle              	0x000000010b91a68e NIO_Selector_select_synchronized + 142 (selector.c:352)
5   ruby                          	0x000000010b1a4649 rb_ensure + 169
6   nio4r_ext.bundle              	0x000000010b919f3a NIO_Selector_select + 106 (selector.c:320)
7   ruby                          	0x000000010b2f4e8a vm_call_cfunc + 314
8   ruby                          	0x000000010b2df3f8 vm_exec_core + 9720
9   ruby                          	0x000000010b2ef6c3 vm_exec + 131
10  ruby                          	0x000000010b2edfd4 vm_invoke_proc + 196
11  ruby                          	0x000000010b30d857 thread_start_func_2 + 1463
12  ruby                          	0x000000010b30d27a thread_start_func_1 + 170
13  libsystem_pthread.dylib       	0x00007fffb226a9af _pthread_body + 180
14  libsystem_pthread.dylib       	0x00007fffb226a8fb _pthread_start + 286
15  libsystem_pthread.dylib       	0x00007fffb226a101 thread_start + 13

Thread 5 Crashed:: server.rb:18
0   libsystem_kernel.dylib        	0x00007fffb217fd42 __pthread_kill + 10
1   libsystem_pthread.dylib       	0x00007fffb226d5bf pthread_kill + 90
2   libsystem_c.dylib             	0x00007fffb20e5420 abort + 129
3   ruby                          	0x000000010b19aa99 die + 9
4   ruby                          	0x000000010b19acde rb_bug_context + 574
5   ruby                          	0x000000010b26f184 sigsegv + 68
6   libsystem_platform.dylib      	0x00007fffb2260b3a _sigtramp + 26

...

Thread 5 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x0000000000000006  rcx: 0x00007ff3251377f8  rdx: 0x0000000000000000
  rdi: 0x0000000000001617  rsi: 0x0000000000000006  rbp: 0x00007ff325137820  rsp: 0x00007ff3251377f8
   r8: 0x0000000000000040   r9: 0x00007fffbafb4040  r10: 0x0000000008000000  r11: 0x0000000000000206
  r12: 0x00007ff325137950  r13: 0x000000000000003e  r14: 0x000070000d153000  r15: 0x000000010b3243f6
  rip: 0x00007fffb217fd42  rfl: 0x0000000000000206  cr2: 0x00007fffbafb2128

...
External Modification Summary:
  Calls made by other processes targeting this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by all processes on this machine:
    task_for_pid: 4376575
    thread_create: 0
    thread_set_state: 0

VM Region Summary:
ReadOnly portion of Libraries: Total=201.0M resident=0K(0%) swapped_out_or_unallocated=201.0M(100%)
Writable regions: Total=78.5M written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=78.5M(100%)
 
                                VIRTUAL   REGION 
REGION TYPE                        SIZE    COUNT (non-coalesced) 
===========                     =======  ======= 
Kernel Alloc Once                    8K        2 
MALLOC                            65.2M       15 
MALLOC guard page                   16K        4 
STACK GUARD                       56.0M        7 
Stack                             13.1M        8 
Stack Guard                          4K        2 
__DATA                            9720K      162 
__LINKEDIT                       116.2M       26 
__TEXT                            84.9M      161 
__UNICODE                          556K        2 
shared memory                       12K        4 
===========                     =======  ======= 
TOTAL                            345.4M      382 

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions