-
-
Notifications
You must be signed in to change notification settings - Fork 90
Description
I'm currently experiencing this issue, which can't be consistently reproduced, but consistently happens in the same code path.
I'm using the pattern of using a pipe to control the lifecycle of the process/loop. This is the simplified version of the trigger:
Signal.trap("TERM") do
@writer.print("TERM")
@writer.flush
end
The reader in the main thread will deal with the TERM signal, and write to another pipe, which reader is registered in a NIO loop. The registered handler should evaluate the message, and stop the loop. This is the intended behaviour, and it does happen most of the time.
Two types of errors happen from time to time, however:
- the writer from the loop-registered pipe raises
IOError: stream closed
on write (although the reader successfully received and handled the message; simply rescuing the exception "patches" this behaviour, doesn't fix it however) - More rarely, the event above triggers a VM level crash.
This happens usually under heavy load like when I run tests which start/stop many loop instances. When running them sequentially, this happens rarely.
Now I've added minitest/hell
, and I'm seeing it way more often. This leads me to believe that there is some race condition somewhere, and would greatly appreciate some input on how to debug this.
This is the relevant information I can gather:
- Ruby 2.3.1p112
- Mac OS Sierra 10.12.4
- pooler API is the
select
syscall
Here's the relevant coredump (I'll ignore the non-relevant ruby-platform threads until someone asks otherwise):
-- Control frame information -----------------------------------------------
c:0004 p:---- s:0013 e:000012 CFUNC :select
c:0003 p:0120 s:0009 e:000008 METHOD
-- Machine register context ------------------------------------------------
rax: 0x0000000024ce02b0 rbx: 0x00007ff326154da0 rcx: 0x0000000024ce02b0
rdx: 0x00007ff3b94d58cc rdi: 0x00007ff32632d130 rsi: 0x000000010bcf1080
rbp: 0x000070000d1524f0 rsp: 0x000070000d1524e0 r8: 0x0000000000000000
r9: 0x0000000000000000 r10: 0x0000000000000000 r11: 0x0000000000000000
r12: 0x000000010b3a9c20 r13: 0x0000000000000000 r14: 0x00007ff326154da0
r15: 0x00007ff326154dc8 rip: 0x000000010b914e17 rfl: 0x0000000000010202
-- C level backtrace information -------------------------------------------
0 ruby 0x000000010b2feb94 rb_vm_bugreport + 388
1 ruby 0x000000010b19ac8a rb_bug_context + 490
2 ruby 0x000000010b26f184 sigsegv + 68
3 libsystem_platform.dylib 0x00007fffb2260b3a _sigtramp + 26
4 nio4r_ext.bundle 0x000000010b914e17 ev_invoke_pending + 119
5 ??? 0x41d6392e4e1cb1b8 0x0 + 4744042128523178424
...
Thread 4:: server.rb:18
0 libsystem_kernel.dylib 0x00007fffb217feb6 __select + 10
1 nio4r_ext.bundle 0x000000010b919432 select_poll + 162 (ev_select.c:170)
2 ruby 0x000000010b30724d rb_thread_call_without_gvl + 93
3 nio4r_ext.bundle 0x000000010b915119 ev_run + 745 (ev.c:3755)
4 nio4r_ext.bundle 0x000000010b91a68e NIO_Selector_select_synchronized + 142 (selector.c:352)
5 ruby 0x000000010b1a4649 rb_ensure + 169
6 nio4r_ext.bundle 0x000000010b919f3a NIO_Selector_select + 106 (selector.c:320)
7 ruby 0x000000010b2f4e8a vm_call_cfunc + 314
8 ruby 0x000000010b2df3f8 vm_exec_core + 9720
9 ruby 0x000000010b2ef6c3 vm_exec + 131
10 ruby 0x000000010b2edfd4 vm_invoke_proc + 196
11 ruby 0x000000010b30d857 thread_start_func_2 + 1463
12 ruby 0x000000010b30d27a thread_start_func_1 + 170
13 libsystem_pthread.dylib 0x00007fffb226a9af _pthread_body + 180
14 libsystem_pthread.dylib 0x00007fffb226a8fb _pthread_start + 286
15 libsystem_pthread.dylib 0x00007fffb226a101 thread_start + 13
Thread 5 Crashed:: server.rb:18
0 libsystem_kernel.dylib 0x00007fffb217fd42 __pthread_kill + 10
1 libsystem_pthread.dylib 0x00007fffb226d5bf pthread_kill + 90
2 libsystem_c.dylib 0x00007fffb20e5420 abort + 129
3 ruby 0x000000010b19aa99 die + 9
4 ruby 0x000000010b19acde rb_bug_context + 574
5 ruby 0x000000010b26f184 sigsegv + 68
6 libsystem_platform.dylib 0x00007fffb2260b3a _sigtramp + 26
...
Thread 5 crashed with X86 Thread State (64-bit):
rax: 0x0000000000000000 rbx: 0x0000000000000006 rcx: 0x00007ff3251377f8 rdx: 0x0000000000000000
rdi: 0x0000000000001617 rsi: 0x0000000000000006 rbp: 0x00007ff325137820 rsp: 0x00007ff3251377f8
r8: 0x0000000000000040 r9: 0x00007fffbafb4040 r10: 0x0000000008000000 r11: 0x0000000000000206
r12: 0x00007ff325137950 r13: 0x000000000000003e r14: 0x000070000d153000 r15: 0x000000010b3243f6
rip: 0x00007fffb217fd42 rfl: 0x0000000000000206 cr2: 0x00007fffbafb2128
...
External Modification Summary:
Calls made by other processes targeting this process:
task_for_pid: 0
thread_create: 0
thread_set_state: 0
Calls made by this process:
task_for_pid: 0
thread_create: 0
thread_set_state: 0
Calls made by all processes on this machine:
task_for_pid: 4376575
thread_create: 0
thread_set_state: 0
VM Region Summary:
ReadOnly portion of Libraries: Total=201.0M resident=0K(0%) swapped_out_or_unallocated=201.0M(100%)
Writable regions: Total=78.5M written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=78.5M(100%)
VIRTUAL REGION
REGION TYPE SIZE COUNT (non-coalesced)
=========== ======= =======
Kernel Alloc Once 8K 2
MALLOC 65.2M 15
MALLOC guard page 16K 4
STACK GUARD 56.0M 7
Stack 13.1M 8
Stack Guard 4K 2
__DATA 9720K 162
__LINKEDIT 116.2M 26
__TEXT 84.9M 161
__UNICODE 556K 2
shared memory 12K 4
=========== ======= =======
TOTAL 345.4M 382