Description
This issue occurs on Darwin (macOS) when a client reopens a large number of queues (≈10,000) simultaneously.
Stack
FATAL /Users/emalygin/work/blazingmq/thirdparty/bde/groups/bsl/bslmt/bslmt_semaphoreimpl_darwin.cpp:75 Assertion failed: 'sem_open' failed
* thread #2, name = 'bmqFSMEvtQ', stop reason = signal SIGABRT
* frame #0: 0x00000001871fe388 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x0000000187237848 libsystem_pthread.dylib`pthread_kill + 296
frame #2: 0x00000001871409e4 libsystem_c.dylib`abort + 124
frame #3: 0x00000001006b44d0 producer.tsk`BloombergLP::bsls::AssertImpUtil::failByAbort() + 12
frame #4: 0x00000001006b40cc producer.tsk`BloombergLP::bsls::Assert::failByAbort(BloombergLP::bsls::AssertViolation const&) + 16
frame #5: 0x00000001006b40ec producer.tsk`BloombergLP::bsls::Assert::invokeHandlerNoReturn(BloombergLP::bsls::AssertViolation const&) + 32
frame #6: 0x00000001006b1e6c producer.tsk`BloombergLP::bslmt::SemaphoreImpl<BloombergLP::bslmt::Platform::DarwinSemaphore>::SemaphoreImpl(int) + 316
frame #7: 0x00000001000c72b8 producer.tsk`BloombergLP::bslmt::SemaphoreImpl<BloombergLP::bslmt::Platform::CountedSemaphore>::SemaphoreImpl(int) + 52
frame #8: 0x00000001000c7274 producer.tsk`BloombergLP::bslmt::SemaphoreImpl<BloombergLP::bslmt::Platform::CountedSemaphore>::SemaphoreImpl(int) + 36
frame #9: 0x00000001000c7240 producer.tsk`BloombergLP::bslmt::Semaphore::Semaphore() + 32
frame #10: 0x00000001000b335c producer.tsk`BloombergLP::bslmt::Semaphore::Semaphore() + 28
frame #11: 0x00000001000d1b0c producer.tsk`BloombergLP::bmqp::RequestManagerRequest<BloombergLP::bmqp_ctrlmsg::ControlMessage, BloombergLP::bmqp_ctrlmsg::ControlMessage>::RequestManagerRequest(BloombergLP::bslma::Allocator*) + 92
frame #12: 0x00000001000d19ac producer.tsk`BloombergLP::bmqp::RequestManagerRequest<BloombergLP::bmqp_ctrlmsg::ControlMessage, BloombergLP::bmqp_ctrlmsg::ControlMessage>::RequestManagerRequest(BloombergLP::bslma::Allocator*) + 36
Steps to reproduce
- Start a local broker
- Start a client, open 10k queues in Sync mode
- Stop the broker
- Restart the broker
- Client tries to reopen all 10k queues at once
- The client constructs RequestManagerRequest instances to reopen the queues; this class has a semaphore member, d_semaphore
The client eventually crashes once enough RequestManagerRequest instances are created.
Root cause
Darwin doesn't support unnamed POSIX semaphores, so the bslmt::Semaphore implementation on Darwin always creates a named semaphore, which has a file associated with it. The kernel also limits the path of a semaphore file to 31 characters. The bslmt::Semaphore implementation generates pseudo-random file names for semaphores; these are good enough for a small number of semaphores but tend to collide once enough semaphores have been created. When a collision occurs, the newly constructed semaphore cannot open its file, and its construction fails on an assertion (the "'sem_open' failed" assertion in the stack trace above).
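To illustrate why truncating the address causes collisions, here is a hypothetical reconstruction of the naming scheme, inferred only from the example name bslmt_semaphore_276c_00b921c0 quoted further down: a fixed prefix, 4 hex digits of the process id, and 8 hex digits (4 bytes) of the object's address. The function names are made up for this sketch and the real bde code may differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical reconstruction (not the actual bde code): prefix, pid truncated
// to 4 hex digits, and only the low 4 bytes of the object's address.
std::string truncatedSemaphoreName(unsigned pid, std::uint64_t address)
{
    char buf[32];  // Darwin allows at most 31 characters, plus the NUL
    const int len = std::snprintf(buf, sizeof buf, "bslmt_semaphore_%04x_%08x",
                                  pid & 0xffffu,
                                  static_cast<unsigned>(address & 0xffffffffu));
    assert(len > 0 && len <= 31);  // the name must fit the kernel limit
    return std::string(buf, len);
}

// Returns true if two objects at different addresses in the same process
// would be given the same semaphore name under the scheme above.
bool namesCollide(unsigned pid, std::uint64_t a, std::uint64_t b)
{
    return a != b
        && truncatedSemaphoreName(pid, a) == truncatedSemaphoreName(pid, b);
}
```

For example, 0x0000600000b921c0 and 0x0000700000b921c0 are distinct addresses, but both map to bslmt_semaphore_276c_00b921c0 for pid 0x276c. The sketch only shows the mechanism; it says nothing about how often real heap addresses line up this way.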
This issue is not observed on Linux. Unnamed semaphores on Linux don't require opening a file; they reside only in user memory, so there is no need for a custom uniqueness mechanism. Named semaphores on Linux are placed in shared memory and allow names of up to 251 characters (NAME_MAX - 4), which is more than enough to guarantee a unique name (for example, by embedding the full memory address of the semaphore object in it).
Proposals
- Use Condition/Mutex instead of a semaphore in RequestManagerRequest
- Update the bslmt::Semaphore implementation in bde to always use the full memory address of a semaphore object to ensure uniqueness, here: https://github.com/bloomberg/bde/blob/a438a61889c61aa085da4df2f2e533da16f243f6/groups/bsl/bslmt/bslmt_semaphoreimpl_darwin.cpp#L48
The current implementation uses only 4 bytes (8 hex digits) of the address: 00b921c0 in bslmt_semaphore_276c_00b921c0.
Since the name is limited to 31 characters, other parts of it have to be shortened:
bslmt_semaphore_276c_00b921c0 # original
bslmt_276c_00b921c000b921c0 # option 1 (get rid of semaphore word)
bslmtSema276c00b921c000b921c0 # option 2 (get rid of _ separators and shorten semaphore word)
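Option 1 can be sketched as follows, assuming the pid is still truncated to 4 hex digits and the full 64-bit address is written as 16 hex digits. The function name and exact format string are hypothetical, not taken from bde.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical sketch of "option 1": drop the word "semaphore" and spend the
// freed characters on the full 64-bit address.
// Layout: "bslmt_" (6) + pid (4 hex) + "_" (1) + address (16 hex) = 27 chars,
// which fits within the 31-character limit on Darwin.
std::string fullAddressSemaphoreName(unsigned pid, std::uint64_t address)
{
    char buf[32];  // at most 31 characters plus the terminating NUL
    const int len = std::snprintf(buf, sizeof buf, "bslmt_%04x_%016" PRIx64,
                                  pid & 0xffffu, address);
    assert(len == 27);  // fixed-width fields always give the same length
    return std::string(buf, len);
}
```

Because every live object in a process has a distinct address, two semaphores alive at the same time in the same process can no longer get the same name under this layout. Names from different processes could still coincide if their pids share the same low 16 bits, since the pid part remains truncated.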
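The first proposal avoids named semaphores entirely. Below is a minimal counting-semaphore sketch built from a mutex and a condition variable. It uses standard std::mutex and std::condition_variable as stand-ins for bslmt::Mutex and bslmt::Condition, and the class name CondVarSemaphore is hypothetical. It assumes RequestManagerRequest only needs to block a caller until a response is posted; the issue doesn't say exactly how d_semaphore is used.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Counting semaphore built from a mutex and a condition variable.
// No named kernel object is created, so the Darwin sem_open name limit and
// the resulting name collisions do not apply.
class CondVarSemaphore {
  public:
    explicit CondVarSemaphore(int count = 0) : d_count(count)
    {
        assert(count >= 0);
    }

    // Increment the count and wake one waiting thread, if any.
    void post()
    {
        {
            std::lock_guard<std::mutex> guard(d_mutex);
            ++d_count;
        }
        d_condition.notify_one();
    }

    // Block until the count is positive, then decrement it.
    void wait()
    {
        std::unique_lock<std::mutex> lock(d_mutex);
        d_condition.wait(lock, [this] { return d_count > 0; });
        --d_count;
    }

    // Decrement the count and return true if it is positive;
    // otherwise return false without blocking.
    bool tryWait()
    {
        std::lock_guard<std::mutex> guard(d_mutex);
        if (d_count == 0) {
            return false;
        }
        --d_count;
        return true;
    }

  private:
    std::mutex              d_mutex;
    std::condition_variable d_condition;
    int                     d_count;
};
```

The trade-off is that each request object now carries a mutex and a condition variable instead of one semaphore. On Darwin these are ordinary pthread objects, so constructing 10,000 of them does not depend on generating unique names.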