Darwin: reopening 10k queues causes semaphore creation failures #1003

@678098

Description

This issue occurs on Darwin (macOS) when a client reopens a large number of queues (≈10,000) simultaneously.

Stack

FATAL /Users/emalygin/work/blazingmq/thirdparty/bde/groups/bsl/bslmt/bslmt_semaphoreimpl_darwin.cpp:75 Assertion failed: 'sem_open' failed

* thread #2, name = 'bmqFSMEvtQ', stop reason = signal SIGABRT
  * frame #0: 0x00000001871fe388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000187237848 libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x00000001871409e4 libsystem_c.dylib`abort + 124
    frame #3: 0x00000001006b44d0 producer.tsk`BloombergLP::bsls::AssertImpUtil::failByAbort() + 12
    frame #4: 0x00000001006b40cc producer.tsk`BloombergLP::bsls::Assert::failByAbort(BloombergLP::bsls::AssertViolation const&) + 16
    frame #5: 0x00000001006b40ec producer.tsk`BloombergLP::bsls::Assert::invokeHandlerNoReturn(BloombergLP::bsls::AssertViolation const&) + 32
    frame #6: 0x00000001006b1e6c producer.tsk`BloombergLP::bslmt::SemaphoreImpl<BloombergLP::bslmt::Platform::DarwinSemaphore>::SemaphoreImpl(int) + 316
    frame #7: 0x00000001000c72b8 producer.tsk`BloombergLP::bslmt::SemaphoreImpl<BloombergLP::bslmt::Platform::CountedSemaphore>::SemaphoreImpl(int) + 52
    frame #8: 0x00000001000c7274 producer.tsk`BloombergLP::bslmt::SemaphoreImpl<BloombergLP::bslmt::Platform::CountedSemaphore>::SemaphoreImpl(int) + 36
    frame #9: 0x00000001000c7240 producer.tsk`BloombergLP::bslmt::Semaphore::Semaphore() + 32
    frame #10: 0x00000001000b335c producer.tsk`BloombergLP::bslmt::Semaphore::Semaphore() + 28
    frame #11: 0x00000001000d1b0c producer.tsk`BloombergLP::bmqp::RequestManagerRequest<BloombergLP::bmqp_ctrlmsg::ControlMessage, BloombergLP::bmqp_ctrlmsg::ControlMessage>::RequestManagerRequest(BloombergLP::bslma::Allocator*) + 92
    frame #12: 0x00000001000d19ac producer.tsk`BloombergLP::bmqp::RequestManagerRequest<BloombergLP::bmqp_ctrlmsg::ControlMessage, BloombergLP::bmqp_ctrlmsg::ControlMessage>::RequestManagerRequest(BloombergLP::bslma::Allocator*) + 36

Steps to reproduce

  1. Start a local broker
  2. Start a client, open 10k queues in Sync mode
  3. Stop the broker
  4. Restart the broker
  5. Client tries to reopen all 10k queues at once
  6. Client constructs an instance of the RequestManagerRequest class for each reopen request; this class has a semaphore field (see frames #9 to #12 in the stack above).

The client eventually crashes once enough RequestManagerRequest instances are created.

Root cause

Darwin doesn't support unnamed POSIX semaphores, so the bslmt::Semaphore implementation on Darwin always creates a named semaphore, which has a file associated with it. The kernel also limits the file path of a semaphore to 31 chars. The bslmt::Semaphore implementation generates pseudo-random file names, which are good enough for a small number of semaphores but tend to collide once the number of created semaphores is large enough. When a name collides, the newly constructed semaphore cannot open its file, and its constructor fails on the 'sem_open' assertion.

This issue is not observed on Linux. Unnamed semaphores on Linux don't require opening a file and reside in user memory only, so there is no need for a custom uniqueness mechanism. Named semaphores on Linux allow up to 255 chars for the file path and are placed in shared memory. 255 chars is more than enough to ensure the uniqueness of a semaphore name (for example, by using the current memory address as the semaphore identity).

Proposals

  1. Use a Condition/Mutex pair instead of a semaphore in RequestManagerRequest
  2. Update the bslmt::Semaphore implementation in bde to always use the full memory address of a semaphore object to ensure uniqueness, here: https://github.com/bloomberg/bde/blob/a438a61889c61aa085da4df2f2e533da16f243f6/groups/bsl/bslmt/bslmt_semaphoreimpl_darwin.cpp#L48
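Proposal 1 could look roughly like the sketch below. It is only an illustration: `CountingSemaphore` is a hypothetical name, and it uses `std::mutex` and `std::condition_variable` from the standard library rather than the bslmt types, which is an assumption made to keep the example self-contained. The point is that the count lives in process memory, so constructing the object never calls `sem_open` and never needs a system-wide name.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the semaphore field of RequestManagerRequest.
// The count is guarded by a mutex and waiters sleep on a condition
// variable, so no named (file-backed) semaphore is ever created.
class CountingSemaphore {
    std::mutex              d_mutex;
    std::condition_variable d_cond;
    int                     d_count;

  public:
    explicit CountingSemaphore(int initial = 0)
    : d_count(initial)
    {
        assert(initial >= 0);
    }

    // Add one resource and wake a single waiter, if there is one.
    void post()
    {
        {
            std::lock_guard<std::mutex> guard(d_mutex);
            ++d_count;
        }
        d_cond.notify_one();
    }

    // Block until a resource is available, then take it.
    void wait()
    {
        std::unique_lock<std::mutex> lock(d_mutex);
        d_cond.wait(lock, [this] { return d_count > 0; });
        --d_count;
    }

    // Take a resource if one is available; return false otherwise.
    bool tryWait()
    {
        std::lock_guard<std::mutex> guard(d_mutex);
        if (d_count == 0) {
            return false;
        }
        --d_count;
        return true;
    }
};
```

With this shape, creating 10k request objects costs 10k small in-memory objects and no kernel-visible names, so there is nothing that can collide.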

The current implementation uses only 4 bytes of the address, for example 00b921c0 in bslmt_semaphore_276c_00b921c0

Since the number of characters is limited, we have to cut corners:

bslmt_semaphore_276c_00b921c0 # original
bslmt_276c_00b921c000b921c0   # option 1 (get rid of semaphore word)
bslmtSema276c00b921c000b921c0 # option 2 (get rid of _ separators and shorten semaphore word)
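The sketch below compares the original name shape with option 2, assuming 64-bit addresses. It is not the bde code: `truncatedName`, `fullAddressName` and `k_MAX_NAME_LENGTH` are hypothetical names, and the `276c` field is treated as an opaque 16-bit tag because this issue does not say what it encodes. It shows that two addresses differing only above the low 32 bits map to the same original-style name, while the option 2 format keeps them distinct and still fits in the 31-char limit quoted above.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Name length limit quoted in this issue for Darwin semaphores.
const std::size_t k_MAX_NAME_LENGTH = 31;

// Original shape: prefix, 16-bit tag, low 32 bits of the address.
std::string truncatedName(unsigned tag, std::uint64_t address)
{
    char buffer[64];
    const int n = std::snprintf(buffer, sizeof buffer,
                                "bslmt_semaphore_%04x_%08" PRIx32,
                                tag & 0xffffu,
                                static_cast<std::uint32_t>(address));
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buffer);
    return std::string(buffer, static_cast<std::size_t>(n));
}

// Option 2 shape: shorter prefix, no separators, all 64 address bits.
std::string fullAddressName(unsigned tag, std::uint64_t address)
{
    char buffer[64];
    const int n = std::snprintf(buffer, sizeof buffer,
                                "bslmtSema%04x%016" PRIx64,
                                tag & 0xffffu,
                                address);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buffer);
    return std::string(buffer, static_cast<std::size_t>(n));
}
```

Both formats produce 29 characters for a 16-bit tag, so option 2 carries the extra 32 address bits without making the name longer than the original.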


Labels: bug
