Crash on [AWSMQTTSession publishDataAtLeastOnce:onTopic:retain:onMessageIdResolved:] when running publish and subscribe actions sequentially #5497

Open
@MatildeDevq

Describe the bug
A crash occurs due to a race condition in the -[AWSMQTTSession publishDataAtLeastOnce:onTopic:retain:onMessageIdResolved:] method. The cause is concurrent access to ackCallbackDictionary when publish and subscribe actions run back to back.

To Reproduce
Steps to reproduce the behavior:
There is no fixed reproduction procedure and the crashes occur at irregular intervals, but these are the steps I followed:

  1. Connect to AWS IoT
  2. Subscribe and then publish messages on topics with QoS1 in quick succession.
  3. Run with Xcode's Thread Sanitizer enabled; it consistently flags possible race conditions on ackCallbackDictionary.

Observed Behavior
There are two functions involved:

  1. -session:newAckForMessageId:

    • The callback associated with a msgId is invoked on a background thread.
    • After execution, the ID is removed from ackCallbackDictionary.
  2. -[AWSMQTTSession publishDataAtLeastOnce:onTopic:retain:onMessageIdResolved:]

    • A new msgId is generated and added to ackCallbackDictionary with its callback.

Both methods access ackCallbackDictionary from different threads without any synchronization, so concurrent access can occur.
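To make the access pattern concrete, here is a minimal Swift sketch of the two code paths sharing one dictionary, with a lock around every read and write. `AckCallbackStore`, `register`, and `take` are hypothetical names used only for illustration; this is not the AWSIoT implementation, just a model of the publish path adding an entry while the ack path removes one.

```swift
import Foundation

/// Hypothetical stand-in for the SDK's ackCallbackDictionary (illustration only).
final class AckCallbackStore {
    private let lock = NSLock()
    private var callbacks: [UInt16: () -> Void] = [:]

    /// Publish path: store the callback for a newly generated message ID.
    func register(messageId: UInt16, callback: @escaping () -> Void) {
        lock.lock()
        defer { lock.unlock() }
        callbacks[messageId] = callback
    }

    /// Ack path: remove and return the callback in a single critical section.
    func take(messageId: UInt16) -> (() -> Void)? {
        lock.lock()
        defer { lock.unlock() }
        return callbacks.removeValue(forKey: messageId)
    }

    /// Number of callbacks still waiting for an ack.
    var count: Int {
        lock.lock()
        defer { lock.unlock() }
        return callbacks.count
    }
}

// Exercise both paths from many threads at once.
let store = AckCallbackStore()
let counterLock = NSLock()
var fired = 0

DispatchQueue.concurrentPerform(iterations: 1_000) { i in
    let id = UInt16(i)
    store.register(messageId: id) {
        counterLock.lock()
        fired += 1
        counterLock.unlock()
    }
    // The callback runs outside the store's lock.
    store.take(messageId: id)?()
}
```

Without the lock, the same loop mutates a plain Swift dictionary from several threads at once, which is the kind of access Thread Sanitizer reports.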

Stack Trace

   Crashed: com.apple.root.user-initiated-qos.cooperative
0  CoreFoundation                 0x48c48 -[__NSSetM addObject:] + 356
1  AWSIoT                         0x84974 -[AWSMQTTSession publishDataAtLeastOnce:onTopic:retain:onMessageIdResolved:] + 444
2  AWSIoT                         0x60a1c -[AWSIoTMQTTClient publishData:qos:onTopic:retain:ackCallback:] + 628
3  AWSIoT                         0x6067c -[AWSIoTMQTTClient publishString:qos:onTopic:ackCallback:] + 160
4  AWSIoT                         0x8040 -[AWSIoTDataManager publishString:onTopic:QoS:ackCallback:] + 228
5  Infinity                       0x24c3cc IoTDataSource.requestMainShadow(thingName:) + 251 (IoTDataSource.swift:251)
6  Infinity                       0x24a61c IoTDataSource.subscribeToDeviceShadow(thingName:) + 136 (IoTDataSource.swift:136)
7  libswift_Concurrency.dylib     0x60f5c swift::runJobInEstablishedExecutorContext(swift::Job*) + 252
8  libswift_Concurrency.dylib     0x62514 swift_job_runImpl(swift::Job*, swift::SerialExecutorRef) + 144
9  libdispatch.dylib              0x15ec0 _dispatch_root_queue_drain + 392
10 libdispatch.dylib              0x166c4 _dispatch_worker_thread2 + 156
11 libsystem_pthread.dylib        0x3644 _pthread_wqthread + 228
12 libsystem_pthread.dylib        0x1474 start_wqthread + 8

Areas of the SDK you are using
AWS IoT

Environment:

  • SDK Version: 2.40.0
  • Dependency Manager: CocoaPods
  • Swift Version: 5.0
  • Xcode Version: 16.2

Device Information (please complete the following information):

  • Device: [e.g. iPhone16, Simulator]
  • iOS Version: [e.g. iOS 18.2]

Solution
In a fork, I changed the ack handling so that the entry is removed from the dictionary before its callback is invoked. This is only a temporary workaround; a permanent fix would require changes to concurrency management in the AWSIoT library, such as proper synchronization around every access to ackCallbackDictionary.
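As a rough sketch of what such synchronization could look like, the Swift example below funnels every dictionary access through a private serial queue and keeps the remove-before-callback ordering from the workaround. All names here (`SerializedAckTable`, `addCallback`, `handleAck`, the queue label) are hypothetical and not part of the AWSIoT API; the real fix would live in the Objective-C sources.

```swift
import Foundation

/// Hypothetical model of a synchronized ack table (illustration only).
final class SerializedAckTable {
    // Every read and write of `callbacks` happens on this serial queue.
    private let queue = DispatchQueue(label: "example.ack-table")
    private var callbacks: [UInt16: () -> Void] = [:]
    private var nextId: UInt16 = 0

    /// Publish path: allocate a message ID and store its callback atomically.
    func addCallback(_ callback: @escaping () -> Void) -> UInt16 {
        return queue.sync { () -> UInt16 in
            nextId &+= 1
            callbacks[nextId] = callback
            return nextId
        }
    }

    /// Ack path: remove the entry first, then run the callback outside the
    /// queue so a callback that publishes again cannot deadlock on `queue.sync`.
    func handleAck(for messageId: UInt16) {
        let callback = queue.sync { callbacks.removeValue(forKey: messageId) }
        callback?()
    }

    /// Number of callbacks still waiting for an ack.
    var pendingCount: Int {
        return queue.sync { callbacks.count }
    }
}

let table = SerializedAckTable()
var acked = 0
var secondId: UInt16 = 0

// The first callback registers another one, as a publish inside an ack callback would.
let firstId = table.addCallback {
    acked += 1
    secondId = table.addCallback { acked += 1 }
}
table.handleAck(for: firstId)
table.handleAck(for: secondId)
```

Invoking the callback after the entry has been removed, and outside the critical section, means a re-entrant publish from inside the callback only ever sees a consistent dictionary.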
Link

Labels: bug, iot
