Description
Describe the bug
A crash occurs due to a race condition in the [AWSMQTTSession publishDataAtLeastOnce:onTopic:retain:onMessageIdResolved:] method. The issue is caused by concurrent access to ackCallbackDictionary when publish and subscribe actions run in quick succession.
To Reproduce
Steps to reproduce the behavior:
There is no fixed reproduction procedure and the crash occurs at irregular intervals, but these are the steps I followed:
- Connect to AWS IoT
- Subscribe and then publish messages on topics with QoS1 in quick succession.
- Run with the Thread Sanitizer enabled in Xcode; it consistently reports a possible race condition on ackCallbackDictionary.
Observed Behavior
There are two functions involved:
- -session:newAckForMessageId:
  - The callback associated with a msgId is returned on a background thread.
  - After execution, the ID is removed from ackCallbackDictionary.
- (AWSMQTTSession publishDataAtLeastOnce:onTopic:retain:onMessageIdResolved:):
  - A new msgId is generated and added to ackCallbackDictionary with its callback.

Since both methods access ackCallbackDictionary on different threads without protection, concurrent access may occur.
Stack Trace
Crashed: com.apple.root.user-initiated-qos.cooperative
0 CoreFoundation 0x48c48 -[__NSSetM addObject:] + 356
1 AWSIoT 0x84974 -[AWSMQTTSession publishDataAtLeastOnce:onTopic:retain:onMessageIdResolved:] + 444
2 AWSIoT 0x60a1c -[AWSIoTMQTTClient publishData:qos:onTopic:retain:ackCallback:] + 628
3 AWSIoT 0x6067c -[AWSIoTMQTTClient publishString:qos:onTopic:ackCallback:] + 160
4 AWSIoT 0x8040 -[AWSIoTDataManager publishString:onTopic:QoS:ackCallback:] + 228
5 Infinity 0x24c3cc IoTDataSource.requestMainShadow(thingName:) + 251 (IoTDataSource.swift:251)
6 Infinity 0x24a61c IoTDataSource.subscribeToDeviceShadow(thingName:) + 136 (IoTDataSource.swift:136)
7 libswift_Concurrency.dylib 0x60f5c swift::runJobInEstablishedExecutorContext(swift::Job*) + 252
8 libswift_Concurrency.dylib 0x62514 swift_job_runImpl(swift::Job*, swift::SerialExecutorRef) + 144
9 libdispatch.dylib 0x15ec0 _dispatch_root_queue_drain + 392
10 libdispatch.dylib 0x166c4 _dispatch_worker_thread2 + 156
11 libsystem_pthread.dylib 0x3644 _pthread_wqthread + 228
12 libsystem_pthread.dylib 0x1474 start_wqthread + 8
**Areas of the SDK you are using**
AWS IoT
Screenshots
Environment:
- SDK Version: [2.40.0]
- Dependency Manager: [CocoaPods]
- Swift Version : [5.0]
- Xcode Version: [16.2]
Device Information (please complete the following information):
- Device: [e.g. iPhone16, Simulator]
- iOS Version: [e.g. iOS 18.2]
Solution
I created a fork in which the ack is removed from the dictionary before the callback is called. This is a temporary workaround to avoid the immediate issue, but a permanent fix would require changes to concurrency management in the AWSIoT library, such as synchronizing every access to ackCallbackDictionary.
Link
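To illustrate the kind of synchronization I have in mind, here is a minimal sketch in Swift. It is not the SDK's code: AckCallbackRegistry, register, acknowledge and pendingCount are hypothetical names that stand in for ackCallbackDictionary and the two code paths described above. It assumes a private serial queue guards the dictionary, and that the entry is removed before the callback runs (the same ordering as in my fork).

```swift
import Foundation

// Hypothetical stand-in for ackCallbackDictionary; not the actual AWSIoT implementation.
final class AckCallbackRegistry {
    typealias AckCallback = () -> Void

    private var callbacks: [UInt16: AckCallback] = [:]
    // Serial queue: every read and write of `callbacks` goes through it.
    private let queue = DispatchQueue(label: "example.ack-callback-registry")

    /// Publish path: store the callback for a newly generated message ID.
    func register(_ callback: @escaping AckCallback, for msgId: UInt16) {
        queue.sync { callbacks[msgId] = callback }
    }

    /// Ack path: remove the entry first, then invoke the callback outside the queue.
    func acknowledge(_ msgId: UInt16) {
        let callback = queue.sync { return callbacks.removeValue(forKey: msgId) }
        callback?()
    }

    /// Number of callbacks still waiting for an ack.
    var pendingCount: Int {
        return queue.sync { return callbacks.count }
    }
}

// Usage sketch: hammer both paths from many threads at once.
let registry = AckCallbackRegistry()
let ackedLock = NSLock()
var ackedCount = 0

DispatchQueue.concurrentPerform(iterations: 1_000) { index in
    let msgId = UInt16(index)
    registry.register({
        ackedLock.lock()
        ackedCount += 1
        ackedLock.unlock()
    }, for: msgId)
    registry.acknowledge(msgId)
}
print("acked: \(ackedCount), pending: \(registry.pendingCount)")
```

The callback is deliberately invoked after leaving the queue, so a callback that publishes again (and therefore registers a new ID) does not deadlock on the serial queue. The equivalent change in the Objective-C SDK might use a serial dispatch queue or a lock around the dictionary; which one fits best is for the maintainers to decide.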