AWS IoT Device Client Segmentation fault #462

@veigaMak

Describe the bug

While following the Get Started with AWS IoT tutorial, I replaced the Cloud9 instance with my own Raspberry Pi. Everything worked through the end of 3.3, but the aws-iot-device-client service does not work. I tried running the executable directly (sudo /sbin/aws-iot-device-client --config-file /etc/.aws-iot-device-client/aws-iot-device-client.conf) and got a "Segmentation fault" error.

To Reproduce

Steps to reproduce the behavior:

  1. Install a clean version of Raspberry Pi OS Lite
  2. Install cmake, libssl-dev, git
  3. Install the AWS IoT SDK (is this necessary?)
  4. Install and configure the AWS CLI
  5. Install AWS IoT Device Client
  6. Execute the setup.sh script

Logs

2024-07-04T14:02:35.906Z [INFO]  {Config.cpp}: Successfully fetched JSON config file:
    {
      "endpoint":       "endpoint",
      "cert":   "cert",
      "key":    "key",
      "root-ca":        "root-ca",
      "thing-name":     "deviceClientThing",
      "logging":        {
        "level":        "DEBUG",
        "type": "FILE",
        "file": "/var/log/aws-iot-device-client/aws-iot-device-client.log",
        "enable-sdk-logging":   false,
        "sdk-log-level":        "TRACE",
        "sdk-log-file": "/var/log/aws-iot-device-client/sdk.log"
      },
      "jobs":   {
        "enabled":      true,
        "handler-directory": "/etc/.aws-iot-device-client/jobs"
      },
      "tunneling":      {
        "enabled":      true
      },
      "device-defender":        {
        "enabled":      true,
        "interval": 300
      },
      "fleet-provisioning":     {
        "enabled":      false,
        "template-name": "",
        "template-parameters": "",
        "csr-file": "",
        "device-key": ""
      },
      "samples": {
        "pub-sub": {
          "enabled": true,
          "publish-topic": "/topic/workshop/dc/pub",
          "publish-file": "/home/pi/workshop_dc/pubfile.txt",
          "subscribe-topic": "/topic/workshop/dc/sub",
          "subscribe-file": "/home/pi/workshop_dc/subfile.txt"
        }
      },
      "config-shadow":  {
        "enabled":      false
      },
      "sample-shadow": {
        "enabled": false,
        "shadow-name": "",
        "shadow-input-file": "",
        "shadow-output-file": ""
      }
    }

2024-07-04T14:02:35.906Z [INFO]  {FileUtils.cpp}: Successfully create directory /root/.aws-iot-device-client/sample-shadow/ with required permissions 700
2024-07-04T14:02:35.906Z [INFO]  {Config.cpp}: ~/.aws-iot-device-client/sample-shadow/default-sample-shadow-document
2024-07-04T14:02:35.906Z [INFO]  {Config.cpp}: Succesfully create default file: /root/.aws-iot-device-client/sample-shadow/default-sample-shadow-document required for storage of shadow document
2024-07-04T14:02:35.906Z [DEBUG] {Config.cpp}: Did not find a runtime configuration file, assuming Fleet Provisioning has not run for this device
2024-07-04T14:02:35.906Z [DEBUG] {Config.cpp}: Did not find a http proxy config file /root/.aws-iot-device-client/http-proxy.conf, assuming HTTP proxy is disabled on this device
2024-07-04T14:02:35.907Z [DEBUG] {EnvUtils.cpp}: Updated PATH environment variable to: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/root/.aws-iot-device-client:/root/.aws-iot-device-client/jobs:/home/pi/workshop_dc/aws-iot-device-client:/home/pi/workshop_dc/aws-iot-device-client/jobs
2024-07-04T14:02:35.907Z [DEBUG] {LockFile.cpp}: creating lockfile
2024-07-04T14:02:35.907Z [INFO]  {Main.cpp}: Now running AWS IoT Device Client version v1.9.1-bfae937
2024-07-04T14:02:35.907Z [INFO]  {SharedCrtResourceManager.cpp}: SDK logging is disabled. Enable it with --enable-sdk-logging on the command line or logging::enable-sdk-logging in your configuration file
2024-07-04T14:02:35.907Z [DEBUG] {Retry.cpp}: Retryable function starting, it will retry until success
2024-07-04T14:02:36.022Z [INFO]  {SharedCrtResourceManager.cpp}: Establishing MQTT connection with client id deviceClientThing...
2024-07-04T14:02:36.648Z [INFO]  {SharedCrtResourceManager.cpp}: MQTT connection established with return code: 0
Segmentation fault

Additional context

Is this related to this issue?

Activity

HarshGandhi-AWS (Contributor) commented on Jul 17, 2024

Hello @veigaMak, thank you for reaching out to us. To answer your question: no, the issue you linked should not be related, since it was resolved in a previous client version update.

Give us some time to reproduce the issue and find the root cause. Most likely it is a setup issue, but I can share more details once I have reproduced and solved it.

Regards,
Harsh Gandhi

rui-maksense commented on Aug 20, 2024

Hello @HarshGandhi-AWS.

Any news on this issue? I'm also getting this exact behaviour.

Here's some additional info on my system:

Linux version 6.6.20+rpt-rpi-v8 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07)

Cheers

ig15 (Contributor) commented on Sep 25, 2024

Hi @rui-maksense. Thanks for reaching out to us. I see that you haven't provided absolute paths to the cert, key and root-ca in your aws-iot-device-client.conf file. It also seems you haven't provided the endpoint. Please try with these two modifications and let me know whether it works.

{
      "endpoint":       "endpoint",
      "cert":   "cert",
      "key":    "key",
      "root-ca":        "root-ca",

It should look something like:

{
  "endpoint": "<endpoint>",
  "cert": "<cert_file_path>/cert.pem.crt",
  "key": "<key_file_path>/private.pem.key",
  "root-ca": "<root_ca_path>/AmazonRootCA1.pem",

If it still doesn't work, we suggest following the updated documentation for Device Client and letting us know whether it resolves your issue.
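
One way to rule out the path problem described above is to check that every file-valued setting in the config points at an existing file. The following is only a minimal sketch: the helper name `missing_files` is hypothetical and not part of Device Client, and the list of settings it checks is an assumption based on the config posted in this issue. Whether Device Client requires the pub-sub files to exist beforehand is not confirmed in this thread; the check only reports what is missing.

```python
import os

# Setting names are taken from the config posted in this issue. The helper
# `missing_files` is hypothetical and not part of AWS IoT Device Client.
TOP_LEVEL_FILE_KEYS = ("cert", "key", "root-ca")
PUBSUB_FILE_KEYS = ("publish-file", "subscribe-file")


def missing_files(config):
    """Return (setting, path) pairs whose path is not an existing regular file."""
    candidates = [(key, config.get(key, "")) for key in TOP_LEVEL_FILE_KEYS]
    pubsub = config.get("samples", {}).get("pub-sub", {})
    if pubsub.get("enabled"):
        candidates += [
            ("samples.pub-sub." + key, pubsub.get(key, "")) for key in PUBSUB_FILE_KEYS
        ]
    return [(name, path) for name, path in candidates if not os.path.isfile(path)]


# The redacted values from the original post are placeholders, not paths, so
# they are reported unless files with those names exist in the current directory.
redacted = {"endpoint": "endpoint", "cert": "cert", "key": "key", "root-ca": "root-ca"}
for name, path in missing_files(redacted):
    print(f"{name}: {path!r} is not an existing file")
```

To check a real config, load it with `json.load` and pass the resulting dict to `missing_files`.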

veigaMak (Author) commented on Sep 25, 2024

Hi @ig15, I was the one who posted the aws-iot-device-client.conf contents. I know the log above does not show the correct paths: I removed them before posting, for privacy reasons. In the actual log the paths were correct.

ig15 (Contributor) commented on Oct 15, 2024

Hey @veigaMak. Can you set "enable-sdk-logging": true and "sdk-log-level": "DEBUG", and send the resulting logs to help us better understand and debug the issue? Also, I hope you have checked the updated documentation for Device Client.
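
For anyone who wants to apply the two logging settings above without editing the file by hand, here is a small sketch. The key names come from the config posted in this issue; the helper `enable_sdk_logging` is hypothetical and only manipulates the parsed JSON.

```python
import json


def enable_sdk_logging(config, level="DEBUG"):
    """Return a copy of the config with SDK logging switched on."""
    updated = json.loads(json.dumps(config))  # simple deep copy for JSON-only data
    logging_section = updated.setdefault("logging", {})
    logging_section["enable-sdk-logging"] = True
    logging_section["sdk-log-level"] = level
    return updated


# Logging values as they appear in the original post.
original = {
    "logging": {
        "level": "DEBUG",
        "enable-sdk-logging": False,
        "sdk-log-level": "TRACE",
    }
}
print(json.dumps(enable_sdk_logging(original)["logging"], indent=2))
```

The result can be written back with `json.dump`. The original dict is left unchanged, so the two versions can be compared.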

garysferrao commented on Jan 22, 2025

just if it helps anyone, i faced the same problem: the MQTT client just segmentation-faults, without any trace/debug log.

2024-07-04T14:02:35.907Z [INFO]  {Main.cpp}: Now running AWS IoT Device Client version v1.9.1-bfae937
2024-07-04T14:02:35.907Z [INFO]  {SharedCrtResourceManager.cpp}: SDK logging is disabled. Enable it with --enable-sdk-logging on the command line or logging::enable-sdk-logging in your configuration file
2024-07-04T14:02:35.907Z [DEBUG] {Retry.cpp}: Retryable function starting, it will retry until success
2024-07-04T14:02:36.022Z [INFO]  {SharedCrtResourceManager.cpp}: Establishing MQTT connection with client id deviceClientThing...
2024-07-04T14:02:36.648Z [INFO]  {SharedCrtResourceManager.cpp}: MQTT connection established with return code: 0
Segmentation fault

i was using the Docker image (ubuntu, armv7) from AWS ECR built 15 days ago (seems this image ref).
after i deleted the samples section of the config.json, it ran without any faults. need to investigate that part further.

(note: technically, this is the OP's config and error log; i just want to show what section. mine was the same.)

      "samples": {
        "pub-sub": {
          "enabled": true,
          "publish-topic": "/topic/workshop/dc/pub",
          "publish-file": "/home/pi/workshop_dc/pubfile.txt",
          "subscribe-topic": "/topic/workshop/dc/sub",
          "subscribe-file": "/home/pi/workshop_dc/subfile.txt"
        }
      },
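
The workaround described above (dropping the samples section) can be sketched as a tiny script. This is an illustration only: `without_sections` is a hypothetical helper, and the sample dict is a shortened version of the config from the original post.

```python
import json


def without_sections(config, sections=("samples",)):
    """Return a shallow copy of the config without the named top-level sections."""
    return {key: value for key, value in config.items() if key not in sections}


# Shortened version of the config from the original post.
config = {
    "thing-name": "deviceClientThing",
    "samples": {"pub-sub": {"enabled": True, "publish-topic": "/topic/workshop/dc/pub"}},
    "sample-shadow": {"enabled": False},
}
print(json.dumps(without_sections(config, ("samples", "sample-shadow")), indent=2))
```

Setting "enabled" to false inside "pub-sub" may be a gentler alternative, but that variant was not tested in this thread.
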

Ogy-GrayLine commented on May 7, 2025

I was about to pull my hair out doing exactly the same thing: following the same tutorial step by step on the same kind of setup, an RPi 5 with the latest Raspbian and an updated kernel. I felt stupid that the straightforward case just doesn't work and kept checking for something I might have done wrong :-(

I was about to file a ticket when I discovered there already is one. Apparently there has been no real progress or resolution, but at least I can provide a bit more insight from my own troubleshooting.

  1. Raspberry Pi 5 with the default OS, updated to the latest version just 2 days ago:
    1.1. uname -a Linux raspberrypi 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
    1.2. Installed libssl-dev, libc6, build-essential, zlib1g-dev, checkinstall
    1.3. Compiled just fine, both initially and later with the "DEBUG" flag added to CMake so I could attach GDB while troubleshooting. I have snippets of the compile-time output for reference if needed.

  2. Output of the GDB session:

(gdb) run --config-file test-us-east-1-aws
Starting program: /home/userpi/Workspace/aws/aws-iot-device-client.debug-fresh-1 --config-file test-us-east-1-aws
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7aadd00 (LWP 10110)]
[New Thread 0x7ffff729dd00 (LWP 10111)]
[New Thread 0x7ffff6a8dd00 (LWP 10112)]
[New Thread 0x7ffff627dd00 (LWP 10113)]
[New Thread 0x7ffff5a6dd00 (LWP 10114)]
[Thread 0x7ffff627dd00 (LWP 10113) exited]
[New Thread 0x7ffff627dd00 (LWP 10115)]
Thread 1 "aws-iot-device-" received signal SIGSEGV, Segmentation fault.
__memset_zva64 () at ../sysdeps/aarch64/multiarch/../memset.S:86
86 ../sysdeps/aarch64/multiarch/../memset.S: No such file or directory.

(gdb) bt
#0 __memset_zva64 () at ../sysdeps/aarch64/multiarch/../memset.S:86
#1 0x00005555557db14c in aws_secure_zero (pBuf=0x5555555f97cc <std::__shared_ptr<Aws::Crt::Mqtt::MqttConnection, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()+24>, bufsize=140737488348840)
at /home/userpi/Workspace/aws/compile/aws-iot-device-client/build/aws-iot-device-sdk-cpp-v2-src/crt/aws-crt-cpp/crt/aws-c-common/source/common.c:57
#2 0x00005555557d5ad4 in aws_byte_buf_secure_zero (buf=0x7fffffffe5c0)
at /home/userpi/Workspace/aws/compile/aws-iot-device-client/build/aws-iot-device-sdk-cpp-v2-src/crt/aws-crt-cpp/crt/aws-c-common/source/byte_buf.c:90
#3 0x00005555557d5b64 in aws_byte_buf_clean_up_secure (buf=0x7fffffffe5c0)
at /home/userpi/Workspace/aws/compile/aws-iot-device-client/build/aws-iot-device-sdk-cpp-v2-src/crt/aws-crt-cpp/crt/aws-c-common/source/byte_buf.c:98
#4 0x00005555556a74b4 in Aws::Iot::DeviceClient::Samples::PubSubFeature::publishFileData (this=0x555556102560)
at /home/userpi/Workspace/aws/compile/aws-iot-device-client/source/samples/pubsub/PubSubFeature.cpp:249
#5 0x00005555556a7958 in Aws::Iot::DeviceClient::Samples::PubSubFeature::start (this=0x555556102560)
at /home/userpi/Workspace/aws/compile/aws-iot-device-client/source/samples/pubsub/PubSubFeature.cpp:284
#6 0x00005555555f1958 in Aws::Iot::DeviceClient::Util::FeatureRegistry::startAll (this=0x555555ff3d20) at /home/userpi/Workspace/aws/compile/aws-iot-device-client/source/FeatureRegistry.cpp:83
#7 0x00005555555f6e6c in Aws::Iot::DeviceClient::SharedCrtResourceManager::startDeviceClientFeatures (this=0x555555f13740)
at /home/userpi/Workspace/aws/compile/aws-iot-device-client/source/SharedCrtResourceManager.cpp:568
#8 0x0000555555648238 in main (argc=3, argv=0x7ffffffff378) at /home/userpi/Workspace/aws/compile/aws-iot-device-client/source/main.cpp:625
(gdb)

Any thoughts on why it actually fails on the "memset.S" which seems missing?

Thanks a lot in advance!
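
One observation on the backtrace above, offered as a reading of the numbers rather than a confirmed diagnosis: the arguments in frame #1 do not look like a real buffer. Printed in hex, `bufsize` turns out to be numerically a stack address very close to `buf`, and GDB resolves `pBuf` to a `~__shared_ptr()` symbol, i.e. program code. That pattern is what one would expect if the byte buffer's fields held leftover stack contents rather than initialized values, but nothing in this thread confirms that this is what happens in `PubSubFeature.cpp`. The helper below is a hypothetical heuristic, not a GDB feature.

```python
# Values copied from frames #1 and #2 of the GDB backtrace above.
P_BUF = 0x5555555F97CC        # pBuf passed to aws_secure_zero
BUF_SIZE = 140737488348840    # bufsize passed to aws_secure_zero
BUF_STRUCT = 0x7FFFFFFFE5C0   # buf passed to aws_byte_buf_secure_zero


def looks_like_stack_address(value, known_stack_address, window=1 << 20):
    """Heuristic: True if value lies within `window` bytes of a known stack address."""
    return abs(value - known_stack_address) < window


print(hex(BUF_SIZE))                                   # 0x7fffffffe6a8
print(BUF_SIZE - BUF_STRUCT)                           # 232
print(looks_like_stack_address(BUF_SIZE, BUF_STRUCT))  # True
```

A "size" that sits 232 bytes above the struct it belongs to is almost certainly not a size, which fits the valgrind report of a write to a region with bad permissions.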

Ogy-GrayLine commented on May 7, 2025

Removing the "samples" and "sample-shadow" sections, as mentioned by @garysferrao here: #462 (comment)

does change the behavior on my compiled, non-Docker RPi 5, but unfortunately the client still does not run successfully. Here's what I can see in the logs from that test:

  1. Client log:

2025-05-07T09:15:36.427Z [DEBUG] {Config.cpp}: Did not find a runtime configuration file, assuming Fleet Provisioning has not run for this device
2025-05-07T09:15:36.427Z [DEBUG] {Config.cpp}: Did not find a http proxy config file /home/userpi/.aws-iot-device-client/http-proxy.conf, assuming HTTP proxy is disabled on this device
2025-05-07T09:15:36.427Z [DEBUG] {EnvUtils.cpp}: Updated PATH environment variable to: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games:/usr/games:/home/userpi/.aws-iot-device-client:/home/userpi/.aws-iot-device-client/jobs:/home/userpi/Workspace/aws:/home/userpi/Workspace/aws/jobs
2025-05-07T09:15:36.427Z [DEBUG] {LockFile.cpp}: creating lockfile
2025-05-07T09:15:36.427Z [INFO] {Main.cpp}: Now running AWS IoT Device Client version v1.10.11-d2fce80
2025-05-07T09:15:36.428Z [INFO] {SharedCrtResourceManager.cpp}: SDK logging is enabled. Check /home/userpi/Workspace/aws/logs/sdk.log for SDK logs.
2025-05-07T09:15:36.428Z [DEBUG] {Retry.cpp}: Retryable function starting, it will retry until success
2025-05-07T09:15:36.470Z [INFO] {SharedCrtResourceManager.cpp}: Establishing MQTT connection with client id kur...
2025-05-07T09:15:37.036Z [INFO] {SharedCrtResourceManager.cpp}: MQTT connection established with return code: 0
2025-05-07T09:15:37.036Z [INFO] {SharedCrtResourceManager.cpp}: Shared MQTT connection is ready!
2025-05-07T09:15:37.037Z [INFO] {Main.cpp}: Config shadow is disabled
2025-05-07T09:15:37.037Z [INFO] {Main.cpp}: Jobs is enabled
2025-05-07T09:15:37.037Z [INFO] {Main.cpp}: Secure Tunneling is enabled
2025-05-07T09:15:37.037Z [INFO] {Main.cpp}: Device Defender is enabled
2025-05-07T09:15:37.037Z [INFO] {Main.cpp}: Sample shadow is disabled
2025-05-07T09:15:37.037Z [INFO] {Main.cpp}: Pub Sub is disabled
2025-05-07T09:15:37.037Z [INFO] {Main.cpp}: Sensor Publish is disabled
2025-05-07T09:15:37.037Z [INFO] {SharedCrtResourceManager.cpp}: Starting Device Client features.
2025-05-07T09:15:37.037Z [DEBUG] {FeatureRegistry.cpp}: Attempting to start Device Defender
2025-05-07T09:15:37.037Z [INFO] {DeviceDefender.cpp}: Starting Device Defender
2025-05-07T09:15:37.037Z [INFO] {DeviceDefender.cpp}: Device Defender task builder interval: 300
2025-05-07T09:15:37.037Z [DEBUG] {DeviceDefender.cpp}: Device Defender task build finished
2025-05-07T09:15:37.037Z [DEBUG] {DeviceDefender.cpp}: Device Defender StartTask() async called
2025-05-07T09:15:37.037Z [INFO] {Main.cpp}: Client base has been notified that Device Defender has started
2025-05-07T09:15:37.037Z [DEBUG] {FeatureRegistry.cpp}: Attempting to start Jobs
2025-05-07T09:15:37.037Z [INFO] {Main.cpp}: Client base has been notified that Jobs has started
2025-05-07T09:15:37.037Z [DEBUG] {FeatureRegistry.cpp}: Attempting to start Secure Tunneling
2025-05-07T09:15:37.037Z [INFO] {JobsFeature.cpp}: Running Jobs!
2025-05-07T09:15:37.037Z [INFO] {SecureTunnelingFeature.cpp}: Running Secure Tunneling!
2025-05-07T09:15:37.037Z [DEBUG] {JobsFeature.cpp}: Attempting to subscribe to startNextPendingJobExecution accepted and rejected
2025-05-07T09:15:37.037Z [INFO] {Main.cpp}: Client base has been notified that Secure Tunneling has started
2025-05-07T09:15:37.037Z [DEBUG] {JobsFeature.cpp}: Attempting to subscribe to nextJobChanged events
2025-05-07T09:15:37.037Z [DEBUG] {JobsFeature.cpp}: Attempting to subscribe to updateJobExecutionStatusAccepted for jobId +
2025-05-07T09:15:37.150Z [ERROR] {SharedCrtResourceManager.cpp}: MQTT Connection interrupted with error: libaws-c-mqtt: AWS_ERROR_MQTT_UNEXPECTED_HANGUP, The connection was closed unexpectedly.. Device Client will retry connection until it is successfully connected to the core.
2025-05-07T09:15:47.038Z [ERROR] {JobsFeature.cpp}: Timed out while waiting for acknowledgement of subscription to UpdateJobExecutionStatusAccepted
2025-05-07T09:15:47.038Z [ERROR] {Main.cpp}: Subscription rejected: Timed out while waiting for acknowledgement of subscription to UpdateJobExecutionStatusAccepted
2025-05-07T09:15:47.038Z [ERROR] {Main.cpp}: *** AWS IOT DEVICE CLIENT FATAL ERROR: Aborting program due to unrecoverable feature error! ***
2025-05-07T09:15:47.038Z [DEBUG] {SharedCrtResourceManager.cpp}: Attempting to disconnect MQTT connection
2025-05-07T09:15:52.162Z [INFO] {SharedCrtResourceManager.cpp}: MQTT Connection is now disconnected

  2. Then in the sdk.log (the snippet is rather big, so I'm pasting only the last 20 lines; I can provide the entire log if needed):

tail -20 sdk.log
[DEBUG] [2025-05-07T09:15:37Z] [00007ffff6a8dd00] [channel-bootstrap] - id=0x55555610ad60: releasing bootstrap reference
[TRACE] [2025-05-07T09:15:37Z] [00007ffff6a8dd00] [event-loop] - id=0x5555560ae5d0: detected more scheduled tasks with the next occurring at 2326679741, using timeout of 2326.
[TRACE] [2025-05-07T09:15:37Z] [00007ffff6a8dd00] [event-loop] - id=0x5555560ae5d0: waiting for a maximum of 2326 ms
[TRACE] [2025-05-07T09:15:39Z] [00007ffff6a8dd00] [event-loop] - id=0x5555560ae5d0: wake up with 0 events to process.
[TRACE] [2025-05-07T09:15:39Z] [00007ffff6a8dd00] [event-loop] - id=0x5555560ae5d0: running scheduled tasks.
[DEBUG] [2025-05-07T09:15:39Z] [00007ffff6a8dd00] [task-scheduler] - id=0x7fffec0018e0: Running (null) task with status
[TRACE] [2025-05-07T09:15:39Z] [00007ffff6a8dd00] [socket] - task_id=0x7fffec0018e0: timeout task triggered, evaluating timeouts.
[TRACE] [2025-05-07T09:15:39Z] [00007ffff6a8dd00] [event-loop] - id=0x5555560ae5d0: detected more scheduled tasks with the next occurring at 12671515242, using timeout of 12671.
[TRACE] [2025-05-07T09:15:39Z] [00007ffff6a8dd00] [event-loop] - id=0x5555560ae5d0: waiting for a maximum of 12671 ms
[DEBUG] [2025-05-07T09:15:47Z] [00007ffff627dd00] [mqtt-client] - id=0x7fffe00815d0: user called disconnect.
[DEBUG] [2025-05-07T09:15:47Z] [00007ffff627dd00] [mqtt-client] - id=0x7fffe00815d0: User requests disconnecting, switch state to DISCONNECTING.
[DEBUG] [2025-05-07T09:15:47Z] [00007ffff627dd00] [mqtt-client] - id=0x7fffe00815d0: Closing connection
[TRACE] [2025-05-07T09:15:47Z] [00007ffff627dd00] [mqtt-client] - id=0x7fffe00815d0: Client currently has no slot to disconnect
[TRACE] [2025-05-07T09:15:52Z] [00007ffff6a8dd00] [event-loop] - id=0x5555560ae5d0: wake up with 0 events to process.
[TRACE] [2025-05-07T09:15:52Z] [00007ffff6a8dd00] [event-loop] - id=0x5555560ae5d0: running scheduled tasks.
[DEBUG] [2025-05-07T09:15:52Z] [00007ffff6a8dd00] [task-scheduler] - id=0x7fffec016060: Running mqtt_reconnect task with status
[TRACE] [2025-05-07T09:15:52Z] [00007ffff6a8dd00] [mqtt-client] - id=0x7fffe00815d0: Skipping reconnect: Client is trying to disconnect
[TRACE] [2025-05-07T09:15:52Z] [00007ffff6a8dd00] [mqtt-client] - id=0x7fffe00815d0: Reconnect task called but client is disconnecting and has no slot. Finishing disconnect
[TRACE] [2025-05-07T09:15:52Z] [00007ffff6a8dd00] [event-loop] - id=0x5555560ae5d0: detected more scheduled tasks with the next occurring at 284876147527, using timeout of 284876.
[TRACE] [2025-05-07T09:15:52Z] [00007ffff6a8dd00] [event-loop] - id=0x5555560ae5d0: waiting for a maximum of 284876 ms

  3. GDB session:

(gdb) run --config-file test-us-east-1-aws

Starting program: /home/userpi/Workspace/aws/aws-iot-device-client.debug-fresh-1 --config-file test-us-east-1-aws
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7aadd00 (LWP 10705)]
[New Thread 0x7ffff729dd00 (LWP 10706)]
[New Thread 0x7ffff6a8dd00 (LWP 10707)]
[New Thread 0x7ffff627dd00 (LWP 10708)]
[New Thread 0x7ffff5a6dd00 (LWP 10709)]
[New Thread 0x7ffff627dd00 (LWP 10710)]
[Thread 0x7ffff627dd00 (LWP 10708) exited]
bt

AWS IoT Device Client must abort execution, reason: Jobs encountered an error
Please check the AWS IoT Device Client logs for more information
[Thread 0x7ffff7aadd00 (LWP 10705) exited]
[Thread 0x7ffff627dd00 (LWP 10710) exited]
[Thread 0x7ffff6a8dd00 (LWP 10707) exited]
[Thread 0x7ffff729dd00 (LWP 10706) exited]
[Thread 0x7ffff7fe5000 (LWP 10703) exited]
[Thread 0x7ffff5a6dd00 (LWP 10709) exited]
[New process 10703]
[Inferior 1 (process 10703) exited with code 01]
(gdb)
(gdb) bt
No stack.

So there is not much of a clue as to what happens, and I've run out of troubleshooting ideas. It looks like the client is unusable on RPis, at least at the moment :-(
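
In the client log above, the first error (the unexpected hangup at 09:15:37.150) comes about 10 seconds before the fatal Jobs timeout, so it can help to pull out just the ERROR lines in order. A minimal sketch follows, assuming the line layout seen in this thread (`timestamp [LEVEL] {Source.cpp}: message`); the regular expression and the helper name are my own, not part of Device Client.

```python
import re

# Matches lines such as:
# 2025-05-07T09:15:47.038Z [ERROR] {JobsFeature.cpp}: Timed out while ...
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+) \[(?P<level>[A-Z]+)\]\s+\{(?P<source>[^}]+)\}: (?P<message>.*)$"
)


def entries_at_level(lines, level="ERROR"):
    """Yield (timestamp, source, message) for log lines at the given level."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == level:
            yield match.group("timestamp"), match.group("source"), match.group("message")


# Three lines copied from the client log in this comment.
sample = [
    "2025-05-07T09:15:37.037Z [INFO] {JobsFeature.cpp}: Running Jobs!",
    "2025-05-07T09:15:37.150Z [ERROR] {SharedCrtResourceManager.cpp}: MQTT Connection interrupted with error: libaws-c-mqtt: AWS_ERROR_MQTT_UNEXPECTED_HANGUP, The connection was closed unexpectedly.. Device Client will retry connection until it is successfully connected to the core.",
    "2025-05-07T09:15:47.038Z [ERROR] {JobsFeature.cpp}: Timed out while waiting for acknowledgement of subscription to UpdateJobExecutionStatusAccepted",
]
for timestamp, source, message in entries_at_level(sample):
    print(timestamp, source, message[:60])
```

Passing an open log file instead of `sample` works the same way, since the function only iterates over lines.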

garysferrao commented on May 7, 2025

@Ogy-GrayLine thanks for the debug logs. unfortunately i'm not a maintainer; i hope a maintainer will take up this issue soon.

86 ../sysdeps/aarch64/multiarch/../memset.S: No such file or directory

this happens because gdb is trying to find even more details about the line number and looking up a source file. i assume that because memset is common, there is no source file available. (the fault already happened, and then gdb is trying to find the exact cause).

https://stackoverflow.com/a/10629233
this answer says to use -g when compiling to explicitly add that source information.

need to investigate that part further.

i myself don't have the resources to compile and debug this. but on some light testing at the time, it seems the fault happened because those topics in samples.pub-sub.publish-topic et cetera do not exist. when i used an existing topic there was no error iirc. 🤔

Ogy-GrayLine commented on May 7, 2025

Hi @garysferrao ,

Thanks a lot for the assistance.

https://stackoverflow.com/a/10629233
this answer says to use -g when compiling to explicitly add that source information.

Yes, but in the tutorial mentioned initially, i.e. this one:
Get Started with AWS IoT

the build is done with CMake instead of calling GCC/G++ directly. So I followed this post here and configured the build by running cmake -D CMAKE_BUILD_TYPE=Debug ../ instead of simply cmake ../ .
To my understanding that should be enough to add the debug info. Maybe some external library is missing the source needed to show the additional info? 🤔

@garysferrao Could you please clarify what you mean by:

when i used an existing topic there was no error iirc

I have no topics preloaded in AWS IoT Core, and it seems topics are not explicitly created. So I'm a bit confused about what to try in that direction.

@ig15, hi! I've provided the debug logs you requested above, plus GDB snippets. Is there anything else I can do? I have a test bed set up and would like to get this client actually working on a Raspberry Pi.

Thanks a lot in advance, Team!

P.S. I've also read up on and tried debugging this not only with GDB but also with valgrind, a tool I wasn't aware of before and am not familiar with. Still, to show what else can be seen there:

  1. Executed the command:

valgrind --track-origins=yes --leak-check=full --show-leak-kinds=all -s ./aws-iot-device-client.debug-fresh-2 --config-file test-us-east-1-aws.segmentation_fault

It printed the following interesting snippet (I can paste the full log if needed, but it's large):

==18009==
==18009== Process terminating with default action of signal 11 (SIGSEGV)
==18009== Bad permissions for mapped region at address 0x1C97CC
==18009== at 0x488EEF0: memset (vg_replace_strmem.c:1358)
==18009== by 0x3AB14B: aws_secure_zero (common.c:57)
==18009== by 0x3A5AD3: aws_byte_buf_secure_zero (byte_buf.c:90)
==18009== by 0x3A5B63: aws_byte_buf_clean_up_secure (byte_buf.c:98)
==18009== by 0x2774B3: Aws::Iot::DeviceClient::Samples::PubSubFeature::publishFileData() (PubSubFeature.cpp:249)
==18009== by 0x277957: Aws::Iot::DeviceClient::Samples::PubSubFeature::start() (PubSubFeature.cpp:284)
==18009== by 0x1C1957: Aws::Iot::DeviceClient::Util::FeatureRegistry::startAll() const (FeatureRegistry.cpp:83)
==18009== by 0x1C6E6B: Aws::Iot::DeviceClient::SharedCrtResourceManager::startDeviceClientFeatures() const (SharedCrtResourceManager.cpp:568)
==18009== by 0x218237: main (main.cpp:625)
==18009==
==18009== HEAP SUMMARY:
==18009== in use at exit: 1,920,466 bytes in 28,202 blocks
==18009== total heap usage: 408,403 allocs, 380,201 frees, 16,065,782 bytes allocated

Reading the stack trace from the bottom up, it seems to point to line 249 of PubSubFeature.cpp.

I can see this was changed 2 years ago by this PR, with the motivation of doing a cleanup on startup.
But as I'm not a C/C++ person at all, this is as far as I can take it.

I'm not really sure whether it's the right direction at all 🤷

ig15 (Contributor) commented on May 9, 2025

@Ogy-GrayLine Thanks for sharing the details. We have ordered a Raspberry Pi 5 device and we'll start to work on the issue once we get our hands on it to reproduce the issue.

garysferrao commented on May 13, 2025

@Ogy-GrayLine

So I'm a bit confused what to try on that direction?

sorry, it was just me trying to remember what i was doing at the time. it could have been replacing the topic /topic/workshop/dc/* or the file /home/pi/workshop_dc/*.txt with something that exists. my reasoning was that if it's in the document tutorial, surely something must work; and iirc replacing them with existing paths did not crash the iot-device-client. but in the end i just removed that section from the JSON because i didn't need them.
basically, i didn't mean for you to try anything.

Thus I've followed this post here and configured the build by executing cmake -D CMAKE_BUILD_TYPE=Debug ../

hmm, if CMAKE_BUILD_TYPE=Debug is not adding the -g flag, maybe you can manually add it
the file seems to be CMakeLists.txt; for example on L21:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-ignored-attributes")

assuming your compiler is GCC, you can manually add the -g or -ggdb flag, say on line 22 then

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ggdb")

i mean to try this only if you're wondering about that error in gdb:

Any thoughts on why it actually fails on the "memset.S" which seems missing?

personally, i just extracted the compiled binary from the Docker image. or better, wait for a response 🥺.

ig15 (Contributor) commented on May 28, 2025

@Ogy-GrayLine I was able to reproduce the Device Client error that you encountered on an RPi 5 with the following specifications:
pi@raspberrypi:~/.aws-iot-device-client $ uname -a
Linux raspberrypi 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux

Error logs:

2025-05-07T09:15:37.150Z [ERROR] {SharedCrtResourceManager.cpp}: MQTT Connection interrupted with error: libaws-c-mqtt: AWS_ERROR_MQTT_UNEXPECTED_HANGUP, The connection was closed unexpectedly.. Device Client will retry connection until it is successfully connected to the core.
2025-05-07T09:15:47.038Z [ERROR] {JobsFeature.cpp}: Timed out while waiting for acknowledgement of subscription to UpdateJobExecutionStatusAccepted
2025-05-07T09:15:47.038Z [ERROR] {Main.cpp}: Subscription rejected: Timed out while waiting for acknowledgement of subscription to UpdateJobExecutionStatusAccepted

This error occurs when the policy for the samples pub-sub feature is set up incorrectly. You must have publish, subscribe and receive permissions for the topic in your AWS IoT policy.
For example, if the samples section in your aws-iot-device-client.conf looks like:

     "samples": {
       "pub-sub": {
         "enabled": true,
         "publish-topic": "testing",
         "publish-file": "/home/ubuntu/.aws-iot-device-client/pubsub/publish-file.txt",
         "subscribe-topic": "testing",
         "subscribe-file": "/home/ubuntu/.aws-iot-device-client/pubsub/subscribe-file.txt"
       }
     }

You should have a corresponding policy attached to your IoT thing:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Condition": {
        "Bool": {
          "iot:Connection.Thing.IsAttached": "true"
        }
      },
      "Effect": "Allow",
      "Action": "iot:Connect",
      "Resource": "arn:aws:iot:us-west-2:<ACC ID>:client/${iot:Connection.Thing.ThingName}"
    },
    {
      "Effect": "Allow",
      "Action": "iot:Publish",
      "Resource": "arn:aws:iot:us-west-2:<ACC ID>:topic/testing"
    },
    {
      "Effect": "Allow",
      "Action": "iot:Subscribe",
      "Resource": "arn:aws:iot:us-west-2:<ACC ID>:topicfilter/testing"
    },
    {
      "Effect": "Allow",
      "Action": "iot:Receive",
      "Resource": "arn:aws:iot:us-west-2:<ACC ID>:topic/testing"
    }
  ]
}

Refer here for more information.
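
To keep the policy in sync with the topics in the samples section, the policy above can be generated from them. The following is a sketch under assumptions: `pubsub_policy` is a hypothetical helper, the account ID is a placeholder, and the statements mirror the example policy above (Publish and Receive on `topic/...`, Subscribe on `topicfilter/...`). Topic names are appended verbatim.

```python
import json


def pubsub_policy(region, account_id, publish_topic, subscribe_topic):
    """Build an AWS IoT policy document shaped like the example above."""
    base = f"arn:aws:iot:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Condition": {"Bool": {"iot:Connection.Thing.IsAttached": "true"}},
                "Effect": "Allow",
                "Action": "iot:Connect",
                # Doubled braces emit a literal ${...} policy variable.
                "Resource": f"{base}:client/${{iot:Connection.Thing.ThingName}}",
            },
            {"Effect": "Allow", "Action": "iot:Publish",
             "Resource": f"{base}:topic/{publish_topic}"},
            {"Effect": "Allow", "Action": "iot:Subscribe",
             "Resource": f"{base}:topicfilter/{subscribe_topic}"},
            {"Effect": "Allow", "Action": "iot:Receive",
             "Resource": f"{base}:topic/{subscribe_topic}"},
        ],
    }


# "123456789012" is a placeholder account ID; "testing" matches the example config.
policy = pubsub_policy("us-west-2", "123456789012", "testing", "testing")
print(json.dumps(policy, indent=2))
```

Note that this covers only the pub-sub sample; other enabled features such as Jobs may need their own statements, which this sketch does not attempt to generate.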

That is why if you remove the samples section itself, the issue is not seen.
Do let us know if your problem is resolved.

ig15 (Contributor) commented on Jun 8, 2025

We are closing this issue. Feel free to reopen or open a new issue in case of further problems. Thank you.

          AWS IoT Device Client Segmentation fault · Issue #462 · awslabs/aws-iot-device-client