Skip to content

Failed to get number of ready entities for action client: wait set index for status subscription is out of bounds #1455

@szobov

Description

@szobov

Operating System:

Ubuntu 24.04

ROS version or commit hash:

jazzy

RMW implementation (if applicable):

rmw_fastrtps_cpp, rmw_cyclonedds_cpp

RMW Configuration (if applicable):

default

Client library (if applicable):

rclpy

'ros2 doctor --report' output

ros2 doc --report

root@eb25a18e1572:~/chimera# ros2 doctor --report
/opt/ros/jazzy/lib/python3.12/site-packages/ros2doctor/api/init.py: 162: UserWarning: Fail to call PackageReport class functions.

NETWORK CONFIGURATION
inet : 127.0.0.1
inet4 : ['127.0.0.1']
inet6 : ['::1']
netmask : 255.0.0.0
device : lo
flags : 73<RUNNING,UP,LOOPBACK>
mtu : 65536
inet : 172.18.0.5
inet4 : ['172.18.0.5']
ether : 02:42:ac:12:00:05
netmask : 255.255.0.0
device : eth0
flags : 4163<BROADCAST,RUNNING,UP,MULTICAST>
mtu : 1500
broadcast : 172.18.255.255

PLATFORM INFORMATION
system : Linux
platform info : Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
release : 5.15.167.4-microsoft-standard-WSL2
processor : x86_64

QOS COMPATIBILITY LIST
compatibility status : No publisher/subscriber pairs found

RMW MIDDLEWARE
middleware name : rmw_fastrtps_cpp

ROS 2 INFORMATION
distribution name : jazzy
distribution type : ros2
distribution status : active
release platforms : {'debian': ['bookworm'], 'rhel': ['9'], 'ubuntu': ['noble']}

TOPIC LIST
topic : none
publisher count : 0
subscriber count : 0

Steps to reproduce issue

It's difficult to provide you with clear steps to reproduce this error because our system, and especially the node that faces it, consists of a bunch of services/action clients/publishes/subscribers.
But in our case, the error started to appear when we added a new subscriber who listens to the topic published with a 10Hz frequency and any time that the action client sends a goal asynchronously.
Everything is running under MultiThreadedExecutor.

Expected behavior

Action Client sends goal and receives feedback and result.

Actual behavior

Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/package/process.py", line 241, in execute
    self._execute()
  File "/home/user/package/process.py", line 320, in _execute
    result = spin_and_get_result(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/user/package2/spin_wrapper.py", line 49, in spin_and_get_result
    spin_until_future_complete(
  File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/__init__.py", line 271, in spin_until_future_complete
    executor.spin_until_future_complete(future, timeout_sec)
  File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 338, in spin_until_future_complete
    self.spin_once_until_future_complete(future, timeout_left)
  File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 919, in spin_once_until_future_complete
    self._spin_once_impl(timeout_sec, future.done)
  File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 890, in _spin_once_impl
    handler, entity, node = self.wait_for_ready_callbacks(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 793, in wait_for_ready_callbacks
    return next(self._cb_iter)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 719, in _wait_for_ready_callbacks
    if wt in waitables and wt.is_ready(wait_set):
                           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/action/client.py", line 232, in is_ready
    ready_entities = self._client_handle.is_ready(wait_set)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rclpy._rclpy_pybind11.RCLError: Failed to get number of ready entities for action client: wait set index for status subscription is out of bounds, at ./src/rcl_action/action_client.c:642

Additional information

I tried to debug this behaviour and found the following inconsistency.
First of all, I extended a little bit of logging:

    RCL_SET_ERROR_MSG_WITH_FORMAT_STRING("wait set index for status subscription is out of bounds (%zu > %zu)", status_index, wait_set->size_of_subscriptions);

And it gave me the following output:

rclpy._rclpy_pybind11.RCLError: Failed to get number of ready entities for action client: wait set index for status subscription is out of bounds (2 > 2), at /home/user/workspace/src/rcl_action/src/rcl_action/action_client.c:644

I looked into the function that works with these indexes:

  ret = rcl_wait_set_add_subscription(
    wait_set,
    &action_client->impl->feedback_subscription,
    &action_client->impl->wait_set_feedback_subscription_index);

And this function calls SET_ADD macros which accesses the internal index of subscriptions in the passed wait_set:

#define SET_ADD(Type) \
  RCL_CHECK_ARGUMENT_FOR_NULL(wait_set, RCL_RET_INVALID_ARGUMENT); \
  if (!wait_set->impl) { \
    RCL_SET_ERROR_MSG("wait set is invalid"); \
    return RCL_RET_WAIT_SET_INVALID; \
  } \
  RCL_CHECK_ARGUMENT_FOR_NULL(Type, RCL_RET_INVALID_ARGUMENT); \
  if (!(wait_set->impl->Type ## _index < wait_set->size_of_ ## Type ## s)) { \
    RCL_SET_ERROR_MSG(#Type "s set is full"); \
    return RCL_RET_WAIT_SET_FULL; \
  } \
  size_t current_index = wait_set->impl->Type ## _index++; \
  wait_set->Type ## s[current_index] = Type; \
  /* Set optional output argument */ \
  if (NULL != index) { \
    *index = current_index; \
  }

It lead me to the question, can it be that since we're caching the index of subscriptions from wait_set->impl into action_client->impl we have inconsistent state.

If we look into the function that raises the error:

rcl_ret_t
rcl_action_client_wait_set_get_entities_ready(
  const rcl_wait_set_t * wait_set,
  const rcl_action_client_t * action_client,

We'll see that it gets the pointer to the client and to some wait_set.

Now we need to understand if it can be, that action_client was initialized with one wait_set, but is_ready is called with another wait_set.

If we look into executor.py we can see that this function adds wait_set to waitable (action client is waitable):

              for waitable in waitables:
                    waitable.add_to_wait_set(wait_set)

and then it calls is_ready:

                for node in nodes_to_use:
                    for wt in node.waitables:
                        # Only check waitables that were added to the wait set
                        if wt in waitables and wt.is_ready(wait_set):
                            if wt.callback_group.can_execute(wt):
                                handler = self._make_handler(wt, node, self._take_waitable)
                                yielded_work = True
                                yield handler, wt, node

which is fine until we get it into multi-threading context, where it can be, that we run add_to_wait_set with one action client, but in another thread we re-wrote it with another wait_set and made subscription_index inconsistent.

If you want, we can set up a call since I have a setup to reproduce the problem.
Let me know if I can provide you with more information.

Metadata

Metadata

Assignees

No one assigned

    Labels

    backlogbugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions