-
Notifications
You must be signed in to change notification settings - Fork 249
Description
Operating System:
Ubuntu 24.04
ROS version or commit hash:
jazzy
RMW implementation (if applicable):
rmw_fastrtps_cpp, rmw_cyclonedds_cpp
RMW Configuration (if applicable):
default
Client library (if applicable):
rclpy
'ros2 doctor --report' output
ros2 doc --report
root@eb25a18e1572:~/chimera# ros2 doctor --report
/opt/ros/jazzy/lib/python3.12/site-packages/ros2doctor/api/init.py: 162: UserWarning: Fail to call PackageReport class functions.
NETWORK CONFIGURATION
inet : 127.0.0.1
inet4 : ['127.0.0.1']
inet6 : ['::1']
netmask : 255.0.0.0
device : lo
flags : 73<RUNNING,UP,LOOPBACK>
mtu : 65536
inet : 172.18.0.5
inet4 : ['172.18.0.5']
ether : 02:42:ac:12:00:05
netmask : 255.255.0.0
device : eth0
flags : 4163<BROADCAST,RUNNING,UP,MULTICAST>
mtu : 1500
broadcast : 172.18.255.255
PLATFORM INFORMATION
system : Linux
platform info : Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
release : 5.15.167.4-microsoft-standard-WSL2
processor : x86_64
QOS COMPATIBILITY LIST
compatibility status : No publisher/subscriber pairs found
RMW MIDDLEWARE
middleware name : rmw_fastrtps_cpp
ROS 2 INFORMATION
distribution name : jazzy
distribution type : ros2
distribution status : active
release platforms : {'debian': ['bookworm'], 'rhel': ['9'], 'ubuntu': ['noble']}
TOPIC LIST
topic : none
publisher count : 0
subscriber count : 0
Steps to reproduce issue
It's difficult to provide you with clear steps to reproduce this error because our system, and especially the node that faces it, consists of a bunch of services/action clients/publishes/subscribers.
But in our case, the error started to appear when we added a new subscriber who listens to the topic published with a 10Hz frequency and any time that the action client sends a goal asynchronously.
Everything is running under MultiThreadedExecutor
.
Expected behavior
Action Client sends goal and receives feedback and result.
Actual behavior
Traceback (most recent call last):
File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
self.run()
File "/usr/lib/python3.12/threading.py", line 1010, in run
self._target(*self._args, **self._kwargs)
File "/home/user/package/process.py", line 241, in execute
self._execute()
File "/home/user/package/process.py", line 320, in _execute
result = spin_and_get_result(
^^^^^^^^^^^^^^^^^^^^
File "/home/user/package2/spin_wrapper.py", line 49, in spin_and_get_result
spin_until_future_complete(
File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/__init__.py", line 271, in spin_until_future_complete
executor.spin_until_future_complete(future, timeout_sec)
File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 338, in spin_until_future_complete
self.spin_once_until_future_complete(future, timeout_left)
File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 919, in spin_once_until_future_complete
self._spin_once_impl(timeout_sec, future.done)
File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 890, in _spin_once_impl
handler, entity, node = self.wait_for_ready_callbacks(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 793, in wait_for_ready_callbacks
return next(self._cb_iter)
^^^^^^^^^^^^^^^^^^^
File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 719, in _wait_for_ready_callbacks
if wt in waitables and wt.is_ready(wait_set):
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/workspace/install/rclpy/lib/python3.12/site-packages/rclpy/action/client.py", line 232, in is_ready
ready_entities = self._client_handle.is_ready(wait_set)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rclpy._rclpy_pybind11.RCLError: Failed to get number of ready entities for action client: wait set index for status subscription is out of bounds, at ./src/rcl_action/action_client.c:642
Additional information
I tried to debug this behaviour and found the following inconsistency.
First of all, I extended a little bit of logging:
RCL_SET_ERROR_MSG_WITH_FORMAT_STRING("wait set index for status subscription is out of bounds (%zu > %zu)", status_index, wait_set->size_of_subscriptions);
And it gave me the following output:
rclpy._rclpy_pybind11.RCLError: Failed to get number of ready entities for action client: wait set index for status subscription is out of bounds (2 > 2), at /home/user/workspace/src/rcl_action/src/rcl_action/action_client.c:644
I looked into the function that works with these indexes:
ret = rcl_wait_set_add_subscription(
wait_set,
&action_client->impl->feedback_subscription,
&action_client->impl->wait_set_feedback_subscription_index);
And this function calls SET_ADD
macros which accesses the internal index of subscriptions in the passed wait_set
:
#define SET_ADD(Type) \
RCL_CHECK_ARGUMENT_FOR_NULL(wait_set, RCL_RET_INVALID_ARGUMENT); \
if (!wait_set->impl) { \
RCL_SET_ERROR_MSG("wait set is invalid"); \
return RCL_RET_WAIT_SET_INVALID; \
} \
RCL_CHECK_ARGUMENT_FOR_NULL(Type, RCL_RET_INVALID_ARGUMENT); \
if (!(wait_set->impl->Type ## _index < wait_set->size_of_ ## Type ## s)) { \
RCL_SET_ERROR_MSG(#Type "s set is full"); \
return RCL_RET_WAIT_SET_FULL; \
} \
size_t current_index = wait_set->impl->Type ## _index++; \
wait_set->Type ## s[current_index] = Type; \
/* Set optional output argument */ \
if (NULL != index) { \
*index = current_index; \
}
It lead me to the question, can it be that since we're caching the index of subscriptions from wait_set->impl
into action_client->impl
we have inconsistent state.
If we look into the function that raises the error:
rcl_ret_t
rcl_action_client_wait_set_get_entities_ready(
const rcl_wait_set_t * wait_set,
const rcl_action_client_t * action_client,
We'll see that it gets the pointer to the client and to some wait_set
.
Now we need to understand if it can be, that action_client
was initialized with one wait_set
, but is_ready
is called with another wait_set
.
If we look into executor.py
we can see that this function adds wait_set
to waitable
(action client is waitable):
for waitable in waitables:
waitable.add_to_wait_set(wait_set)
and then it calls is_ready
:
for node in nodes_to_use:
for wt in node.waitables:
# Only check waitables that were added to the wait set
if wt in waitables and wt.is_ready(wait_set):
if wt.callback_group.can_execute(wt):
handler = self._make_handler(wt, node, self._take_waitable)
yielded_work = True
yield handler, wt, node
which is fine until we get it into multi-threading context, where it can be, that we run add_to_wait_set
with one action client, but in another thread we re-wrote it with another wait_set
and made subscription_index
inconsistent.
If you want, we can set up a call since I have a setup to reproduce the problem.
Let me know if I can provide you with more information.