Description
For reference: the issue started in apriltag_ros, but the problem seems to be more connected with `RectifyNode`. Here is a link to the issue with the description: christianrauch/apriltag_ros#33 (comment)
Hi @christianrauch. I also encountered the problem mentioned in the issue. I tested multiple setups that I'll summarize below. I also cannot see how the approximate sync policy could help; it seems to be a workaround without understanding, rather than a fix. Below I show more details regarding my setup, so please forgive me for talking about other nodes as well (they do not seem to be the issue in the end) - I wanted to dive deep into this problem, which is important for me.
I use an OAK-D-PRO-W-PoE camera with only the rgb stream enabled. My setup is simply:

`oak_driver` -> `image_proc::RectifyNode` -> `apriltag_node`

My environment:
- ROS 2 Jazzy
- apriltag_ros
- OAK camera driver publishing:
  - `/oak/oak/rgb/image_raw` @ ~20 Hz
  - `/oak/oak/rgb/camera_info` @ ~20 Hz
- Rectification via `image_proc::RectifyNode` - publishes `/oak/oak/rgb/image_rect`

In some tests the oak driver and `rectify_node` are in the same `component_container` with `apriltag_node`; in others `apriltag_node` is a separate process. Below I introduce more details.

Bug description
Simple as that: `apriltag_node` stops matching `image_rect` + `camera_info` after ~20–22 minutes.

Summary
After ~20 minutes of runtime, `apriltag_node` starts printing:

```
[WARN ...] [apriltag_node]: [image_transport] Topics '/oak/oak/rgb/image_rect' and '/oak/oak/rgb/camera_info' do not appear to be synchronized. In the last 10s:
    Image messages received:      1
    CameraInfo messages received: 20
    Synchronized pairs:           0
```
- `apriltag_node` prints repeated synchronization warnings
- The rate of images arriving at `apriltag_node` drops (observed in logs / `ros2 topic hz`)
- No tags are detected
- Restarting the camera container resets the issue temporarily
In some runs (when `apriltag_node` runs in the component container with the rectify node + driver), both streams dropped to 1:

```
    Image messages received:      1
    CameraInfo messages received: 1
    Synchronized pairs:           0
```

Investigation Summary
1) Frequency observations
Before failure:

- `/oak/oak/rgb/image_raw` ~20 Hz ✅
- `/oak/oak/rgb/camera_info` ~20 Hz ✅
- `/oak/oak/rgb/image_rect` ~20 Hz ✅

After failure:

- `/oak/oak/rgb/image_raw` ~20 Hz ✅
- `/oak/oak/rgb/camera_info` ~20 Hz ✅
- `/oak/oak/rgb/image_rect` drops to ~5 Hz → ~1 Hz ❌

Obviously, I suspect that the rectify node is the problem.

2) Timestamp verification
- `image_raw` and `camera_info` from the camera have identical timestamps for long periods (I sampled ~16 minutes spanning before and after the failure).
- Therefore, the camera seems to publish synchronized data correctly.
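One way to automate this check (a minimal sketch, assuming `message_filters` and the topic names above; this is not part of any package): an `ExactTime` synchronizer only fires for identical stamps, so a steady ~20 Hz pair rate confirms the camera output really is synchronized.

```cpp
#include <atomic>
#include <chrono>
#include <memory>

#include <message_filters/subscriber.h>
#include <message_filters/sync_policies/exact_time.h>
#include <message_filters/synchronizer.h>
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/camera_info.hpp>
#include <sensor_msgs/msg/image.hpp>

using sensor_msgs::msg::CameraInfo;
using sensor_msgs::msg::Image;

// Counts exact-stamp (image, camera_info) pairs and reports every 10 s.
class StampCheck : public rclcpp::Node
{
public:
  StampCheck()
  : Node("stamp_check"),
    image_sub_(this, "/oak/oak/rgb/image_raw"),
    info_sub_(this, "/oak/oak/rgb/camera_info"),
    sync_(Policy(50), image_sub_, info_sub_)
  {
    sync_.registerCallback(&StampCheck::onPair, this);
    timer_ = create_wall_timer(
      std::chrono::seconds(10),
      [this]() {
        RCLCPP_INFO(
          get_logger(), "exact-stamp pairs in last 10 s: %zu", pairs_.exchange(0));
      });
  }

private:
  using Policy = message_filters::sync_policies::ExactTime<Image, CameraInfo>;

  // Fires only when both messages carry the identical header stamp.
  void onPair(const Image::ConstSharedPtr &, const CameraInfo::ConstSharedPtr &)
  {
    ++pairs_;
  }

  message_filters::Subscriber<Image> image_sub_;
  message_filters::Subscriber<CameraInfo> info_sub_;
  message_filters::Synchronizer<Policy> sync_;
  std::atomic<std::size_t> pairs_{0};
  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<StampCheck>());
  rclcpp::shutdown();
  return 0;
}
```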
3) QoS verification
The initial hypothesis was a QoS mismatch.
Observed initially:
- Camera publishes `RELIABLE / KEEP_LAST(10)`
- `rectify_color_node` subscribed with `BEST_EFFORT / KEEP_LAST(5)` (later forced to `RELIABLE / KEEP_LAST(50)` to test)
- `rectify_color_node` publishes `image_rect` as `RELIABLE / KEEP_LAST(10)`
- `apriltag_node` originally subscribed with `RELIABLE / KEEP_LAST(10)`

Changing the QoS on the rectify subscriptions to `RELIABLE` did not remove the failure (the forced QoS is sketched after this list).
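For illustration, the forced subscription QoS corresponds to roughly this (a sketch; the actual change was inside image_proc's subscription setup, and the function name here is made up):

```cpp
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/image.hpp>

// Build the image subscription with RELIABLE / KEEP_LAST(50) instead of the
// default, to match the camera's RELIABLE publisher.
rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr
subscribe_reliable(rclcpp::Node & node)  // hypothetical helper, for illustration
{
  rclcpp::QoS qos{rclcpp::KeepLast(50)};  // KEEP_LAST(50)
  qos.reliable();                         // RELIABLE

  return node.create_subscription<sensor_msgs::msg::Image>(
    "/oak/oak/rgb/image_raw", qos,
    [](sensor_msgs::msg::Image::ConstSharedPtr /*msg*/) {
      // rectification callback would run here
    });
}
```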
4) RectifyNode callback counters
I added counters to the `RectifyNode` callbacks (sketched below); it reports:

- `images_received == camera_infos_received == pairs_matched` - meaning it does not lose sync internally (it only processes matched pairs)
- BUT after the failure begins, it receives fewer and fewer pairs over time (its callback rate declines), even though the upstream raw topics remain at ~20 Hz

Example:

```
RectifyNode received 27571 images, 27571 camera infos, 27571 pairs
```

When the bug happens, `RectifyNode` still shows equal counts, but the callback rate decreases.
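The instrumentation was roughly the following (paraphrased, not my exact patch or the verbatim image_proc source; the counter members are assumed additions to the node):

```cpp
// Assumes these members were added to RectifyNode:
//   std::size_t images_received_{0}, camera_infos_received_{0}, pairs_matched_{0};
void RectifyNode::imageCb(
  const sensor_msgs::msg::Image::ConstSharedPtr & image_msg,
  const sensor_msgs::msg::CameraInfo::ConstSharedPtr & info_msg)
{
  // The camera subscriber delivers already-matched (image, info) pairs,
  // which is why the three counters can never diverge here.
  ++images_received_;
  ++camera_infos_received_;
  ++pairs_matched_;

  if (pairs_matched_ % 1000 == 0) {
    RCLCPP_INFO(
      get_logger(),
      "RectifyNode received %zu images, %zu camera infos, %zu pairs",
      images_received_, camera_infos_received_, pairs_matched_);
  }

  // ... original rectification and publish logic unchanged ...
  (void)image_msg;
  (void)info_msg;
}
```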
I don't have more ideas for debugging the rectify node, so at this point I assume it works OK in order to move forward. It just passes processed data along and might be blocked by downstream packages (`apriltag_node`).

5) `apriltag_node` behavior during failure

During failure:
- `apriltag_node` receives `camera_info` at full rate (often still ~20 Hz)
- `apriltag_node` receives `image_rect` increasingly rarely (20 → 5 → 1)
- Therefore, `ExactTime` in `image_transport::CameraSubscriber` cannot form pairs and prints warnings

This suggests the problem is not purely a “timestamps mismatch”, but rather that image delivery/throughput collapses (?).
Why this points to `apriltag_node`
`apriltag_node` uses `image_transport::CameraSubscriber`, which uses `message_filters::TimeSynchronizer` (`ExactTime` policy). It subscribes to `image_rect`, then derives the sibling `camera_info` topic automatically. When the node falls behind (CPU spikes, processing delays, executor contention), image delivery can get throttled, and `ExactTime` pair matching drops to zero even if `camera_info` still arrives.

Additionally, in component-container setups (single-threaded executor), starvation could make both streams appear to stop - I don't know why.
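To make the mechanism concrete, here is a minimal sketch of what the subscription side boils down to (illustrative, not the actual apriltag_ros source), together with the executor detail that matters in the container case:

```cpp
#include <image_transport/image_transport.hpp>
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/camera_info.hpp>
#include <sensor_msgs/msg/image.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("tag_sub_sketch");

  // CameraSubscriber subscribes to the image topic and derives the sibling
  // camera_info topic itself; the two are paired on identical stamps.
  auto cam_sub = image_transport::create_camera_subscription(
    node.get(), "/oak/oak/rgb/image_rect",
    [&node](
      const sensor_msgs::msg::Image::ConstSharedPtr & image,
      const sensor_msgs::msg::CameraInfo::ConstSharedPtr & /*info*/) {
      // Fires only for exact-stamp pairs: if image delivery stalls, this
      // stops firing even while camera_info keeps arriving at 20 Hz.
      RCLCPP_DEBUG(
        node->get_logger(), "pair at %d.%09u",
        image->header.stamp.sec, image->header.stamp.nanosec);
    },
    "raw");

  // Spinning with a multi-threaded executor (or running
  // component_container_mt instead of component_container) is one way to
  // rule out single-thread executor starvation.
  rclcpp::executors::MultiThreadedExecutor executor;
  executor.add_node(node);
  executor.spin();
  rclcpp::shutdown();
  return 0;
}
```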
TLDR
- Moving `apriltag_node` to a separate process improved behavior, but the issue could still reappear (~22 minutes)
- Verified that `image_raw` and `camera_info` stamps stay identical - upstream is fine then
- Verified that QoS changes on the rectify subscriber (`RELIABLE / KEEP_LAST(50)`) did not fix the issue
- Set the `apriltag_node` subscriber to the `sensor_data` QoS - no effect
- Reduced the processing cost of `apriltag_node` (decimation, more threads) - no effect
For now I'm testing a setup in which I modified `RectifyNode` to:

- always subscribe to images as `BEST_EFFORT`, to try to avoid plugging up the pipeline
- re-publish `camera_info` every time `image_rect` is published, to avoid any de-synchronization issues in downstream packages (see the sketch after this list)
- make my downstream nodes consume the re-published `camera_info` topic
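A sketch of the re-publish idea (paraphrasing my local patch, not upstream image_proc; the re-published `camera_info` topic name here is hypothetical):

```cpp
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/camera_info.hpp>
#include <sensor_msgs/msg/image.hpp>

// Publishes the rectified image together with a camera_info carrying the
// exact same header, so a downstream ExactTime subscriber always finds a pair.
class PairedRectifyOutput
{
public:
  explicit PairedRectifyOutput(rclcpp::Node & node)
  : image_pub_(node.create_publisher<sensor_msgs::msg::Image>(
      "/oak/oak/rgb/image_rect", 10)),
    info_pub_(node.create_publisher<sensor_msgs::msg::CameraInfo>(
      "/oak/oak/rgb/image_rect/camera_info", 10))  // hypothetical topic name
  {
  }

  void publish(sensor_msgs::msg::Image image, sensor_msgs::msg::CameraInfo info)
  {
    info.header = image.header;  // guarantee identical stamps on the wire
    image_pub_->publish(image);
    info_pub_->publish(info);
  }

private:
  rclcpp::Publisher<sensor_msgs::msg::Image>::SharedPtr image_pub_;
  rclcpp::Publisher<sensor_msgs::msg::CameraInfo>::SharedPtr info_pub_;
};
```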
I would appreciate any comments or suggestions on how to recover from this issue.