Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lifetime errors and memory leaks #199

Open
wants to merge 3 commits into
base: rolling
Choose a base branch
from

Conversation

tylerjw
Copy link

@tylerjw tylerjw commented Jun 24, 2022

Here is the CI run from my fork that shows that the ASAN test passes with these changes. There is nothing here that addresses the TSAN issues. https://github.com/tylerjw/class_loader/runs/7035424671?check_suite_focus=true

This is a large-ish breaking API change as it makes the constructor of ClassLoader private. The reason for this is that there are multiple methods that return objects that retain a pointer to ClassLoader. A safe way to preserve this behavior is to convert to a static function that returns a shared_ptr of the object. For this, I created a new method called ClassLoader::Make. The vast majority of lines in this PR stems from this API change.

To fix the memory leaks of meta-objects I just put them in the graveyard when constructed to preserve all meta-objects for the lifetime of the process. I tried copying the shared_ptr of the meta object from the map to the graveyard in destroyMetaObjectsForLibrary but this results in a segfault in the destructor of the meta object because of some UB situation caused by the base type not having a virtual destructor. I tried changing the code in several ways to get around this but was not able to figure it out. I think that preserving the lifetime of all factories for the duration of the process is a small price to pay for being done with this memory leak.

@roehling
Copy link

Nice work! While you're at it, maybe the ClassLoader instance itself could be stored as shared_ptr as well, and the MetaObject could keep a weak_ptr to its owning class loader instance, which would eliminate the possibility for a MetaObject to accidentally access a deleted class loader?

@tylerjw
Copy link
Author

tylerjw commented Jun 24, 2022

Nice work! While you're at it, maybe the ClassLoader instance itself could be stored as shared_ptr as well.

This change makes so ClassLoader can only be created as a shared_ptr.

the MetaObject could keep a weak_ptr to its owning class loader instance, which would eliminate the possibility for a MetaObject to accidentally access a deleted class loader?

I tried this but wasn't able to get it to work because the load and unload methods need pointers to the ClassLoader which are called in the constructor destructor of class loader. For the construction side even the weak_ptr is invalid until construction finishes. 😢

For the destructor side, the shared_ptr and weak_ptr are invalid because that is why the destructor is being called. Trying to get either in the destructor or constructor throws an exception.

@tylerjw
Copy link
Author

tylerjw commented Jun 26, 2022

Today I added a small change to this that de-impl's the AbstractMetaObjectBase. I thought the way it was implemented was weird. If the maintainers here would rather not have that change I am happy to drop that commit.

I also tested building this with pluginlib and the API break in this does not affect pluginlib. I am unsure how to look up whatever other packages in ros directly depend on this package so I can test them with this change and provide patches if necesary. Does anyone know how to do a reverse dependency lookup in ROS?

@gbiggs
Copy link
Contributor

gbiggs commented Jun 26, 2022

One way to do it would be to use colcon in a workspace containing the ROS 2 source checked out from ros2.repos:

colcon build --event-handlers console_package_list+ --packages-above class_loader

That should print a list of the packages it will build before starting the build, then try to build them all. If you don't want to build them, then replace build with list.

@tylerjw
Copy link
Author

tylerjw commented Jun 28, 2022

Here is the topological order of packages in the ros2 workspace that are above class_loader:

Topological order    
- class_loader (ros.ament_cmake)
- pluginlib (ros.ament_cmake)
- qt_gui_cpp (ros.ament_cmake)
- urdf (ros.ament_cmake)
- kdl_parser (ros.ament_cmake)
- qt_gui_core (ros.ament_cmake)
- rclcpp_components (ros.ament_cmake)
- rqt_gui_cpp (ros.ament_cmake)
- action_tutorials_cpp (ros.ament_cmake)
- examples_rclcpp_minimal_composition (ros.ament_cmake)
- examples_rclcpp_minimal_subscriber (ros.ament_cmake)
- examples_rclcpp_wait_set (ros.ament_cmake)
- quality_of_service_demo_cpp (ros.ament_cmake)
- rosbag2_storage (ros.ament_cmake)
- composition (ros.ament_cmake)
- demo_nodes_cpp (ros.ament_cmake)
- demo_nodes_cpp_native (ros.ament_cmake)
- image_tools (ros.ament_cmake)
- image_transport (ros.ament_cmake)
- logging_demo (ros.ament_cmake)
- rosbag2_storage_default_plugins (ros.ament_cmake)
- rqt (ros.ament_python)
- tf2_ros (ros.ament_cmake)
- image_common (ros.ament_cmake)
- robot_state_publisher (ros.ament_cmake)
- rosbag2_cpp (ros.ament_cmake)
- tf2_bullet (ros.ament_cmake)
- tf2_eigen (ros.ament_cmake)
- tf2_geometry_msgs (ros.ament_cmake)
- tf2_kdl (ros.ament_cmake)
- tf2_sensor_msgs (ros.ament_cmake)
- dummy_robot_bringup (ros.ament_cmake)
- geometry2 (ros.ament_cmake)
- interactive_markers (ros.ament_cmake)
- ros2component (ros.ament_python)
- rosbag2_compression (ros.ament_cmake)
- rviz_common (ros.ament_cmake)
- test_tf2 (ros.ament_cmake)
- ros2cli_common_extensions (ros.ament_cmake)
- rosbag2_compression_zstd (ros.ament_cmake)
- rosbag2_performance_benchmarking (ros.ament_cmake)
- rviz_visual_testing_framework (ros.ament_cmake)
- test_launch_ros (ros.ament_python)
- rosbag2_transport (ros.ament_cmake)
- rviz_default_plugins (ros.ament_cmake)
- rosbag2_py (ros.ament_cmake)
- rviz2 (ros.ament_cmake)
- ros2bag (ros.ament_python)
- rqt_bag (ros.ament_python)
- launch_testing_examples (ros.ament_python)
- rosbag2_tests (ros.ament_cmake)
- rosbag2 (ros.ament_cmake)
- rqt_bag_plugins (ros.ament_python)

Of these only 2 require changes. Here are the PRs:

With these two changes, I was able to build the whole ros2 workspace locally for Rolling.

@jspricke
Copy link
Member

jspricke commented Jun 28, 2022

The list seems to be only for a minimal ROS 2 source distro, if you want to check all of Rolling use:

$ sudo apt install curl dctrl-tools
$ curl -s "http://packages.ros.org/ros2/ubuntu/dists/jammy/main/source/Sources.gz" | gunzip | grep-dctrl -s Package -F Build-Depends,Build-Depends-Indep ros-rolling-pluginlib
Package: ros-rolling-cartographer-rviz
Package: ros-rolling-controller-manager
Package: ros-rolling-diagnostic-aggregator
Package: ros-rolling-diff-drive-controller
Package: ros-rolling-effort-controllers
Package: ros-rolling-filters
Package: ros-rolling-force-torque-sensor-broadcaster
Package: ros-rolling-forward-command-controller
Package: ros-rolling-gazebo-ros2-control
Package: ros-rolling-gripper-controllers
Package: ros-rolling-hardware-interface
Package: ros-rolling-image-transport
Package: ros-rolling-imu-sensor-broadcaster
Package: ros-rolling-joint-state-broadcaster
Package: ros-rolling-joint-trajectory-controller
Package: ros-rolling-laser-filters
Package: ros-rolling-mavros
Package: ros-rolling-mavros-extras
Package: ros-rolling-moveit-chomp-optimizer-adapter
Package: ros-rolling-moveit-core
Package: ros-rolling-moveit-hybrid-planning
Package: ros-rolling-moveit-kinematics
Package: ros-rolling-moveit-planners-chomp
Package: ros-rolling-moveit-planners-ompl
Package: ros-rolling-moveit-resources-prbt-ikfast-manipulator-plugin
Package: ros-rolling-moveit-ros-benchmarks
Package: ros-rolling-moveit-ros-control-interface
Package: ros-rolling-moveit-ros-move-group
Package: ros-rolling-moveit-ros-occupancy-map-monitor
Package: ros-rolling-moveit-ros-perception
Package: ros-rolling-moveit-ros-planning
Package: ros-rolling-moveit-ros-visualization
Package: ros-rolling-moveit-servo
Package: ros-rolling-moveit-simple-controller-manager
Package: ros-rolling-nmea-hardware-interface
Package: ros-rolling-pilz-industrial-motion-planner
Package: ros-rolling-position-controllers
Package: ros-rolling-qt-gui-cpp
Package: ros-rolling-rmf-visualization-rviz2-plugins
Package: ros-rolling-ros1-rosbag-storage-vendor
Package: ros-rolling-rosbag2-bag-v2-plugins
Package: ros-rolling-rosbag2-compression-zstd
Package: ros-rolling-rosbag2-cpp
Package: ros-rolling-rosbag2-storage
Package: ros-rolling-rosbag2-storage-default-plugins
Package: ros-rolling-rosbag2-storage-mcap
Package: ros-rolling-rqt-gui-cpp
Package: ros-rolling-rqt-image-overlay
Package: ros-rolling-rqt-image-overlay-layer
Package: ros-rolling-rviz-common
Package: ros-rolling-rviz-default-plugins
Package: ros-rolling-rviz-imu-plugin
Package: ros-rolling-rviz-visual-tools
Package: ros-rolling-snowbot-operating-system
Package: ros-rolling-theora-image-transport
Package: ros-rolling-transmission-interface
Package: ros-rolling-urdf
Package: ros-rolling-velocity-controllers
Package: ros-rolling-warehouse-ros
Package: ros-rolling-webots-ros2-control
Package: ros-rolling-webots-ros2-driver

@jspricke
Copy link
Member

jspricke commented Jun 28, 2022

Or if you want the transitive dependencies as well:

$ sudo apt install wget dose-extra
$ wget "http://packages.ros.org/ros2/ubuntu/dists/jammy/main/source/Sources.gz" "http://packages.ros.org/ros2/ubuntu/dists/jammy/main/binary-amd64/Packages.gz"
$ dose-ceve -T debsrc -r ros-rolling-pluginlib -G pkg --deb-native-arch=amd64 deb://Packages.gz debsrc://Sources.gz | grep Package:
Package: ros-rolling-robot-state-publisher
Package: ros-rolling-depth-image-proc
Package: ros-rolling-compressed-image-transport
Package: ros-rolling-rviz2
Package: ros-rolling-color-names
Package: ros-rolling-moveit-ros-control-interface
Package: ros-rolling-chomp-motion-planner
Package: ros-rolling-swri-image-util
Package: ros-rolling-ros2bag
Package: ros-rolling-usb-cam
Package: ros-rolling-imu-sensor-broadcaster
Package: ros-rolling-urdf
Package: ros-rolling-image-rotate
Package: ros-rolling-effort-controllers
Package: ros-rolling-moveit-planners-chomp
Package: ros-rolling-rqt-image-overlay
Package: ros-rolling-qt-gui-cpp
Package: ros-rolling-rviz-default-plugins
Package: ros-rolling-moveit-ros-visualization
Package: ros-rolling-controller-manager
Package: ros-rolling-webots-ros2-tests
Package: ros-rolling-rviz-visual-testing-framework
Package: ros-rolling-controller-interface
Package: ros-rolling-image-publisher
Package: ros-rolling-hardware-interface
Package: ros-rolling-plotjuggler-ros
Package: ros-rolling-gripper-controllers
Package: ros-rolling-rosbag2-performance-benchmarking
Package: ros-rolling-webots-ros2-control
Package: ros-rolling-compressed-depth-image-transport
Package: ros-rolling-gazebo-plugins
Package: ros-rolling-force-torque-sensor-broadcaster
Package: ros-rolling-rviz-imu-plugin
Package: ros-rolling-image-proc
Package: ros-rolling-rosbag2-tests
Package: ros-rolling-rosbag2-compression
Package: ros-rolling-image-view
Package: ros-rolling-turtlebot3-gazebo
Package: ros-rolling-moveit-ros-warehouse
Package: ros-rolling-moveit-resources-prbt-ikfast-manipulator-plugin
Package: ros-rolling-rviz-visual-tools
Package: ros-rolling-webots-ros2-driver
Package: ros-rolling-pose-cov-ops
Package: ros-rolling-moveit-chomp-optimizer-adapter
Package: ros-rolling-rosbag2-bag-v2-plugins
Package: ros-rolling-warehouse-ros
Package: ros-rolling-moveit-ros-benchmarks
Package: ros-rolling-forward-command-controller
Package: ros-rolling-image-transport
Package: ros-rolling-rosbag2-py
Package: ros-rolling-warehouse-ros-sqlite
Package: ros-rolling-moveit-planners-ompl
Package: ros-rolling-ros2controlcli
Package: ros-rolling-transmission-interface
Package: ros-rolling-gscam
Package: ros-rolling-rmf-visualization-rviz2-plugins
Package: ros-rolling-mrpt2
Package: ros-rolling-gazebo-ros2-control
Package: ros-rolling-moveit-ros-robot-interaction
Package: ros-rolling-velocity-controllers
Package: ros-rolling-moveit-core
Package: ros-rolling-moveit-ros-perception
Package: ros-rolling-rosbag2-storage-mcap
Package: ros-rolling-moveit-simple-controller-manager
Package: ros-rolling-rqt-image-view
Package: ros-rolling-moveit-servo
Package: ros-rolling-diff-drive-controller
Package: ros-rolling-moveit-visual-tools
Package: ros-rolling-rosbag2-transport
Package: ros-rolling-srdfdom
Package: ros-rolling-rqt-image-overlay-layer
Package: ros-rolling-ros-ign-image
Package: ros-rolling-rosbag2-storage
Package: ros-rolling-mavros-extras
Package: ros-rolling-laser-filters
Package: ros-rolling-rtabmap
Package: ros-rolling-moveit-hybrid-planning
Package: ros-rolling-moveit-kinematics
Package: ros-rolling-position-controllers
Package: ros-rolling-joint-trajectory-controller
Package: ros-rolling-octomap-rviz-plugins
Package: ros-rolling-mavros
Package: ros-rolling-rc-genicam-driver
Package: ros-rolling-rosbag2-compression-zstd
Package: ros-rolling-v4l2-camera
Package: ros-rolling-cartographer-rviz
Package: ros-rolling-ros1-rosbag-storage-vendor
Package: ros-rolling-kdl-parser
Package: ros-rolling-snowbot-operating-system
Package: ros-rolling-pilz-industrial-motion-planner
Package: ros-rolling-theora-image-transport
Package: ros-rolling-rosbag2-storage-default-plugins
Package: ros-rolling-filters
Package: ros-rolling-moveit-ros-planning-interface
Package: ros-rolling-cartographer-ros
Package: ros-rolling-stereo-image-proc
Package: ros-rolling-joint-state-broadcaster
Package: ros-rolling-rviz-common
Package: ros-rolling-moveit-setup-assistant
Package: ros-rolling-nmea-hardware-interface
Package: ros-rolling-rosbag2-cpp
Package: ros-rolling-launch-testing-examples
Package: ros-rolling-rqt-gui-cpp
Package: ros-rolling-avt-vimba-camera
Package: ros-rolling-pilz-industrial-motion-planner-testutils
Package: ros-rolling-moveit-ros-planning
Package: ros-rolling-diagnostic-aggregator
Package: ros-rolling-domain-bridge
Package: ros-rolling-moveit-ros-move-group
Package: ros-rolling-moveit-ros-occupancy-map-monitor

@audrow audrow changed the base branch from ros2 to rolling June 28, 2022 14:17
@gbiggs
Copy link
Contributor

gbiggs commented Jun 30, 2022

@tylerjw There are a couple of points that need addressing before this can be merged.

  1. Since this is a significant API change, we should do a tick-tock. This means we deprecate the old API but keep it around (users will get a warning when they use it), and add the new API. That goes into Iron Irwini. Then in the J-Turtle release we can remove the old API.
  2. We should at least try out all the reverse dependencies found by @jspricke above so we know how wide the breakage is. We are not required to go and fix all of them but it's good to know. We should fix those in the ros2.repos file, though. If we find the breakage is very small, we might be able to waive the tick-tock requirement.

@tylerjw
Copy link
Author

tylerjw commented Jun 30, 2022

@gbiggs I will start by seeing if I can find all the breakages in the packages @jspricke pointed out and see if I can just supply a patch to each one. The API change is so small it is easy for me to create these PRs. You can see I have already done this for all the repos in the ros2.repos file and there were two that I needed to create PRs for. They are linked to above.

@tylerjw
Copy link
Author

tylerjw commented Jun 30, 2022

Starting with these direct dependencies of class_loader:

curl -s "http://packages.ros.org/ros2/ubuntu/dists/jammy/main/source/Sources.gz" | gunzip | grep-dctrl -s Package -F Build-Depends,Build-Depends-Indep ros-rolling-class-loader
Package: ros-rolling-depth-image-proc
Package: ros-rolling-gscam
Package: ros-rolling-image-rotate
Package: ros-rolling-laser-proc
Package: ros-rolling-moveit-kinematics
Package: ros-rolling-moveit-ros-visualization
Package: ros-rolling-pluginlib
Package: ros-rolling-rclcpp-components
Package: ros-rolling-rosbag2-bag-v2-plugins
Package: ros-rolling-warehouse-ros-sqlite

I searched through rosdep and made this repos file to extend the ros2.repos file:

repositories:
  image_pipeline:
    type: git
    url: https://github.com/ros-perception/image_pipeline.git
    version: rolling
  gscam:
    type: git
    url: https://github.com/ros-drivers/gscam.git
    version: ros2
  laser_proc:
    type: git
    url: https://github.com/ros-perception/laser_proc.git
    version: ros2-devel
  moveit:
    type: git
    url: https://github.com/ros-planning/moveit2.git
    version: main
  rosbag2_bag_v2_plugins:
    type: git
    url: https://github.com/ros2/rosbag2_bag_v2.git
    version: master
  warehouse_ros_sqlite:
    type: git
    url: https://github.com/ros-planning/warehouse_ros_sqlite.git
    version: ros2

The rosbag2_bag_v2_plugins package failed with this error and I also was not able to get rosdep to install the ros1 bridge:

CMake Error at CMakeLists.txt:23 (message):
  Failed to find ros1_bridge, cannot build.

I also couldn't figure out how to build the ros1_bridge from source as I can't install a version of ros1 on Ubuntu 22.

There were no code changes needed to build the above packages in addition to the ros2.repos projects. Tomorrow I'll move on to the next set of repos.

}

ClassLoader * AbstractMetaObjectBase::getAssociatedClassLoader(size_t index) const
{
return impl_->associated_class_loaders_[index];
return associated_class_loaders_[index];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prob should use .at()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think so, would you like to take over this PR?

@tylerjw
Copy link
Author

tylerjw commented Oct 21, 2022

@gbiggs, this has gotten stale. What is left here? Should I just close this?

@tylerjw
Copy link
Author

tylerjw commented Oct 21, 2022

I'm unsure how to fix the conflicts on this PR as I've lost all the context of this PR since it has sat for months now and other changes have been made. I was hopeful that those changes would fix the bugs found by asan... but allas they do not.

@mjcarroll said he'd review this and help it along but I'm not sure what to do next. Someone who understands this code should take ownership of it and fix it. Abandoned broken code like this is why I think we should rewrite things in Rust, because then when the code does get abandoned, it will at least not have bugs like this one in it that we are unable to fix.

@AndyZe
Copy link

AndyZe commented Oct 22, 2022

would you like to take over this PR?

I think it's an important PR, I'm sad it didn't get merged earlier, and I'm happy to take it over if nobody else will. @jspricke is probably a better candidate to finish this, though.

Can we get a commitment from @mjcarroll or @gbiggs to merge it if the tick-tock is added? That's mainly just a matter of adding a deprecated attribute, right?

@tylerjw
Copy link
Author

tylerjw commented Oct 22, 2022

Can we get a commitment from @mjcarroll or @gbiggs to merge it if the tick-tock is added? That's mainly just a matter of adding a deprecated attribute, right?

No one owns this code. What I mean by that is those who have the permissions to merge into here no longer have the mental context of how this code works. This means that in order to review a change like this, it takes a ton of work to re-learn how everything works. Because those people (@mjcarroll @gbiggs) are really busy they never find the time to do this work so the bug goes unfixed (even when someone else did the work to learn how this code works to fix it).

They can't give ownership of this to anyone else because no one else has the time to learn how this works and own the code.

Because of this, we are just stuck in this state where the code is broken; we all know it is broken, and a fix exists, yet no one can do anything about it.

For this reason, I think we should just discontinue all library uses and the language it is written in. Plugins, while a useful idea, if they are a core part of our ecosystem and we can't fix known bugs, should be discarded.

When this library was written, there were no better options, but now there are. We should just stop writing code in C/C++ and instead write code in a language that enforces a higher level of correctness by the compiler.

All that is to say, unless those who have permission to push commits into this want to own this code and fix the bugs in it, we should just mark this as a CVE and discourage anyone from using this library in any way.

@AndyZe
Copy link

AndyZe commented Oct 22, 2022

Well that went dark fast. :0

I'll try to open a new PR, based on this one, that adds the deprecation warning. I don't know how the code works either but raw pointers are bad.

@tylerjw
Copy link
Author

tylerjw commented Oct 22, 2022

I'll try to open a new PR, based on this one, that adds the deprecation warning. I don't know how the code works either but raw pointers are bad.

Raw pointers are fine if you don't create use-after-free bugs with them. The problem is no one seems to be able to use them without creating a foot-gun.

Good luck creating a new PR based on this one... make sure you test it with asan and tsan to make sure you are actually fixing the bugs. It took me several days of drawing diagrams and multiple attempts at a non-api-breaking refactor of this to try to fix this bug.

I'm going to go write some Rust code where I know I can use dependencies that don't have problems like this. The maintainers and authors of this are not bad people, they are just trying to create nice software for robotics like the rest of us and are over-worked.

I don't want to spend the time to try to relearn how this package works to rebase this change and get it to work again only for it to be ignored for another 6mo.

@tylerjw
Copy link
Author

tylerjw commented Oct 22, 2022

Note that the reason I did remove the constructor and didn't just add a deprecation tag to it is creating a ClassLoader without a reference counter shared_ptr around it creates a lifetime that is not accounted for. I do not believe this to be a major API break as from my searching, most users use this through Pluginlib, and this change is invisible to them. The two cases where I found ClassLoader used directly were in rclcpp and the demos for rclcpp. I provided patches for those libraries that can be applied at the same time as this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants