-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix lifetime errors and memory leaks #199
base: rolling
Are you sure you want to change the base?
Conversation
898a149
to
35273d0
Compare
Signed-off-by: Tyler Weaver <[email protected]>
Signed-off-by: Tyler Weaver <[email protected]>
35273d0
to
5eea8a6
Compare
Nice work! While you're at it, maybe the ClassLoader instance itself could be stored as shared_ptr as well, and the MetaObject could keep a weak_ptr to its owning class loader instance, which would eliminate the possibility for a MetaObject to accidentally access a deleted class loader? |
This change makes so ClassLoader can only be created as a shared_ptr.
I tried this but wasn't able to get it to work because the load and unload methods need pointers to the ClassLoader which are called in the constructor destructor of class loader. For the construction side even the weak_ptr is invalid until construction finishes. 😢 For the destructor side, the shared_ptr and weak_ptr are invalid because that is why the destructor is being called. Trying to get either in the destructor or constructor throws an exception. |
Signed-off-by: Tyler Weaver <[email protected]>
Today I added a small change to this that de-impl's the AbstractMetaObjectBase. I thought the way it was implemented was weird. If the maintainers here would rather not have that change I am happy to drop that commit. I also tested building this with pluginlib and the API break in this does not affect pluginlib. I am unsure how to look up whatever other packages in ros directly depend on this package so I can test them with this change and provide patches if necesary. Does anyone know how to do a reverse dependency lookup in ROS? |
One way to do it would be to use
That should print a list of the packages it will build before starting the build, then try to build them all. If you don't want to build them, then replace |
Here is the topological order of packages in the ros2 workspace that are above class_loader:
Of these only 2 require changes. Here are the PRs: With these two changes, I was able to build the whole ros2 workspace locally for Rolling. |
The list seems to be only for a minimal ROS 2 source distro, if you want to check all of Rolling use:
|
Or if you want the transitive dependencies as well:
|
@tylerjw There are a couple of points that need addressing before this can be merged.
|
@gbiggs I will start by seeing if I can find all the breakages in the packages @jspricke pointed out and see if I can just supply a patch to each one. The API change is so small it is easy for me to create these PRs. You can see I have already done this for all the repos in the |
Starting with these direct dependencies of class_loader:
I searched through rosdep and made this repos file to extend the ros2.repos file:
The rosbag2_bag_v2_plugins package failed with this error and I also was not able to get rosdep to install the ros1 bridge:
I also couldn't figure out how to build the ros1_bridge from source as I can't install a version of ros1 on Ubuntu 22. There were no code changes needed to build the above packages in addition to the ros2.repos projects. Tomorrow I'll move on to the next set of repos. |
} | ||
|
||
ClassLoader * AbstractMetaObjectBase::getAssociatedClassLoader(size_t index) const | ||
{ | ||
return impl_->associated_class_loaders_[index]; | ||
return associated_class_loaders_[index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prob should use .at()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think so, would you like to take over this PR?
@gbiggs, this has gotten stale. What is left here? Should I just close this? |
I'm unsure how to fix the conflicts on this PR as I've lost all the context of this PR since it has sat for months now and other changes have been made. I was hopeful that those changes would fix the bugs found by asan... but allas they do not. @mjcarroll said he'd review this and help it along but I'm not sure what to do next. Someone who understands this code should take ownership of it and fix it. Abandoned broken code like this is why I think we should rewrite things in Rust, because then when the code does get abandoned, it will at least not have bugs like this one in it that we are unable to fix. |
I think it's an important PR, I'm sad it didn't get merged earlier, and I'm happy to take it over if nobody else will. @jspricke is probably a better candidate to finish this, though. Can we get a commitment from @mjcarroll or @gbiggs to merge it if the tick-tock is added? That's mainly just a matter of adding a |
No one owns this code. What I mean by that is those who have the permissions to merge into here no longer have the mental context of how this code works. This means that in order to review a change like this, it takes a ton of work to re-learn how everything works. Because those people (@mjcarroll @gbiggs) are really busy they never find the time to do this work so the bug goes unfixed (even when someone else did the work to learn how this code works to fix it). They can't give ownership of this to anyone else because no one else has the time to learn how this works and own the code. Because of this, we are just stuck in this state where the code is broken; we all know it is broken, and a fix exists, yet no one can do anything about it. For this reason, I think we should just discontinue all library uses and the language it is written in. Plugins, while a useful idea, if they are a core part of our ecosystem and we can't fix known bugs, should be discarded. When this library was written, there were no better options, but now there are. We should just stop writing code in C/C++ and instead write code in a language that enforces a higher level of correctness by the compiler. All that is to say, unless those who have permission to push commits into this want to own this code and fix the bugs in it, we should just mark this as a CVE and discourage anyone from using this library in any way. |
Well that went dark fast. :0 I'll try to open a new PR, based on this one, that adds the deprecation warning. I don't know how the code works either but raw pointers are bad. |
Raw pointers are fine if you don't create use-after-free bugs with them. The problem is no one seems to be able to use them without creating a foot-gun. Good luck creating a new PR based on this one... make sure you test it with asan and tsan to make sure you are actually fixing the bugs. It took me several days of drawing diagrams and multiple attempts at a non-api-breaking refactor of this to try to fix this bug. I'm going to go write some Rust code where I know I can use dependencies that don't have problems like this. The maintainers and authors of this are not bad people, they are just trying to create nice software for robotics like the rest of us and are over-worked. I don't want to spend the time to try to relearn how this package works to rebase this change and get it to work again only for it to be ignored for another 6mo. |
Note that the reason I did remove the constructor and didn't just add a deprecation tag to it is creating a |
Here is the CI run from my fork that shows that the ASAN test passes with these changes. There is nothing here that addresses the TSAN issues. https://github.com/tylerjw/class_loader/runs/7035424671?check_suite_focus=true
This is a large-ish breaking API change as it makes the constructor of ClassLoader private. The reason for this is that there are multiple methods that return objects that retain a pointer to ClassLoader. A safe way to preserve this behavior is to convert to a static function that returns a
shared_ptr
of the object. For this, I created a new method calledClassLoader::Make
. The vast majority of lines in this PR stems from this API change.To fix the memory leaks of meta-objects I just put them in the graveyard when constructed to preserve all meta-objects for the lifetime of the process. I tried copying the shared_ptr of the meta object from the map to the graveyard in
destroyMetaObjectsForLibrary
but this results in a segfault in the destructor of the meta object because of some UB situation caused by the base type not having a virtual destructor. I tried changing the code in several ways to get around this but was not able to figure it out. I think that preserving the lifetime of all factories for the duration of the process is a small price to pay for being done with this memory leak.