Memory leak issue when loading a new world #81
Comments
Quick update: I used some memory profiler tools, and something I've seen is that every time I load a new world (in my environment I decided to run the memory profiler tool through
Great detective work! I have been trying to figure out why I keep running out of GPU memory, and this seems to be the problem. I get through about 20 route scenarios before I get the "out of memory" error... has anyone discovered a way to fix this (even if it's a bit hacky)? I need to run 100+ route scenarios and with the current issue I can't get anywhere near that.
If you don't have to run the 100 route scenarios all at once, perhaps you could do something hacky with bash scripting. I think I'd try to figure out the approximate number of times you can load a new world before you get an OOM error, and set up your code to run for that number of route scenarios (around 20 in your case). Then, you could use some bash scripting to run the code multiple times to get through all the route scenarios you need, e.g. for 100 route scenarios, loop 5 times running 20 route scenarios per loop. The memory that gets used up by loading new worlds seems relatively consistent and is freed up when the process exits, so I think looping it with a bash script should work.
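A minimal sketch of that batching idea, written in Python with `subprocess` rather than a bash loop; the entry point and flags below are placeholders, not the actual leaderboard CLI:

```python
# Run the evaluation in chunks of ~20 routes in separate OS processes so any
# memory leaked by loading new worlds is released when each process exits.
# The command and its flags are placeholders -- adapt them to however you
# normally launch your evaluation.
import subprocess

ROUTES_PER_RUN = 20   # roughly how many routes fit before hitting OOM
TOTAL_ROUTES = 100

for start in range(0, TOTAL_ROUTES, ROUTES_PER_RUN):
    end = min(start + ROUTES_PER_RUN, TOTAL_ROUTES)
    cmd = [
        "python", "run_evaluation.py",   # placeholder entry point
        "--route-start", str(start),     # placeholder flags
        "--route-end", str(end),
    ]
    print(f"Running routes {start}..{end - 1}")
    subprocess.run(cmd, check=True)      # fresh process => fresh memory
```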
Hey @aaronh65. Thanks for the information. We've also detected the issue and are trying to solve it. I'll report back when we have some answers to this issue.
There is another source of memory leakage: checkpoint memory is not freed after each route completes and is loaded again for the next route. Check whether you override the destroy() method in your agent file.
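For illustration, a minimal sketch of such a destroy() override, assuming a PyTorch checkpoint held in an attribute named `self.net`; the attribute name and checkpoint path are placeholders, and the base-class import should be adjusted to your leaderboard version (other required agent methods are omitted for brevity):

```python
import torch
from leaderboard.autoagents.autonomous_agent import AutonomousAgent  # adjust to your version


class MyAgent(AutonomousAgent):
    def setup(self, path_to_conf_file):
        # Placeholder checkpoint path -- load whatever model your agent uses.
        self.net = torch.load("model_checkpoint.pth", map_location="cuda")

    # sensors() / run_step() omitted for brevity

    def destroy(self):
        # Drop the reference so the tensors backing the checkpoint can be freed ...
        if getattr(self, "net", None) is not None:
            del self.net
            self.net = None
        # ... and return the cached CUDA blocks to the driver.
        torch.cuda.empty_cache()
```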
Are there any updates on this? I have been encountering the issue with memory growing on the
I didn't see anyone override the destroy() method, so regarding the destroy() method: do you mean it should explicitly destroy some things?
I also met this problem, though not so often:
terminate called after throwing an instance of 'clmdep_msgpack::v1::type_error'
  what(): std::bad_cast
Or in the CARLA terminal:
terminating with uncaught exception of type clmdep_msgpack::v1::type_error: std::bad_cast
terminating with uncaught exception of type clmdep_msgpack::v1::type_error: std::bad_cast
Signal 6 caught.
Signal 6 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554
CommonUnixCrashHandler: Signal=6
Malloc Size=65535 LargeMemoryPoolOffset=131119
Malloc Size=120416 LargeMemoryPoolOffset=251552
Engine crash handling finished; re-raising signal 6 for the default handler. Good bye.
Aborted
auto_pilot.py does not load any model checkpoint on the GPU, so it does not need a destroy() method. If you are running a model on the GPU, then you need to clear the checkpoint memory after every route explicitly in your code, e.g. using the destroy() method. You'll encounter this problem only when running a large number of routes (fewer than 10 shouldn't be an issue). If you want to verify this, you can try running an evaluation with, say, 50 routes and monitor GPU memory usage after every route. Also, we noticed this issue in CARLA 0.9.10 and an earlier version of the leaderboard framework, so I don't know if it has been resolved in the newer versions.
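A rough way to do that per-route check, assuming a PyTorch model: log the GPU memory the process's allocator is holding after each route finishes (the function name and call site are up to you):

```python
import torch


def log_gpu_memory(route_idx: int) -> None:
    """Print PyTorch's GPU memory usage; call this after each route completes."""
    allocated = torch.cuda.memory_allocated() / 1024**2   # MiB held by live tensors
    reserved = torch.cuda.memory_reserved() / 1024**2     # MiB held by the caching allocator
    print(f"[route {route_idx}] allocated={allocated:.1f} MiB  reserved={reserved:.1f} MiB")
```

If the allocated figure keeps climbing from route to route, the checkpoint (or something referencing it) is not being released.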
Thanks for pointing that out. I think not only the GPU memory grows but also the CPU memory, and judging from the commits in this repo, I don't think this problem has been solved... I tried using a Python library to collect unused memory, but it didn't seem to help the situation.
@glopezdiest Any updates on this?
Not really, we are working on the new Leaderboard 2.0, and while this issue is part of our plan to improve the leaderboard, we haven't had a chance to look into it yet.
Hi, I'm trying to run a Reinforcement Learning experiment using the DynamicObjectCrossing example scenario on CARLA 0.9.13. I've used the `psutil` Python module as a memory profiler, and I can see the RAM usage go up by a constant amount every time `self.client.load_world(town)` is called. I've tried to `del self.world` before the new world is loaded, but the problem still persists. Has anyone found a solution to this?
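For reference, a minimal sketch of that kind of `psutil` measurement around `load_world()`; the host, port, and town names are assumptions for the example:

```python
# Print the process RSS before and after each client.load_world() call to see
# the constant growth described above.
import os

import carla
import psutil

process = psutil.Process(os.getpid())


def rss_mb() -> float:
    """Resident set size of this Python process, in MiB."""
    return process.memory_info().rss / 1024**2


client = carla.Client("localhost", 2000)
client.set_timeout(20.0)

for i, town in enumerate(["Town01", "Town02", "Town03"]):
    before = rss_mb()
    world = client.load_world(town)      # the call after which RAM usage grows
    after = rss_mb()
    print(f"reset {i}: {town}  RSS {before:.1f} -> {after:.1f} MiB (+{after - before:.1f})")
```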
Has anyone solved this issue?
I'm having this issue training a DQN agent on different scenarios in CARLA 0.9.12. Any suggestions?
I'm trying to create a custom RL environment for CARLA using `leaderboard_evaluator.py` as a template, but I'm running into some issues when trying to reset the custom environment (after an episode is done). The functions that load the world/scenario and clean up after a scenario ends closely match what's done in `leaderboard_evaluator.py` (e.g. the `load_and_wait_for_world` and `cleanup` functions), but there's a memory leak somewhere that happens every time I reset the environment.

Using a memory profiler shows that each time the environment resets, the `carla.Client` takes up more and more memory. This eventually leads to an out-of-memory error which kills the process. Is there a cleanup method I'm missing or some common pitfall when resetting environments that I should resolve to stop this from happening? I can provide code if needed, but I wanted to first check if this was a known issue.
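Not a confirmed fix from this thread, but for anyone hitting the same thing, a sketch of the reset pattern worth trying: keep a single `carla.Client` for the whole run, destroy spawned actors explicitly, and force a garbage-collection pass on every reset (the class and attribute names are illustrative):

```python
import gc

import carla


class CustomCarlaEnv:
    """Illustrative Gym-style environment skeleton; only the reset/cleanup path is shown."""

    def __init__(self, host: str = "localhost", port: int = 2000):
        # One client for the lifetime of the environment, never recreated.
        self.client = carla.Client(host, port)
        self.client.set_timeout(20.0)
        self.world = None
        self.spawned_actors = []   # actors this environment spawned in the current episode

    def reset(self, town: str = "Town01"):
        self._cleanup()
        self.world = self.client.load_world(town)
        return self.world

    def _cleanup(self):
        # Destroy everything we spawned in the previous episode.
        if self.spawned_actors:
            self.client.apply_batch(
                [carla.command.DestroyActor(actor) for actor in self.spawned_actors]
            )
            self.spawned_actors = []
        # Drop the reference to the old world and collect cycles on the Python side.
        self.world = None
        gc.collect()
```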