Open
Description
With larger and larger state machines (10k to 1 million states), we are running into notable problems with loading time and, more importantly, memory usage. As the number of states can be expected to increase further for future autonomous tasks, this is important to tackle.
As the RAM of the robots is limited on most systems, consuming multiple GBs of memory for only loading a large state machine is problematic.
The real issue is that the memory consumption is 100-1000 times higher than the inherent information, i.e. the size of the .json files that are loaded.
To my current understanding, this is due to the object structure that is set up in Python variables/references at runtime.
General Questions:
- How to reduce the overhead of a loaded state machine?
- Is a bug causing the high memory consumption?
- Is the recursive nesting structure of states (which holds loads of redundant data) in the Python variables the problem?
Connected to this issue might be the following topics:
- Memory consumption to high for longer runtimes
- Load time improvements
- Reduce SM loading time
- Support flat storage for single-state machines
- Wrong interpretation of MAX_VISIBLE_LIBRARY_HIERARCHY
- Preload libraries and use copy method to provide LibraryStates and respective models
We already had the following findings:
- Using just the core (and no GUI) to open large state machines does not improve the performance. Loading a demo state machine of ~30k states takes around 30s, both when using the GUI and when using just the core. The memory (RAM) usage is also similar, at around 2.5 - 3GB.
- When loading the files locally on a laptop in offline mode, i.e. not connected to the institute network and therefore not retrieving data from the server, the performance does not change. Therefore, the network infrastructure does not cause significant performance issues.
- Upon further investigation of the state machine load function, we found that the majority of the time is consumed by reading the files. The reader is called recursively in load_state_recursively. The load time splits up as follows:
- Read .json files (~70-90%)
- RAFCON logger commands (~10%)
- Other (<10%)
- The number of hierarchy levels does not significantly change the load time, but the number of state machines does. For example: Loading 24 state machines in 2 hierarchy levels takes approximately the same time as loading 24 state machines in 7 hierarchy levels, but both are faster than loading 48 state machines. However, the number of hierarchy levels significantly impacts the visual editor's performance: the more hierarchy levels, the worse the performance of panning, zooming, adding, shifting...
- When investigating the memory usage of a large state machine we had the following findings:
- State machines by themselves (when not loaded, just the .json files) are rather tiny; the whole state machine has 65MB
- Loading takes loads of memory, 2-3GB (with core and GUI)
- When closing the state machines, the memory is not released; it is only released when closing RAFCON
- Using pympler to get the size of objects:
- Before loading the large state machine:
- global_gui_config: 0.97 MB
- global_runtime_config: 0.97 MB
- library_manager: 0.97 MB
- library_manager_model: 0.97 MB
- state_machine_execution_engine: 0.97 MB
- state_machine_manager: 0.97 MB
- storage: 1.00 MB
- After loading the state machine:
- state_machine: 146 MB (the only non-global variable in this list)
- global_gui_config: 761.98 MB
- global_runtime_config: 734.87 MB
- library_manager: 802.64 MB
- state_machine_execution_engine: 761.98 MB (sketchy, why exactly the same? --> Down the nesting it's referring to the same objects)
- state_machine_manager: 761.98 MB
- storage: 725.30 MB
- It shows that when loaded, many variables are consuming a lot of memory. However, they are often referring to the same objects in lower levels of their nesting, therefore this listing does not represent the absolute memory consumption.
- The main bulk of memory is in rafcon_singletons and threading, leading to the state objects
- The recursive structure holds loads of redundant information
- Structures like "...state_copy.parent.state_copy.parent.state_copy.parent..." exist
- The current guess is that, when loading, the memory increases linearly with the number of states (see image below). So the problem is probably that a single instance of a state is way too large in comparison to the data it holds.
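
The linear-growth guess can be checked in isolation. The following is a minimal sketch, assuming a hypothetical `ToyState` class (a stand-in with a few typical attributes, not RAFCON's actual state class). It uses the standard-library `tracemalloc` module to estimate the average number of bytes allocated per state for different state counts.

```python
import tracemalloc


class ToyState:
    """Hypothetical stand-in for a state; NOT RAFCON's state class."""

    def __init__(self, state_id, parent=None):
        self.state_id = state_id
        self.name = "state_%d" % state_id
        self.parent = parent  # back-reference to the parent state
        self.outcomes = {0: "success", -1: "aborted", -2: "preempted"}
        self.input_data_ports = {}
        self.output_data_ports = {}


def bytes_per_state(n_states):
    """Return the average number of traced bytes allocated per ToyState."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    root = ToyState(0)
    # Keep all states alive in a list until the measurement is taken.
    states = [ToyState(i, parent=root) for i in range(1, n_states)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del states
    return (after - before) / n_states


if __name__ == "__main__":
    for n in (1000, 10000, 50000):
        print("%6d states: ~%.0f bytes per state" % (n, bytes_per_state(n)))
```

If the bytes-per-state figure stays roughly constant across the different counts, the growth is linear and the per-instance footprint is the thing to reduce. The absolute numbers of this toy class say nothing about the size of a real RAFCON state.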

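
The nearly identical sizes reported for several globals in the listing above are what one would expect when each of them reaches the same objects further down. This can be reproduced with a small standard-library sketch. `deep_size` below is a simplified, hypothetical walker (it is not pympler's implementation and only follows dicts, common containers and instance `__dict__`s), and `Holder` is a made-up class that merely keeps a reference to a shared structure.

```python
import sys


def deep_size(obj, seen=None):
    """Sum sys.getsizeof over everything reachable from obj.

    Objects whose id is already in `seen` are skipped, so passing the
    same `seen` set to several calls counts shared objects only once.
    """
    if seen is None:
        seen = set()
    stack = [obj]
    total = 0
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        total += sys.getsizeof(current)
        if isinstance(current, dict):
            stack.extend(current.keys())
            stack.extend(current.values())
        elif isinstance(current, (list, tuple, set, frozenset)):
            stack.extend(current)
        elif hasattr(current, "__dict__"):
            stack.append(vars(current))
    return total


class Holder:
    """Hypothetical global object that references the shared state data."""

    def __init__(self, tree):
        self.tree = tree


# One large structure, referenced by two otherwise unrelated objects.
shared_tree = [{"state_id": i, "name": "state_%d" % i} for i in range(10000)]
engine = Holder(shared_tree)
manager = Holder(shared_tree)

# Measured separately, each holder appears to own the whole structure.
size_engine = deep_size(engine)
size_manager = deep_size(manager)

# Measured with a common `seen` set, the shared part is counted once.
seen = set()
combined = deep_size(engine, seen) + deep_size(manager, seen)

if __name__ == "__main__":
    print("engine alone:  %d bytes" % size_engine)
    print("manager alone: %d bytes" % size_manager)
    print("both together: %d bytes" % combined)
```

Summing the per-variable figures therefore overestimates the real footprint. Measuring all roots against one shared set of visited objects gives a number that is closer to the memory actually held.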