A modified and analyzable version of the Autoware Reference system from Apex.AI and a modified version of default and picas-enabled rclcpp to inlcude automatic latency management for processing chains at the executor level. Uses ROS2 Humble as base ROS version.
Future reference systems could be proposed that are more complex using the same basic node building blocks within the reference_system package.
The modified system breaks down into 13 analyzable chains as follows:
A reference system is defined by:
- A platform is defined by:
- Hardware (e.g. an off-the-shelf single-board computer, embedded ECU, etc.)
- if there are multiple configurations available for such hardware, ensure it is specified
- Operating System (OS) like RT linux, QNX, etc. along with any special configurations made
- Hardware (e.g. an off-the-shelf single-board computer, embedded ECU, etc.)
- for diversity of benchmarking, all nodes can be run on a single process, or be split across multiple processes
- a fixed number of nodes
- each node with:
- a fixed number of publishers and subscribers
- a fixed processing time or a fixed publishing rate
- each node with:
- a fixed message type of fixed size to be used for every node.
With these defined attributes the reference system can be replicated across many different possible configurations to be used to benchmark each configuration against the other in a reliable and fair manner.
With this approach portable and repeatable tests can also be defined to reliably confirm if a given reference system meets the requirements.
To enable as many people as possible to replicate this reference system, the platform(s) were chosen to be easily accessible (inexpensive, high volume), have lots of documentation, large community use and will be supported well into the future.
Platforms were not chosen for performance of the reference system - we know we could run “faster” with a more powerful CPU or GPU but then it would be harder for others to validate findings and test their own configurations. Accessibility is the key here and will be considered if more platforms want to be added to this benchmark list.
Platforms:
- Raspberry Pi 4B:
- 4 GB RAM version is the assumed default
- other versions could also be tested / added by the community
- real-time linux kernel
- 4 GB RAM version is the assumed default
- Nvidia Jetson AGX Xavier
- 32 GB RAM version is assumed default
- Linux kernel 5.10.120-tegra SMP PREEMPT
- Volta Architecture GPU with unified memory
Rather than trying to write code to cover all potential variations of executors, APIs, and future features we cannot even imagine today we have chosen instead to define what we call a “reference system” based on part of a real-world system, Autoware.Auto.
The above node graph can be boiled down to only a handful of node "types" that are replicated to make this complex system:
Node Types:
- Sensor Node
- input node to system
- one publisher, zero subscribers
- publishes message cyclically at some fixed frequency
- Transform Node
- one subscriber, one publisher
- starts processing for N milliseconds after a message is received
- publishes message after processing is complete
- Fusion Node
- 2 subscribers, one publisher
- One subscriber caches input, the other triggers processing and publishing
- starts processing for N milliseconds after a message is received from a specific (configurable) subscription
- publishes message after processing is complete
- Cyclic Node
- N subscribers, one publisher
- cyclically processes all received messages since the last cycle for N milliseconds
- publishes message after processing is complete
- Command Node
- prints output stats everytime a message is received
- Intersection Node
- behaves like N transform nodes
- N subscribers, N publisher bundled together in one-to-one connections
- starts processing on connection where sample was received
- publishes message after processing is complete
These basic building-block nodes can be mixed-and-matched to create quite complex systems that replicate real-world scenarios to benchmark different configurations against each other.
The first reference system benchmark proposed is based on the Autoware.Auto lidar data pipeline as stated above and shown in the node graph image above as well.
- Autoware Reference System
- ROS2:
- Executors:
- ROS2:
Tests and dependencies will be written uniquely for each reference system.
Please go to the README.md file specific for the reference system you would like to run to view the instructions on how to set it up and run yourself.
Use -DPICAS=TRUE to build the PICAS executor. For more details, see README.md in Autoware Reference System.
Use -DLAME=TRUE to build the Latency Management executor. For more details, see README.md in Autoware Reference System.