A GPU acceleration flow for RTL simulation with batch stimulus.
RTLflow is a GPU acceleration flow for RTL simulation with batch stimulus. RTLflow first transpiles RTL into CUDA kernels, each of which simulates one partition of the RTL across multiple stimuli simultaneously. It also uses CUDA Graph for efficient runtime execution. We build RTLflow atop Verilator to inherit its existing optimization facilities, such as variable reduction and partitioning algorithms, which have been rigorously tested for over 25 years in the Verilator community.
~$ cd RTLflow
~/RTLflow$ autoconf
~/RTLflow$ ./configure
~/RTLflow$ make -j8

To use RTLflow, you need:
- Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.0 with -std=c++17.
- GNU C++ Compiler (g++) at least v8.0 with -std=c++17.
- libfl-dev
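
As a rough sketch, the minimum versions stated in this README can also be checked mechanically. The helpers `version_ge` and `check_tool` below are hypothetical names introduced for illustration and are not part of RTLflow. The script assumes GNU `sort -V` is available and that `nvcc --version` prints a `release X.Y` field.

```shell
#!/bin/sh
# Sketch only: compare installed tool versions against the README minimums.
# version_ge and check_tool are illustrative helpers, not part of RTLflow.

version_ge() {
  # Succeeds when $1 >= $2 under version ordering (assumes GNU sort -V).
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

check_tool() {
  # $1 = label, $2 = detected version (may be empty), $3 = minimum version
  if [ -z "$2" ]; then
    echo "$1: not found"
  elif version_ge "$2" "$3"; then
    echo "$1: $2 (ok, >= $3)"
  else
    echo "$1: $2 (too old, need >= $3)"
  fi
}

# Extract "X.Y" from the "release X.Y" field of nvcc's version banner.
nvcc_ver="$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9.]*\).*/\1/p')"
# -dumpfullversion is understood by newer g++; -dumpversion covers older ones.
gxx_ver="$(g++ -dumpfullversion -dumpversion 2>/dev/null)"

check_tool nvcc "$nvcc_ver" 11.0
# 8.0 follows the g++ version check in this README; adjust if your setup differs.
check_tool g++ "$gxx_ver" 8.0
```

The script only reports what it finds and never exits with an error, so it is safe to run on a machine where one of the tools is still missing.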
 
~$ nvcc --version    # nvcc must be present, version at least v11.0
~$ g++ --version     # g++ must be present, version at least v8.0
~$ sudo apt install libfl-dev

You will also need to set $VERILATOR_ROOT to the RTLflow root directory before using RTLflow. For example:
~$ export VERILATOR_ROOT=~/RTLflow

By default, we set the nvcc flag --arch=sm_80 to achieve the best performance in our environment. You can go to:
~/RTLflow/include/verilated.mk.in
to modify $RTLFLOW_FLAGS and make RTLflow again.
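
If your GPU is not an sm_80 device, the following sketch shows one way to work out a matching `--arch` value. It assumes a driver recent enough for `nvidia-smi` to support the `compute_cap` query field. `cap_to_arch` is a hypothetical helper introduced for illustration. The script only prints a suggestion and does not edit `verilated.mk.in`, since the exact layout of `$RTLFLOW_FLAGS` is not described here.

```shell
#!/bin/sh
# Sketch only: suggest an nvcc --arch value for the first GPU in the system.
# cap_to_arch is an illustrative helper, not part of RTLflow.

cap_to_arch() {
  # Map a compute capability such as "8.0" to an arch name such as "sm_80".
  printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Assumption: the driver supports the compute_cap query field.
  cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)"
  case "$cap" in
    [0-9]*.[0-9]*) echo "Suggested flag: --arch=$(cap_to_arch "$cap")" ;;
    *)             echo "Could not read the compute capability; choose --arch by hand" ;;
  esac
else
  echo "nvidia-smi not found; choose --arch by hand"
fi
```

After changing the flag in `$RTLFLOW_FLAGS`, rebuild RTLflow as described above so the new architecture takes effect.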
Please go to RTLflow benchmarks for more examples.
RTLflow is licensed under the MIT License.
