In this work, we conduct a detailed analysis of deep learning models to pinpoint the modules with the highest computational overhead. To mitigate these bottlenecks, we design a custom SIMD (Single Instruction, Multiple Data) instruction set that accelerates these performance-critical modules. By leveraging our tailored SIMD instructions, we achieve significant reductions in computation time compared with conventional serial implementations.
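To illustrate the kind of operation such an instruction set targets, here is a minimal C++ sketch contrasting a scalar multiply-accumulate with a packed 4-lane version, the way a single custom SIMD instruction might process four int8 operand pairs at once. This is purely illustrative; the project's actual instruction formats are defined in the Chisel sources.

```cpp
#include <cstdint>

// Scalar reference: four separate multiply-accumulate steps.
int32_t mac_scalar(const int8_t a[4], const int8_t b[4]) {
  int32_t acc = 0;
  for (int i = 0; i < 4; ++i) acc += int32_t(a[i]) * int32_t(b[i]);
  return acc;
}

// Pack four int8 values into one 32-bit word (lane 0 in the low byte).
uint32_t pack(const int8_t v[4]) {
  uint32_t w = 0;
  for (int lane = 0; lane < 4; ++lane)
    w |= uint32_t(uint8_t(v[lane])) << (8 * lane);
  return w;
}

// SIMD-style version: both operands arrive packed in single 32-bit
// words, and all four lane products are formed from one operand pair,
// as a custom 4-lane MAC instruction would do in a single issue.
int32_t mac_packed(uint32_t a_packed, uint32_t b_packed) {
  int32_t acc = 0;
  for (int lane = 0; lane < 4; ++lane) {
    int8_t a = int8_t((a_packed >> (8 * lane)) & 0xFF);  // sign-extend lane
    int8_t b = int8_t((b_packed >> (8 * lane)) & 0xFF);
    acc += int32_t(a) * int32_t(b);
  }
  return acc;
}
```

In hardware the four lane multiplies run in parallel, which is where the speedup over the serial loop comes from.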
- For detailed technical information, please refer to report.pdf
- For project presentation slides, see slides.pdf
- Build the Docker workspace for CFU Playground: `./run start`
- Navigate to the project directory: `cd proj/project`
- Navigate to the Chisel folder: `cd ./chisel`
- Generate Verilog code from the Chisel design into `build/SIMDEngine.v`: `sbt 'runMain simd.SIMDEngineApp'`
- Return to the project directory: `cd ../`
- Build the project: `make renode-headless`
- Press `space`.
- Press `3`.
- Press `h` to test the whole AlexNet.
- Press `esc` and `Ctrl-D` to exit.
- The code for AlexNet is located at `/proj/project/src/acal_lab/libs/models/AlexNet/AlexNet.cc`. Modify the function type of each model component (e.g., conv/gemm) to select SIMD or scalar execution; this lets you observe the speed difference between SIMD and serial execution.
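The switch described above might look roughly like the following C++ sketch. The `GemmMode` enum, the function names, and the toy GEMM kernel are all hypothetical stand-ins; the actual identifiers in `AlexNet.cc` may differ, and here the SIMD path falls back to the scalar kernel so the sketch stays self-contained.

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical execution-mode switch (illustrative names only).
enum class GemmMode { kScalar, kSimd };

// Scalar reference GEMM: C[i][j] = sum_k A[i][k] * B[k][j],
// with A of shape MxK, B of shape KxN, C of shape MxN (row-major).
void gemm_scalar(const int8_t* A, const int8_t* B, int32_t* C,
                 size_t M, size_t K, size_t N) {
  for (size_t i = 0; i < M; ++i)
    for (size_t j = 0; j < N; ++j) {
      int32_t acc = 0;
      for (size_t k = 0; k < K; ++k)
        acc += int32_t(A[i * K + k]) * int32_t(B[k * N + j]);
      C[i * N + j] = acc;
    }
}

// Dispatcher mirroring the SIMD-vs-scalar choice. In the real project
// the SIMD case would invoke the custom CFU instructions; here it
// reuses the scalar kernel so the example runs anywhere.
void gemm(const int8_t* A, const int8_t* B, int32_t* C,
          size_t M, size_t K, size_t N, GemmMode mode) {
  switch (mode) {
    case GemmMode::kSimd:    // placeholder: would call the SIMD kernel
    case GemmMode::kScalar:
      gemm_scalar(A, B, C, M, K, N);
      break;
  }
}
```

Timing both modes on the same layer is how the speed difference between SIMD and serial execution can be measured.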
- The SIMD implementations of the model components are in `/proj/project/src/acal_lab/libs/op/simd`. This directory contains the SIMD instructions we created.
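A software-level SIMD op in such a directory typically wraps a custom instruction issued to the CFU. The sketch below is a guess at that shape, with a pure-software model standing in for the hardware so it can run off-target: the `funct7` value and the lane-wise saturating int8 add semantics are invented for illustration and are not the project's actual instruction encoding.

```cpp
#include <cstdint>

// Software model of a hypothetical CFU instruction. On the real target,
// a macro from CFU Playground's cfu.h would issue the instruction to
// hardware; this function only models one made-up opcode.
// funct7 == 1 (assumed): lane-wise saturating int8 add of two packed words.
uint32_t cfu_model(uint32_t funct7, uint32_t rs1, uint32_t rs2) {
  uint32_t out = 0;
  if (funct7 == 1) {
    for (int lane = 0; lane < 4; ++lane) {
      int32_t a = int8_t((rs1 >> (8 * lane)) & 0xFF);  // sign-extend lane
      int32_t b = int8_t((rs2 >> (8 * lane)) & 0xFF);
      int32_t s = a + b;
      if (s > 127) s = 127;     // saturate high
      if (s < -128) s = -128;   // saturate low
      out |= uint32_t(uint8_t(int8_t(s))) << (8 * lane);
    }
  }
  return out;
}
```

Keeping such a model next to the hardware-backed path makes it easy to check the Chisel implementation against a known-good software reference.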
- The hardware design of the SIMD instructions is implemented in `/proj/project/chisel/src/main/scala/simd`.