Bug description
With the same data and the same stage (nested loop join operator), vanilla Spark takes 3 min per task, while Gluten (Velox) takes more than 1 h per task.
The probe side of the nested loop join has 150 billion records; the build side has 92 records.
The graph above is a flame graph of Velox. It shows that the slowdown comes from generating dictionary vectors for the high-cardinality probe vectors.
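To make the suspected cost concrete, here is a minimal sketch of how probe-side dictionary indices grow in a nested loop join. It is not Velox's actual implementation: `makeProbeIndices` and `totalIndexEntries` are hypothetical helpers, and the sketch assumes every probe row is paired with every build row before the join condition is evaluated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not a Velox API. Builds the dictionary indices that
// would wrap a probe-side column when each probe row is paired with every
// build row. Output row (p * buildRows + b) points back at probe row p.
std::vector<int32_t> makeProbeIndices(int32_t probeRows, int32_t buildRows) {
  assert(probeRows >= 0 && buildRows >= 0);
  std::vector<int32_t> indices;
  indices.reserve(static_cast<size_t>(probeRows) * buildRows);
  for (int32_t p = 0; p < probeRows; ++p) {
    for (int32_t b = 0; b < buildRows; ++b) {
      indices.push_back(p);  // one index entry per (probe, build) pair
    }
  }
  return indices;
}

// Hypothetical helper: number of index entries written across the whole
// probe side under the all-pairs assumption stated above.
uint64_t totalIndexEntries(uint64_t probeRecords, uint64_t buildRecords) {
  return probeRecords * buildRecords;
}
```

Under that assumption, the sizes reported here (150 billion probe records, 92 build records) would mean 150,000,000,000 × 92 = 13,800,000,000,000 index entries, so even a cheap per-entry write adds up to a large share of task time.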
System information
Velox System Info v0.0.2
Commit: 976a5b72a3a068cd1c70cc92ab64cfedae3649a1
CMake Version: 3.28.3
System: Linux-6.10.14-linuxkit
Arch: x86_64
C++ Compiler: /opt/rh/devtoolset-11/root/usr/bin/c++
C++ Compiler Version: 11.2.1
C Compiler: /opt/rh/devtoolset-11/root/usr/bin/gcc
C Compiler Version: 11.2.1
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib64/python3.6/site-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt