
NestedLoopJoin is significantly slower than vanilla Spark #12294

Open
@lifulong

Description


Bug description

With the same data and the same stage (NestedLoopJoin operator), vanilla Spark takes 3 min per task, while Gluten (Velox) takes more than 1 h per task.

In the NestedLoopJoin, the probe side has 150 billion records and the build side has 92 records.
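For scale, a nested loop join considers every (probe, build) pair before the join condition is applied. The small C++ sketch below works out that count from the figures in this report, along with the minimum per-task slowdown implied by 1 h versus 3 min. The names `candidatePairs`, `kPairs`, and `kMinSlowdown` are illustrative only.

```cpp
#include <cstdint>

// Candidate (probe, build) pairs a nested loop join has to consider:
// every probe row is paired with every build row before any join condition runs.
constexpr std::uint64_t candidatePairs(std::uint64_t probeRows, std::uint64_t buildRows) {
  return probeRows * buildRows;
}

// Figures from this report: 150 billion probe rows, 92 build rows.
constexpr std::uint64_t kPairs = candidatePairs(150'000'000'000ULL, 92ULL);
static_assert(kPairs == 13'800'000'000'000ULL, "150e9 * 92 = 1.38e13 pairs");

// Lower bound on the per-task slowdown: at least 60 min in Gluten vs 3 min in vanilla Spark.
constexpr double kMinSlowdown = 60.0 / 3.0;
```

That is roughly 1.38 × 10^13 candidate pairs in total and a slowdown of at least 20x per task.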

[Image: flame graph of the Velox task]

The graph above is a flame graph of Velox. It shows that most of the time is spent generating dictionary vectors over the large probe-side base vector.
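To illustrate why repeated dictionary generation can dominate here, the following is a simplified, hypothetical C++ model. It is not Velox's actual NestedLoopJoin code; `DictionaryVector`, `crossJoinBatch`, and `indexWrites` are invented names. It assumes the probe batch is wrapped in a fresh dictionary (an index buffer over the shared base vector) once per build row, so a batch of N probe rows joined against M build rows writes N × M indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded vector: it shares the base
// values and only stores row indices into them.
struct DictionaryVector {
  std::shared_ptr<const std::vector<int64_t>> base;
  std::vector<int32_t> indices;
};

// Hypothetical cross-join step: for every build row, wrap the whole probe
// batch in a new dictionary so it can be paired with that build row.
std::vector<DictionaryVector> crossJoinBatch(
    const std::shared_ptr<const std::vector<int64_t>>& probeBatch,
    std::size_t buildRows) {
  assert(probeBatch != nullptr);
  std::vector<DictionaryVector> out;
  out.reserve(buildRows);
  for (std::size_t b = 0; b < buildRows; ++b) {
    DictionaryVector dict{probeBatch, {}};
    dict.indices.resize(probeBatch->size());
    for (std::size_t i = 0; i < probeBatch->size(); ++i) {
      dict.indices[i] = static_cast<int32_t>(i);  // one index write per probe row
    }
    out.push_back(std::move(dict));
  }
  return out;
}

// Total index entries written for one probe batch: probeRows * buildRows.
std::size_t indexWrites(const std::vector<DictionaryVector>& dicts) {
  std::size_t total = 0;
  for (const auto& d : dicts) total += d.indices.size();
  return total;
}
```

In this model, with 92 build rows every probe row is re-indexed 92 times, and the index buffers are allocated and filled even though the base values are never copied. A row-at-a-time loop would pair rows without materializing such buffers.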

System information

Velox System Info v0.0.2
Commit: 976a5b72a3a068cd1c70cc92ab64cfedae3649a1
CMake Version: 3.28.3
System: Linux-6.10.14-linuxkit
Arch: x86_64
C++ Compiler: /opt/rh/devtoolset-11/root/usr/bin/c++
C++ Compiler Version: 11.2.1
C Compiler: /opt/rh/devtoolset-11/root/usr/bin/gcc
C Compiler Version: 11.2.1
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib64/python3.6/site-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

Labels: bug (Something isn't working), triage (Newly created issue that needs attention)