
perf: short-circuit and collect_bool for IN list with column references #20694

Open
zhangxffff wants to merge 3 commits into apache:main from zhangxffff:feat/in-list-short-circuit

Conversation

@zhangxffff
Contributor

Which issue does this PR close?

Rationale for this change

Third PR in the IN list optimization series (split from #20428):

What changes are included in this PR?

  • Short-circuit break: convert try_fold to for loop; when all non-null rows are already true, skip remaining list items (up to 27x faster for match=100%/nulls=0%)
  • BooleanBuffer::collect_bool: use it in the make_comparator fallback path for nested types instead of (0..n).map().collect() (suggested by @Dandandan in perf: Optimize IN list with column references evaluation #20428)
  • First-expr initialization: evaluate the first list expression directly as the accumulator, avoiding a redundant or_kleene(all_false, rhs) (suggested by @Dandandan in perf: Optimize IN list with column references evaluation #20428)
  • Tests: added 3 new tests covering short-circuit, short-circuit with nulls, and struct column references (make_comparator fallback path)
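
The short-circuit and first-expr initialization can be illustrated with a minimal, std-only sketch. It is not the PR's code: `or_kleene` here is a scalar stand-in for the Arrow kernel of the same name, `eq_column` and `in_list` are hypothetical helpers, columns are modelled as `Vec<Option<i32>>`, and the loop breaks only when every row is already true, which may be stricter than the condition the PR actually uses.

```rust
/// Three-valued (Kleene) OR: true wins over null, null wins over false.
fn or_kleene(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// Hypothetical stand-in for evaluating `needle = list_item` row by row.
/// A null on either side yields a null comparison result.
fn eq_column(needle: &[Option<i32>], item: &[Option<i32>]) -> Vec<Option<bool>> {
    needle
        .iter()
        .zip(item)
        .map(|(n, i)| match (n, i) {
            (Some(n), Some(i)) => Some(n == i),
            _ => None,
        })
        .collect()
}

/// Returns the IN-list result and how many list items were evaluated.
fn in_list(needle: &[Option<i32>], list: &[Vec<Option<i32>>]) -> (Vec<Option<bool>>, usize) {
    let Some((first, rest)) = list.split_first() else {
        // Assumption for this sketch: an empty list matches nothing.
        return (vec![Some(false); needle.len()], 0);
    };
    // First-expr initialization: the first comparison is the accumulator,
    // so there is no OR against an all-false array.
    let mut acc = eq_column(needle, first);
    let mut evaluated = 1;
    for item in rest {
        // Short-circuit: once every row is true, later items cannot change anything.
        if acc.iter().all(|v| *v == Some(true)) {
            break;
        }
        let rhs = eq_column(needle, item);
        for (a, r) in acc.iter_mut().zip(rhs) {
            *a = or_kleene(*a, r);
        }
        evaluated += 1;
    }
    (acc, evaluated)
}
```

With a needle of `[1, 2]` and a first list column of `[1, 2]`, the sketch evaluates one item and stops. If any needle row is null, that row stays null and all list items are evaluated, which is consistent with the nulls=20% benchmark rows showing no speedup.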

Are these changes tested?

Yes. Added tests covering short-circuit, short-circuit with nulls, and struct column references (make_comparator fallback path).

Benchmark result:

(zhangxffff) zhangxffff@95d3d60664da ~/W/datafusion ((bcc52cd4))> critcmp after before
group                                              after                                  before
-----                                              -----                                  ------
in_list_cols/Int32/list=28/match=0%/nulls=0%       1.02     93.8±1.80µs        ? ?/sec    1.00     91.8±1.52µs        ? ?/sec
in_list_cols/Int32/list=28/match=0%/nulls=20%      1.03    105.3±1.95µs        ? ?/sec    1.00    102.2±1.59µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=0%     1.00      3.4±0.07µs        ? ?/sec    27.14    91.7±1.52µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=20%    1.07    107.7±1.91µs        ? ?/sec    1.00    100.4±1.33µs        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=0%      1.00     50.1±1.15µs        ? ?/sec    1.84     92.4±1.36µs        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=20%     1.05    105.1±1.49µs        ? ?/sec    1.00    100.0±0.84µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=0%        1.00      9.9±0.17µs        ? ?/sec    1.01     10.1±0.19µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=20%       1.02     11.0±0.18µs        ? ?/sec    1.00     10.8±0.16µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=0%      1.00      3.3±0.06µs        ? ?/sec    2.95      9.9±0.16µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=20%     1.01     10.9±0.19µs        ? ?/sec    1.00     10.8±0.09µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=0%       1.00     10.0±0.17µs        ? ?/sec    1.00      9.9±0.18µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=20%      1.05     11.3±0.24µs        ? ?/sec    1.00     10.8±0.11µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=0%        1.02     26.7±0.58µs        ? ?/sec    1.00     26.2±0.50µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=20%       1.04     29.6±0.57µs        ? ?/sec    1.00     28.5±0.45µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=0%      1.00      3.4±0.05µs        ? ?/sec    7.78     26.2±0.36µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=20%     1.05     30.0±0.65µs        ? ?/sec    1.00     28.7±0.55µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=0%       1.03     26.7±0.59µs        ? ?/sec    1.00     26.0±0.37µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=20%      1.04     29.9±0.57µs        ? ?/sec    1.00     28.7±0.46µs        ? ?/sec
in_list_cols/Utf8/list=28/match=0%                 1.17    155.0±2.44µs        ? ?/sec    1.00    132.8±2.97µs        ? ?/sec
in_list_cols/Utf8/list=28/match=100%               1.02   726.6±14.54µs        ? ?/sec    1.00    712.4±9.09µs        ? ?/sec
in_list_cols/Utf8/list=28/match=50%                1.02  1070.1±13.06µs        ? ?/sec    1.00   1051.8±8.17µs        ? ?/sec
in_list_cols/Utf8/list=3/match=0%                  1.14     16.4±0.37µs        ? ?/sec    1.00     14.4±0.22µs        ? ?/sec
in_list_cols/Utf8/list=3/match=100%                1.02     68.0±1.29µs        ? ?/sec    1.00     66.5±0.99µs        ? ?/sec
in_list_cols/Utf8/list=3/match=50%                 1.15    107.6±2.05µs        ? ?/sec    1.00     93.6±1.88µs        ? ?/sec
in_list_cols/Utf8/list=8/match=0%                  1.16     44.0±0.61µs        ? ?/sec    1.00     37.9±0.95µs        ? ?/sec
in_list_cols/Utf8/list=8/match=100%                1.00    190.4±2.71µs        ? ?/sec    1.03    195.7±2.01µs        ? ?/sec
in_list_cols/Utf8/list=8/match=50%                 1.03    295.9±4.45µs        ? ?/sec    1.00    287.3±3.26µs        ? ?/sec

Are there any user-facing changes?

github-actions bot added the physical-expr (Changes to the physical-expr crates) label on Mar 4, 2026
@zhangxffff
Contributor Author

run benchmark in_list

@alamb-ghbot

🤖 Hi @zhangxffff, thanks for the request (#20694 (comment)). scrape_comments.py only responds to whitelisted users. Allowed users: Dandandan, Jefffrey, Omega359, adriangb, alamb, comphead, etseidl, gabotechs, geoffreyclaude, klion26, rluvaton, xudong963, zhuqi-lucas.

@zhangxffff
Contributor Author

@Dandandan @adriangb This PR adds a short-circuit optimization that breaks early when all rows already match, and incorporates the suggestions from #20428 (BooleanBuffer::collect_bool and first-expr initialization). Would appreciate your review when you have a chance.
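
For readers unfamiliar with the `collect_bool` suggestion, here is a rough std-only analogue of the idea: build the boolean mask by packing a per-index predicate into 64-bit words rather than collecting an iterator of individual `bool`s. The functions below are hypothetical and only illustrate the packing. They are not Arrow's `BooleanBuffer::collect_bool`, and the tuple arrays merely stand in for the struct columns handled by the make_comparator fallback.

```rust
/// Packs `f(0)..f(len - 1)` into 64-bit words, least-significant bit first.
/// Hypothetical std-only analogue of a `collect_bool`-style constructor.
fn collect_bool(len: usize, mut f: impl FnMut(usize) -> bool) -> Vec<u64> {
    let mut words = Vec::with_capacity((len + 63) / 64);
    for start in (0..len).step_by(64) {
        let end = (start + 64).min(len);
        let mut word = 0u64;
        for i in start..end {
            // Set bit (i - start) of the current word when the predicate holds.
            word |= (f(i) as u64) << (i - start);
        }
        words.push(word);
    }
    words
}

/// Reads bit `i` back out of the packed words.
fn get_bit(words: &[u64], i: usize) -> bool {
    (words[i / 64] >> (i % 64)) & 1 == 1
}

/// Row-wise equality mask for two struct-like columns (tuples stand in for structs).
fn eq_mask(left: &[(i32, &str)], right: &[(i32, &str)]) -> Vec<u64> {
    collect_bool(left.len(), |i| left[i] == right[i])
}
```

Each word is written once after 64 predicate calls, so the mask is built without per-bit appends to a growing buffer.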

@adriangb
Contributor

adriangb commented Mar 4, 2026

run benchmark in_list

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing feat/in-list-short-circuit (010bbc6) to bcc52cd diff
BENCH_NAME=in_list
BENCH_COMMAND=cargo bench --features=parquet --bench in_list
BENCH_FILTER=
BENCH_BRANCH_NAME=feat_in-list-short-circuit
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                  feat_in-list-short-circuit             main
-----                                                  --------------------------             ----
in_list/Float32/list=100/nulls=0%                      1.00     51.5±0.47µs        ? ?/sec    1.35     69.6±0.39µs        ? ?/sec
in_list/Float32/list=100/nulls=20%                     1.00     36.8±0.15µs        ? ?/sec    1.30     47.7±0.61µs        ? ?/sec
in_list/Float32/list=28/nulls=0%                       1.09     60.7±4.71µs        ? ?/sec    1.00     55.5±0.19µs        ? ?/sec
in_list/Float32/list=28/nulls=20%                      1.00     59.8±0.27µs        ? ?/sec    1.08     64.3±0.28µs        ? ?/sec
in_list/Float32/list=3/nulls=0%                        1.00     29.8±0.50µs        ? ?/sec    1.00     29.8±0.16µs        ? ?/sec
in_list/Float32/list=3/nulls=20%                       1.00     32.0±0.58µs        ? ?/sec    1.00     32.1±0.10µs        ? ?/sec
in_list/Float32/list=8/nulls=0%                        1.00     32.4±0.44µs        ? ?/sec    1.01     32.7±0.56µs        ? ?/sec
in_list/Float32/list=8/nulls=20%                       1.00     32.1±0.24µs        ? ?/sec    1.09     35.0±0.16µs        ? ?/sec
in_list/Int16/list=100/nulls=0%                        1.02     45.7±1.07µs        ? ?/sec    1.00     44.7±0.32µs        ? ?/sec
in_list/Int16/list=100/nulls=20%                       1.00     39.7±0.59µs        ? ?/sec    1.08     42.9±0.69µs        ? ?/sec
in_list/Int16/list=28/nulls=0%                         1.00     46.0±0.13µs        ? ?/sec    1.15     53.0±4.32µs        ? ?/sec
in_list/Int16/list=28/nulls=20%                        1.00     38.3±0.10µs        ? ?/sec    1.48     56.5±0.26µs        ? ?/sec
in_list/Int16/list=3/nulls=0%                          1.00     29.2±0.29µs        ? ?/sec    1.01     29.4±0.52µs        ? ?/sec
in_list/Int16/list=3/nulls=20%                         1.00     28.7±0.09µs        ? ?/sec    1.00     28.8±0.15µs        ? ?/sec
in_list/Int16/list=8/nulls=0%                          1.01     32.1±0.07µs        ? ?/sec    1.00     31.6±0.26µs        ? ?/sec
in_list/Int16/list=8/nulls=20%                         1.01     31.7±0.12µs        ? ?/sec    1.00     31.3±0.15µs        ? ?/sec
in_list/Int32/list=100/nulls=0%                        2.57     80.6±0.34µs        ? ?/sec    1.00     31.3±0.75µs        ? ?/sec
in_list/Int32/list=100/nulls=20%                       1.71     59.7±0.31µs        ? ?/sec    1.00     35.0±4.28µs        ? ?/sec
in_list/Int32/list=28/nulls=0%                         1.78     58.6±0.47µs        ? ?/sec    1.00     32.8±0.48µs        ? ?/sec
in_list/Int32/list=28/nulls=20%                        1.89     51.2±0.24µs        ? ?/sec    1.00     27.2±0.15µs        ? ?/sec
in_list/Int32/list=3/nulls=0%                          1.26     29.1±1.34µs        ? ?/sec    1.00     23.1±0.25µs        ? ?/sec
in_list/Int32/list=3/nulls=20%                         1.23     28.2±0.11µs        ? ?/sec    1.00     22.9±0.18µs        ? ?/sec
in_list/Int32/list=8/nulls=0%                          1.22     31.5±0.11µs        ? ?/sec    1.00     25.7±0.14µs        ? ?/sec
in_list/Int32/list=8/nulls=20%                         1.23     30.8±1.91µs        ? ?/sec    1.00     25.0±0.07µs        ? ?/sec
in_list/TimestampNs/list=100/nulls=0%                  1.00     82.1±1.06µs        ? ?/sec    1.15     94.5±0.71µs        ? ?/sec
in_list/TimestampNs/list=100/nulls=20%                 1.00   118.2±13.00µs        ? ?/sec    1.05    123.9±0.42µs        ? ?/sec
in_list/TimestampNs/list=28/nulls=0%                   1.05     71.5±0.60µs        ? ?/sec    1.00     67.9±0.84µs        ? ?/sec
in_list/TimestampNs/list=28/nulls=20%                  1.00    100.0±1.79µs        ? ?/sec    1.25    125.1±4.09µs        ? ?/sec
in_list/TimestampNs/list=3/nulls=0%                    1.00     51.3±0.12µs        ? ?/sec    1.01     51.7±0.99µs        ? ?/sec
in_list/TimestampNs/list=3/nulls=20%                   1.00     91.2±3.07µs        ? ?/sec    1.11    100.8±0.41µs        ? ?/sec
in_list/TimestampNs/list=8/nulls=0%                    1.02     56.3±0.22µs        ? ?/sec    1.00     55.0±0.15µs        ? ?/sec
in_list/TimestampNs/list=8/nulls=20%                   1.00     95.0±0.35µs        ? ?/sec    1.08    102.9±2.20µs        ? ?/sec
in_list/UInt8/list=100/nulls=0%                        1.07     46.5±0.30µs        ? ?/sec    1.00     43.5±0.34µs        ? ?/sec
in_list/UInt8/list=100/nulls=20%                       1.01     54.9±2.43µs        ? ?/sec    1.00     54.2±0.49µs        ? ?/sec
in_list/UInt8/list=28/nulls=0%                         1.59     62.3±0.42µs        ? ?/sec    1.00     39.3±0.25µs        ? ?/sec
in_list/UInt8/list=28/nulls=20%                        1.24     34.4±0.30µs        ? ?/sec    1.00     27.7±2.19µs        ? ?/sec
in_list/UInt8/list=3/nulls=0%                          1.02     22.2±0.16µs        ? ?/sec    1.00     21.8±0.07µs        ? ?/sec
in_list/UInt8/list=3/nulls=20%                         1.05     22.7±3.28µs        ? ?/sec    1.00     21.7±0.48µs        ? ?/sec
in_list/UInt8/list=8/nulls=0%                          1.00     24.6±0.07µs        ? ?/sec    1.07     26.5±0.17µs        ? ?/sec
in_list/UInt8/list=8/nulls=20%                         1.00     23.2±0.06µs        ? ?/sec    1.59     37.0±0.19µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=100                 1.00    148.5±2.14µs        ? ?/sec    1.22    181.5±2.41µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=12                  1.00     96.9±1.76µs        ? ?/sec    1.11    107.5±0.48µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=3                   1.00     98.1±0.83µs        ? ?/sec    1.22    119.3±1.71µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=100                1.03    191.5±2.49µs        ? ?/sec    1.00    186.0±1.61µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=12                 1.00    130.8±1.59µs        ? ?/sec    1.05    137.4±2.30µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=3                  1.12    162.8±0.92µs        ? ?/sec    1.00    145.0±1.43µs        ? ?/sec
in_list/Utf8/list=28/nulls=0%/str=100                  1.00    138.3±3.38µs        ? ?/sec    1.30   179.8±10.37µs        ? ?/sec
in_list/Utf8/list=28/nulls=0%/str=12                   1.00     79.1±0.36µs        ? ?/sec    1.58    125.1±1.50µs        ? ?/sec
in_list/Utf8/list=28/nulls=0%/str=3                    1.06    121.5±1.40µs        ? ?/sec    1.00    114.2±0.43µs        ? ?/sec
in_list/Utf8/list=28/nulls=20%/str=100                 1.00    180.6±2.53µs        ? ?/sec    1.13    204.2±1.30µs        ? ?/sec
in_list/Utf8/list=28/nulls=20%/str=12                  1.00    122.6±3.19µs        ? ?/sec    1.30    159.7±2.34µs        ? ?/sec
in_list/Utf8/list=28/nulls=20%/str=3                   1.18    155.2±0.66µs        ? ?/sec    1.00    131.3±0.65µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=100                   1.00    126.4±1.05µs        ? ?/sec    1.00    126.6±2.51µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=12                    1.00     68.0±0.18µs        ? ?/sec    1.03     70.4±0.52µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=3                     1.00     69.4±0.39µs        ? ?/sec    1.08     74.7±1.13µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=100                  1.00    158.8±5.73µs        ? ?/sec    1.03    163.0±2.50µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=12                   1.00    112.4±0.96µs        ? ?/sec    1.06    119.2±0.86µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=3                    1.00    114.0±0.36µs        ? ?/sec    1.08    123.3±0.52µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=100                   1.00    131.2±0.71µs        ? ?/sec    1.03   135.1±14.67µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=12                    1.00     73.0±1.38µs        ? ?/sec    1.04     75.6±0.87µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=3                     1.00     74.7±1.74µs        ? ?/sec    1.07     79.9±0.37µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=100                  1.00    162.0±4.11µs        ? ?/sec    1.03    166.1±2.28µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=12                   1.00    116.7±0.84µs        ? ?/sec    1.05    122.7±0.32µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=3                    1.00    118.4±0.30µs        ? ?/sec    1.08    127.2±0.45µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=0%/nulls=0%          1.00    127.5±2.56µs        ? ?/sec    1.26    161.1±0.63µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=0%/nulls=20%         1.00    179.2±1.86µs        ? ?/sec    1.05    188.6±1.13µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=25%/nulls=0%         1.00    179.8±2.45µs        ? ?/sec    1.03    185.1±2.17µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=25%/nulls=20%        1.00    203.8±0.56µs        ? ?/sec    1.15   233.8±15.24µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=75%/nulls=0%         1.00    181.0±5.05µs        ? ?/sec    1.05    189.5±0.63µs        ? ?/sec
in_list/Utf8/mixed/list=100/match=75%/nulls=20%        1.00    219.9±0.47µs        ? ?/sec    1.03    227.4±0.70µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=0%/nulls=0%           1.13    139.9±2.71µs        ? ?/sec    1.00    123.3±0.40µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=0%/nulls=20%          1.09    179.8±1.40µs        ? ?/sec    1.00    165.7±0.49µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=25%/nulls=0%          1.02    204.7±0.49µs        ? ?/sec    1.00    200.8±1.14µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=25%/nulls=20%         1.00    213.9±0.55µs        ? ?/sec    1.00   213.5±10.39µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=75%/nulls=0%          1.00    176.7±2.60µs        ? ?/sec    1.05    186.3±4.42µs        ? ?/sec
in_list/Utf8/mixed/list=28/match=75%/nulls=20%         1.00    223.5±2.55µs        ? ?/sec    1.01    225.5±0.46µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=0%/nulls=0%            1.00    102.1±0.39µs        ? ?/sec    1.04    106.0±0.36µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=0%/nulls=20%           1.00    147.8±0.32µs        ? ?/sec    1.05    155.3±1.17µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=25%/nulls=0%           1.00    148.5±0.28µs        ? ?/sec    1.04    153.8±0.91µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=25%/nulls=20%          1.00    182.4±0.68µs        ? ?/sec    1.05    191.6±5.11µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=75%/nulls=0%           1.00    159.4±0.53µs        ? ?/sec    1.06    168.8±1.04µs        ? ?/sec
in_list/Utf8/mixed/list=3/match=75%/nulls=20%          1.00    212.7±4.04µs        ? ?/sec    1.03    219.7±0.56µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=0%/nulls=0%            1.00    107.7±0.78µs        ? ?/sec    1.04    111.7±0.80µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=0%/nulls=20%           1.00    152.1±0.79µs        ? ?/sec    1.05    159.8±2.79µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=25%/nulls=0%           1.00    161.3±2.23µs        ? ?/sec    1.04    168.0±0.63µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=25%/nulls=20%          1.00    194.6±1.81µs        ? ?/sec    1.06   205.7±10.30µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=75%/nulls=0%           1.00    172.9±1.66µs        ? ?/sec    1.03    178.4±0.53µs        ? ?/sec
in_list/Utf8/mixed/list=8/match=75%/nulls=20%          1.00   204.6±18.55µs        ? ?/sec    1.03    210.0±1.41µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=100             1.00    145.3±0.49µs        ? ?/sec    1.07    155.6±0.87µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=12              1.00     69.6±0.27µs        ? ?/sec    1.07     74.5±0.27µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=3               1.08    102.9±2.62µs        ? ?/sec    1.00     94.9±0.30µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=100            1.05   208.7±23.68µs        ? ?/sec    1.00    198.4±0.63µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=12             1.00    140.2±0.76µs        ? ?/sec    1.01    141.5±1.78µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=3              1.00    113.1±0.45µs        ? ?/sec    1.06    120.2±2.08µs        ? ?/sec
in_list/Utf8View/list=28/nulls=0%/str=100              1.17    172.2±1.41µs        ? ?/sec    1.00    146.9±0.74µs        ? ?/sec
in_list/Utf8View/list=28/nulls=0%/str=12               1.33    104.0±2.28µs        ? ?/sec    1.00     78.3±0.30µs        ? ?/sec
in_list/Utf8View/list=28/nulls=0%/str=3                1.11    108.8±0.57µs        ? ?/sec    1.00     97.9±0.35µs        ? ?/sec
in_list/Utf8View/list=28/nulls=20%/str=100             1.00    173.1±4.28µs        ? ?/sec    1.02    176.4±2.25µs        ? ?/sec
in_list/Utf8View/list=28/nulls=20%/str=12              1.00    107.8±0.46µs        ? ?/sec    1.04    112.4±0.84µs        ? ?/sec
in_list/Utf8View/list=28/nulls=20%/str=3               1.00    107.6±0.49µs        ? ?/sec    1.38    148.5±1.89µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=100               1.00    129.1±0.58µs        ? ?/sec    1.00    129.7±0.72µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=12                1.00     54.3±0.35µs        ? ?/sec    1.01     55.1±0.27µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=3                 1.00     54.0±0.25µs        ? ?/sec    1.01     54.6±0.23µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=100              1.00    162.1±0.54µs        ? ?/sec    1.03    167.3±0.78µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=12               1.00     98.3±0.33µs        ? ?/sec    1.05    103.0±1.05µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=3                1.00     98.6±0.40µs        ? ?/sec    1.04    103.0±2.15µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=100               1.00    132.1±0.49µs        ? ?/sec    1.02    134.7±0.59µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=12                1.00     58.8±0.69µs        ? ?/sec    1.04     61.0±0.35µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=3                 1.00     58.4±0.18µs        ? ?/sec    1.03     60.4±1.06µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=100              1.00    167.0±2.94µs        ? ?/sec    1.02    170.9±0.73µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=12               1.00    102.8±1.18µs        ? ?/sec    1.04    107.1±0.42µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=3                1.00    102.4±1.52µs        ? ?/sec    1.07    109.6±7.12µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=0%/nulls=0%      1.12    144.0±1.39µs        ? ?/sec    1.00    128.9±0.86µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=0%/nulls=20%     1.00    161.7±0.73µs        ? ?/sec    1.02    165.1±0.82µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=25%/nulls=0%     1.00    189.7±0.58µs        ? ?/sec    1.05    198.5±0.99µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=25%/nulls=20%    1.05    223.6±1.00µs        ? ?/sec    1.00    212.7±1.20µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=75%/nulls=0%     1.00    196.1±1.42µs        ? ?/sec    1.12   219.8±13.08µs        ? ?/sec
in_list/Utf8View/mixed/list=100/match=75%/nulls=20%    1.00    215.6±0.37µs        ? ?/sec    1.17    251.2±1.85µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=0%/nulls=0%       1.00    150.6±9.01µs        ? ?/sec    1.04    157.0±0.88µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=0%/nulls=20%      1.12    171.8±0.45µs        ? ?/sec    1.00    153.4±1.07µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=25%/nulls=0%      1.00    166.2±1.18µs        ? ?/sec    1.31    218.0±0.93µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=25%/nulls=20%     1.00    214.9±0.75µs        ? ?/sec    1.13    241.9±9.72µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=75%/nulls=0%      1.00    196.6±1.68µs        ? ?/sec    1.08    213.0±1.62µs        ? ?/sec
in_list/Utf8View/mixed/list=28/match=75%/nulls=20%     1.00    221.5±1.25µs        ? ?/sec    1.06    235.4±0.98µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=0%/nulls=0%        1.00     94.9±1.50µs        ? ?/sec    1.06    100.2±0.70µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=0%/nulls=20%       1.00    137.4±0.34µs        ? ?/sec    1.01    139.1±0.83µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=25%/nulls=0%       1.00    150.2±3.99µs        ? ?/sec    1.08    162.8±0.71µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=25%/nulls=20%      1.00    186.0±1.41µs        ? ?/sec    1.06    197.0±2.19µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=75%/nulls=0%       1.00    174.6±0.52µs        ? ?/sec    1.08    188.0±1.20µs        ? ?/sec
in_list/Utf8View/mixed/list=3/match=75%/nulls=20%      1.00    217.3±0.88µs        ? ?/sec    1.08    235.4±1.48µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=0%/nulls=0%        1.00    102.9±5.13µs        ? ?/sec    1.05    107.8±0.45µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=0%/nulls=20%       1.00    143.9±1.25µs        ? ?/sec    1.02    146.1±2.13µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=25%/nulls=0%       1.00    161.8±1.31µs        ? ?/sec    1.08    174.5±0.43µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=25%/nulls=20%      1.00    185.3±1.26µs        ? ?/sec    1.07    199.1±4.84µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=75%/nulls=0%       1.00    167.7±0.86µs        ? ?/sec    1.08    181.6±1.05µs        ? ?/sec
in_list/Utf8View/mixed/list=8/match=75%/nulls=20%      1.00    220.6±0.79µs        ? ?/sec    1.08    237.3±0.45µs        ? ?/sec
in_list_cols/Int32/list=28/match=0%/nulls=0%           1.04   169.1±10.19µs        ? ?/sec    1.00    162.3±0.40µs        ? ?/sec
in_list_cols/Int32/list=28/match=0%/nulls=20%          1.08    192.4±1.15µs        ? ?/sec    1.00    177.9±0.66µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=0%         1.00      5.9±0.04µs        ? ?/sec    28.20  166.0±13.84µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=20%        1.08    192.2±1.16µs        ? ?/sec    1.00    178.1±1.05µs        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=0%          1.00     88.2±0.45µs        ? ?/sec    1.84    162.5±0.51µs        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=20%         1.08    192.4±1.20µs        ? ?/sec    1.00    178.7±1.43µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=0%            1.00     17.6±0.22µs        ? ?/sec    1.00     17.5±0.45µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=20%           1.04     19.7±0.05µs        ? ?/sec    1.00     19.0±0.13µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=0%          1.00      5.9±0.07µs        ? ?/sec    3.05     17.9±1.81µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=20%         1.04     19.6±0.13µs        ? ?/sec    1.00     18.9±0.06µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=0%           1.01     17.6±0.13µs        ? ?/sec    1.00     17.4±0.04µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=20%          1.03     19.7±0.15µs        ? ?/sec    1.00     19.1±0.18µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=0%            1.02     47.0±0.24µs        ? ?/sec    1.00     46.1±0.10µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=20%           1.07     53.8±0.29µs        ? ?/sec    1.00     50.4±0.45µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=0%          1.00      5.9±0.05µs        ? ?/sec    7.91     46.4±0.64µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=20%         1.07     53.9±0.80µs        ? ?/sec    1.00     50.3±0.21µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=0%           1.02     47.1±0.30µs        ? ?/sec    1.00     46.2±0.27µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=20%          1.07     53.8±0.57µs        ? ?/sec    1.00     50.3±0.17µs        ? ?/sec
in_list_cols/Utf8/list=28/match=0%                     1.00    281.9±0.94µs        ? ?/sec    1.39    390.6±1.05µs        ? ?/sec
in_list_cols/Utf8/list=28/match=100%                   1.00    615.8±3.61µs        ? ?/sec    1.20    737.7±2.64µs        ? ?/sec
in_list_cols/Utf8/list=28/match=50%                    1.00  1234.1±20.04µs        ? ?/sec    1.18   1452.8±4.97µs        ? ?/sec
in_list_cols/Utf8/list=3/match=0%                      1.00     29.3±0.16µs        ? ?/sec    1.43     41.7±0.11µs        ? ?/sec
in_list_cols/Utf8/list=3/match=100%                    1.00     64.3±0.68µs        ? ?/sec    1.22     78.4±2.18µs        ? ?/sec
in_list_cols/Utf8/list=3/match=50%                     1.00    122.3±0.47µs        ? ?/sec    1.20    147.0±1.88µs        ? ?/sec
in_list_cols/Utf8/list=8/match=0%                      1.00     79.9±0.95µs        ? ?/sec    1.39    110.9±0.34µs        ? ?/sec
in_list_cols/Utf8/list=8/match=100%                    1.00    174.9±0.73µs        ? ?/sec    1.20    210.5±1.79µs        ? ?/sec
in_list_cols/Utf8/list=8/match=50%                     1.00    345.9±0.74µs        ? ?/sec    1.20    413.6±4.56µs        ? ?/sec

@adriangb
Contributor

adriangb commented Mar 4, 2026

in_list_cols/Int32/list=28/match=100%/nulls=0% 1.00 5.9±0.04µs ? ?/sec 28.20 166.0±13.84µs ? ?/sec

🚀

Co-authored-by: Adrian Garcia Badaracco <[email protected]>
@neilconway
Contributor

I'm curious if the null_count short-circuit helps in practice -- can you re-run the benchmarks when you get a chance?

@zhangxffff
Contributor Author

> I'm curious if the null_count short-circuit helps in practice -- can you re-run the benchmarks when you get a chance?

Benchmark result (before vs after vs after_null_count):
before: datafusion/main
after: original patch
after_null_count: patch with null_count guard

For the nulls=20% cases, the after version showed ~3-5% regressions due to calling true_count() on every iteration. after_null_count eliminates this and matches before (e.g. list=28/match=100%/nulls=20%: 100.9µs vs 104.5µs in after).

For the in_list_cols/Utf8 cases, the benchmark implicitly contains ~20% nulls, so the null_count() == 0 guard similarly removes most of the regression (e.g. Utf8/list=3/match=50%: 92.3µs vs 105.2µs in after). The Utf8 match=0% rows are still 7-10% slower than before.
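
The guard described above can be sketched as follows. This is a std-only illustration under assumptions: `BoolColumn` is a hypothetical accumulator type, its null count is cached at construction so reading it is cheap, and `true_count()` is the only operation that walks the data. The `scans` counter exists purely to make the skipped work observable.

```rust
use std::cell::Cell;

/// Hypothetical accumulator with a cached null count.
struct BoolColumn {
    values: Vec<Option<bool>>,
    null_count: usize,  // computed once at construction
    scans: Cell<usize>, // how many times true_count() walked the data
}

impl BoolColumn {
    fn new(values: Vec<Option<bool>>) -> Self {
        let null_count = values.iter().filter(|v| v.is_none()).count();
        Self { values, null_count, scans: Cell::new(0) }
    }

    /// Full pass over the data; this is the cost the guard avoids.
    fn true_count(&self) -> usize {
        self.scans.set(self.scans.get() + 1);
        self.values.iter().filter(|v| **v == Some(true)).count()
    }
}

/// Only pay for true_count() when the accumulator has no nulls,
/// since a column with nulls cannot be all-true anyway.
fn can_short_circuit(acc: &BoolColumn) -> bool {
    acc.null_count == 0 && acc.true_count() == acc.values.len()
}
```

Because `&&` evaluates left to right, an accumulator containing any null never triggers the scan.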

(zhangxffff) zhangxffff@95d3d60664da ~/W/datafusion ((bcc52cd4))> critcmp before after after_null_count
group                                              after                                  after_null_count                       before
-----                                              -----                                  ----------------                       ------
in_list_cols/Int32/list=28/match=0%/nulls=0%       1.01     92.8±0.72µs        ? ?/sec    1.02     93.3±1.48µs        ? ?/sec    1.00     91.7±2.11µs        ? ?/sec
in_list_cols/Int32/list=28/match=0%/nulls=20%      1.04    104.4±0.97µs        ? ?/sec    1.00    100.2±2.29µs        ? ?/sec    1.00    100.6±3.25µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=0%     1.00      3.3±0.03µs        ? ?/sec    1.00      3.3±0.03µs        ? ?/sec    27.42    91.6±1.31µs        ? ?/sec
in_list_cols/Int32/list=28/match=100%/nulls=20%    1.04    104.5±1.20µs        ? ?/sec    1.01    100.9±2.39µs        ? ?/sec    1.00    100.3±1.70µs        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=0%      1.02     50.4±0.94µs        ? ?/sec    1.00     49.7±0.51µs        ? ?/sec    1.84     91.4±1.78µs        ? ?/sec
in_list_cols/Int32/list=28/match=50%/nulls=20%     1.05    104.7±1.90µs        ? ?/sec    1.00     99.7±0.86µs        ? ?/sec    1.01    101.0±3.55µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=0%        1.00      9.9±0.10µs        ? ?/sec    1.00      9.9±0.13µs        ? ?/sec    1.00      9.9±0.08µs        ? ?/sec
in_list_cols/Int32/list=3/match=0%/nulls=20%       1.03     10.9±0.10µs        ? ?/sec    1.00     10.6±0.17µs        ? ?/sec    1.01     10.8±0.12µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=0%      1.00      3.3±0.03µs        ? ?/sec    1.00      3.3±0.08µs        ? ?/sec    2.97      9.9±0.25µs        ? ?/sec
in_list_cols/Int32/list=3/match=100%/nulls=20%     1.03     10.8±0.10µs        ? ?/sec    1.00     10.5±0.10µs        ? ?/sec    1.03     10.8±0.16µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=0%       1.00      9.9±0.09µs        ? ?/sec    1.02     10.1±0.24µs        ? ?/sec    1.00      9.9±0.16µs        ? ?/sec
in_list_cols/Int32/list=3/match=50%/nulls=20%      1.02     10.8±0.09µs        ? ?/sec    1.00     10.5±0.15µs        ? ?/sec    1.03     10.8±0.17µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=0%        1.01     26.5±0.19µs        ? ?/sec    1.02     26.8±0.48µs        ? ?/sec    1.00     26.1±0.26µs        ? ?/sec
in_list_cols/Int32/list=8/match=0%/nulls=20%       1.03     29.4±0.28µs        ? ?/sec    1.00     28.6±0.58µs        ? ?/sec    1.00     28.7±0.51µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=0%      1.01      3.3±0.05µs        ? ?/sec    1.00      3.3±0.03µs        ? ?/sec    7.92     26.3±0.74µs        ? ?/sec
in_list_cols/Int32/list=8/match=100%/nulls=20%     1.05     29.6±0.47µs        ? ?/sec    1.00     28.2±0.40µs        ? ?/sec    1.02     28.7±0.70µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=0%       1.01     26.6±0.28µs        ? ?/sec    1.01     26.6±0.53µs        ? ?/sec    1.00     26.3±0.39µs        ? ?/sec
in_list_cols/Int32/list=8/match=50%/nulls=20%      1.03     29.5±0.36µs        ? ?/sec    1.00     28.6±0.55µs        ? ?/sec    1.00     28.5±0.28µs        ? ?/sec
in_list_cols/Utf8/list=28/match=0%                 1.18    157.0±3.90µs        ? ?/sec    1.10    146.1±2.62µs        ? ?/sec    1.00    132.7±2.97µs        ? ?/sec
in_list_cols/Utf8/list=28/match=100%               1.09    722.5±9.38µs        ? ?/sec    1.00    665.8±9.49µs        ? ?/sec    1.08    722.0±6.94µs        ? ?/sec
in_list_cols/Utf8/list=28/match=50%                1.01  1068.6±16.04µs        ? ?/sec    1.01  1064.5±19.03µs        ? ?/sec    1.00  1053.9±14.52µs        ? ?/sec
in_list_cols/Utf8/list=3/match=0%                  1.14     16.2±0.38µs        ? ?/sec    1.07     15.3±0.28µs        ? ?/sec    1.00     14.2±0.24µs        ? ?/sec
in_list_cols/Utf8/list=3/match=100%                1.03     67.7±1.22µs        ? ?/sec    1.00     65.6±0.80µs        ? ?/sec    1.03     67.8±2.21µs        ? ?/sec
in_list_cols/Utf8/list=3/match=50%                 1.14    105.2±1.65µs        ? ?/sec    1.00     92.3±1.64µs        ? ?/sec    1.04     96.3±5.61µs        ? ?/sec
in_list_cols/Utf8/list=8/match=0%                  1.19     44.9±1.11µs        ? ?/sec    1.09     41.0±0.64µs        ? ?/sec    1.00     37.7±0.87µs        ? ?/sec
in_list_cols/Utf8/list=8/match=100%                1.01    194.3±2.14µs        ? ?/sec    1.00    191.7±2.73µs        ? ?/sec    1.02    195.9±2.36µs        ? ?/sec
in_list_cols/Utf8/list=8/match=50%                 1.02    294.0±2.76µs        ? ?/sec    1.00    287.3±2.73µs        ? ?/sec    1.02    294.0±3.57µs        ? ?/sec

