WIP: Test enabling Parquet filter pushdown with parquet caching page cache reader #15506

Draft: wants to merge 8 commits into main

alamb (Contributor) commented Mar 31, 2025

This is still a draft as the branch appears to hang on certain queries:

(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ cargo test --test sqllogictests -- parquet_filter_pushdown
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.24s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-ae4ca2e4c85de797)
[00:00:00] ##########################--------------      11/17      "parquet_filter_pushdown.slt"
.. hangs indefinitely ..
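While the hang is being investigated, the test can be run under a timeout so that it fails fast instead of blocking. The following is only a minimal sketch, not part of this PR: `run_with_timeout` and `HANGING_TEST_CMD` are hypothetical names, and the timeout value is left to the caller.

```python
import subprocess

# The command from the session above, split into arguments.
HANGING_TEST_CMD = [
    "cargo", "test", "--test", "sqllogictests", "--", "parquet_filter_pushdown",
]

def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it ran longer than timeout_s seconds."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising. Processes that
        # the child spawned itself (for example the test binary started by
        # cargo) may need to be cleaned up separately.
        return None
```

Calling `run_with_timeout(HANGING_TEST_CMD, 300)` from the repository root would then return `None` for a hang rather than waiting forever.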

Which issue does this PR close?

Rationale for this change

This PR is designed to verify the changes from @XiangpengHao's pushdown decoder.

What changes are included in this PR?

  1. Pin to Experimental parquet decoder with first-class selection pushdown support arrow-rs#6921
  2. Enable filter pushdown by default

Are these changes tested?

Are there any user-facing changes?

Benchmarks

Not too shabby!

I need to look at some of the queries that report being slower to see if there is something we can do to bring the speed back up.


--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_filter_pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.57ms │                0.58ms │     no change │
│ QQuery 1     │    68.16ms │               70.64ms │     no change │
│ QQuery 2     │   116.52ms │              118.28ms │     no change │
│ QQuery 3     │   123.86ms │              123.76ms │     no change │
│ QQuery 4     │   776.15ms │              797.72ms │     no change │
│ QQuery 5     │   848.82ms │              880.01ms │     no change │
│ QQuery 6     │    64.80ms │               67.14ms │     no change │
│ QQuery 7     │    77.36ms │               93.04ms │  1.20x slower │
│ QQuery 8     │   957.21ms │              983.79ms │     no change │
│ QQuery 9     │  1239.49ms │             1275.73ms │     no change │
│ QQuery 10    │   299.68ms │              318.01ms │  1.06x slower │
│ QQuery 11    │   344.19ms │              360.58ms │     no change │
│ QQuery 12    │   945.45ms │             1058.68ms │  1.12x slower │
│ QQuery 13    │  1323.58ms │             1548.38ms │  1.17x slower │
│ QQuery 14    │   885.86ms │             1063.57ms │  1.20x slower │
│ QQuery 15    │  1110.56ms │             1134.68ms │     no change │
│ QQuery 16    │  1834.24ms │             1789.95ms │     no change │
│ QQuery 17    │  1662.90ms │             1650.05ms │     no change │
│ QQuery 18    │  3176.45ms │             3164.17ms │     no change │
│ QQuery 19    │   116.43ms │              123.84ms │  1.06x slower │
│ QQuery 20    │  1206.01ms │             1204.47ms │     no change │
│ QQuery 21    │  1445.16ms │             1351.41ms │ +1.07x faster │
│ QQuery 22    │  2708.56ms │             2401.18ms │ +1.13x faster │
│ QQuery 23    │  8690.72ms │             5234.73ms │ +1.66x faster │
│ QQuery 24    │   509.28ms │              684.55ms │  1.34x slower │
│ QQuery 25    │   426.36ms │              553.26ms │  1.30x slower │
│ QQuery 26    │   581.56ms │              802.46ms │  1.38x slower │
│ QQuery 27    │  1797.38ms │             2464.11ms │  1.37x slower │
│ QQuery 28    │ 13274.54ms │            14650.91ms │  1.10x slower │
│ QQuery 29    │   629.26ms │              598.10ms │     no change │
│ QQuery 30    │   970.56ms │             1286.68ms │  1.33x slower │
│ QQuery 31    │  1008.49ms │             1398.40ms │  1.39x slower │
│ QQuery 32    │  3220.54ms │             3249.25ms │     no change │
│ QQuery 33    │  3948.22ms │             3595.57ms │ +1.10x faster │
│ QQuery 34    │  3968.56ms │             3536.51ms │ +1.12x faster │
│ QQuery 35    │  1477.42ms │             1317.28ms │ +1.12x faster │
│ QQuery 36    │   310.58ms │              277.12ms │ +1.12x faster │
│ QQuery 37    │   144.55ms │              136.09ms │ +1.06x faster │
│ QQuery 38    │   188.40ms │              167.13ms │ +1.13x faster │
│ QQuery 39    │   537.25ms │              419.03ms │ +1.28x faster │
│ QQuery 40    │    73.01ms │              110.69ms │  1.52x slower │
│ QQuery 41    │    83.59ms │              105.50ms │  1.26x slower │
│ QQuery 42    │    91.66ms │               96.24ms │     no change │
└──────────────┴────────────┴───────────────────────┴───────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)               │ 63263.96ms │
│ Total Time (alamb_filter_pushdown)   │ 62263.28ms │
│ Average Time (main_base)             │  1471.25ms │
│ Average Time (alamb_filter_pushdown) │  1447.98ms │
│ Queries Faster                       │         10 │
│ Queries Slower                       │         15 │
│ Queries with No Change               │         18 │
└──────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_filter_pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2255.92ms │             1949.07ms │ +1.16x faster │
│ QQuery 1     │   750.84ms │              716.29ms │     no change │
│ QQuery 2     │  1618.66ms │             1410.83ms │ +1.15x faster │
│ QQuery 3     │   739.24ms │              703.75ms │     no change │
│ QQuery 4     │  1659.53ms │             1715.83ms │     no change │
│ QQuery 5     │ 18684.93ms │            17202.28ms │ +1.09x faster │
└──────────────┴────────────┴───────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)               │ 25709.12ms │
│ Total Time (alamb_filter_pushdown)   │ 23698.05ms │
│ Average Time (main_base)             │  4284.85ms │
│ Average Time (alamb_filter_pushdown) │  3949.67ms │
│ Queries Faster                       │          3 │
│ Queries Slower                       │          0 │
│ Queries with No Change               │          3 │
└──────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_filter_pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     3.56ms │                2.75ms │ +1.30x faster │
│ QQuery 1     │    40.80ms │               44.22ms │  1.08x slower │
│ QQuery 2     │   101.20ms │               95.75ms │ +1.06x faster │
│ QQuery 3     │   104.30ms │              100.38ms │     no change │
│ QQuery 4     │   837.17ms │              760.17ms │ +1.10x faster │
│ QQuery 5     │   980.95ms │              872.30ms │ +1.12x faster │
│ QQuery 6     │    40.34ms │               37.64ms │ +1.07x faster │
│ QQuery 7     │    44.91ms │               63.34ms │  1.41x slower │
│ QQuery 8     │  1069.04ms │              953.13ms │ +1.12x faster │
│ QQuery 9     │  1450.28ms │             1225.24ms │ +1.18x faster │
│ QQuery 10    │   310.34ms │              301.07ms │     no change │
│ QQuery 11    │   349.23ms │              351.62ms │     no change │
│ QQuery 12    │  1095.13ms │             1081.91ms │     no change │
│ QQuery 13    │  1635.08ms │             1496.78ms │ +1.09x faster │
│ QQuery 14    │   988.22ms │             1101.26ms │  1.11x slower │
│ QQuery 15    │  1185.88ms │             1116.01ms │ +1.06x faster │
│ QQuery 16    │  2003.29ms │             1804.50ms │ +1.11x faster │
│ QQuery 17    │  1822.90ms │             1638.95ms │ +1.11x faster │
│ QQuery 18    │  3521.78ms │             3125.60ms │ +1.13x faster │
│ QQuery 19    │    93.58ms │              100.39ms │  1.07x slower │
│ QQuery 20    │  1260.89ms │             1150.78ms │ +1.10x faster │
│ QQuery 21    │  1528.21ms │             1303.30ms │ +1.17x faster │
│ QQuery 22    │  2742.69ms │             2316.73ms │ +1.18x faster │
│ QQuery 23    │  9454.13ms │             4883.92ms │ +1.94x faster │
│ QQuery 24    │   520.27ms │              703.58ms │  1.35x slower │
│ QQuery 25    │   435.35ms │              490.67ms │  1.13x slower │
│ QQuery 26    │   606.38ms │              765.75ms │  1.26x slower │
│ QQuery 27    │  1837.71ms │             2154.70ms │  1.17x slower │
│ QQuery 28    │ 13463.85ms │            13430.95ms │     no change │
│ QQuery 29    │   558.16ms │              536.66ms │     no change │
│ QQuery 30    │   934.19ms │             1342.03ms │  1.44x slower │
│ QQuery 31    │   985.71ms │             1388.86ms │  1.41x slower │
│ QQuery 32    │  3225.41ms │             2742.11ms │ +1.18x faster │
│ QQuery 33    │  3887.36ms │             3454.26ms │ +1.13x faster │
│ QQuery 34    │  3852.77ms │             3411.24ms │ +1.13x faster │
│ QQuery 35    │  1530.92ms │             1322.73ms │ +1.16x faster │
│ QQuery 36    │   265.16ms │              237.61ms │ +1.12x faster │
│ QQuery 37    │   105.59ms │              100.96ms │     no change │
│ QQuery 38    │   142.28ms │              142.17ms │     no change │
│ QQuery 39    │   522.51ms │              425.74ms │ +1.23x faster │
│ QQuery 40    │    60.05ms │               89.14ms │  1.48x slower │
│ QQuery 41    │    50.38ms │               78.62ms │  1.56x slower │
│ QQuery 42    │    62.24ms │               70.20ms │  1.13x slower │
└──────────────┴────────────┴───────────────────────┴───────────────┘


--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_filter_pushdown ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │  125.02ms │              125.37ms │    no change │
│ QQuery 2     │   24.36ms │               24.40ms │    no change │
│ QQuery 3     │   36.76ms │               35.86ms │    no change │
│ QQuery 4     │   20.87ms │               20.76ms │    no change │
│ QQuery 5     │   57.47ms │               56.27ms │    no change │
│ QQuery 6     │    8.11ms │                8.39ms │    no change │
│ QQuery 7     │  102.83ms │              103.92ms │    no change │
│ QQuery 8     │   26.51ms │               26.58ms │    no change │
│ QQuery 9     │   62.19ms │               63.73ms │    no change │
│ QQuery 10    │   60.77ms │               60.92ms │    no change │
│ QQuery 11    │   13.11ms │               13.00ms │    no change │
│ QQuery 12    │   37.41ms │               38.59ms │    no change │
│ QQuery 13    │   30.73ms │               29.83ms │    no change │
│ QQuery 14    │    9.86ms │               10.15ms │    no change │
│ QQuery 15    │   25.03ms │               26.15ms │    no change │
│ QQuery 16    │   25.89ms │               24.94ms │    no change │
│ QQuery 17    │   95.52ms │               98.14ms │    no change │
│ QQuery 18    │  253.23ms │              252.32ms │    no change │
│ QQuery 19    │   28.84ms │               30.05ms │    no change │
│ QQuery 20    │   40.94ms │               41.14ms │    no change │
│ QQuery 21    │  172.30ms │              181.19ms │ 1.05x slower │
│ QQuery 22    │   18.05ms │               17.22ms │    no change │
└──────────────┴───────────┴───────────────────────┴──────────────┘
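For reading the Change column in the tables above: the labels are consistent with a simple ratio of the two timings, with anything within about 5% treated as noise. The sketch below reproduces that rule. The 5% threshold and the function name `describe_change` are assumptions inferred from the rows, not taken from the benchmark tooling.

```python
NOISE_THRESHOLD = 0.05  # assumed: inferred from the rows above

def describe_change(baseline_ms: float, comparison_ms: float) -> str:
    """Label a query the way the Change column appears to."""
    ratio = comparison_ms / baseline_ms
    if 1.0 - NOISE_THRESHOLD <= ratio <= 1.0 + NOISE_THRESHOLD:
        return "no change"
    if ratio < 1.0:
        return f"+{1.0 / ratio:.2f}x faster"
    return f"{ratio:.2f}x slower"

# Rows copied from clickbench_1: (query, main_base ms, alamb_filter_pushdown ms)
rows = [(7, 77.36, 93.04), (23, 8690.72, 5234.73), (29, 629.26, 598.10)]
for query, base, branch in rows:
    print(f"Q{query}: {describe_change(base, branch)}")
# prints:
# Q7: 1.20x slower
# Q23: +1.66x faster
# Q29: no change
```

Under this rule, Query 29 (about 5% faster on the branch) falls just inside the noise band, which is why it shows as "no change" even though the branch time is lower.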


github-actions bot added the sqllogictest (SQL Logic Tests (.slt)), common (Related to common crate), and functions (Changes to functions implementation) labels on Mar 31, 2025
alamb (Contributor, Author) commented Apr 1, 2025

I wrote up a performance analysis here:
