[KYUUBI #7129] Support PARQUET hive table pushdown filter by flaming-archer · Pull Request #7130 · apache/kyuubi

flaming-archer · 2025-07-09T07:26:32Z

Why are the changes needed?

Previously, the HiveScan class was used to read data. If the table is determined to be of PARQUET type, the ParquetScan from Spark DataSource V2 can be used instead. ParquetScan supports filter pushdown, but HiveScan does not yet support it.
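To illustrate what filter pushdown buys here, the following is a minimal sketch and not Spark's or Parquet's actual code. It assumes a simplified model in which each Parquet row group carries min/max statistics for one integer column. `RowGroupStats`, `PushdownSketch`, and both scan functions are hypothetical names introduced only for this example.

```scala
// Illustrative model only: these are not Spark or Parquet classes.
final case class RowGroupStats(min: Int, max: Int, rows: Seq[Int])

object PushdownSketch {
  // Predicate modelled here: "col > threshold".

  // Without pushdown: every row group is read, rows are filtered afterwards.
  // Returns the matching rows and the number of row groups read.
  def scanWithoutPushdown(groups: Seq[RowGroupStats], threshold: Int): (Seq[Int], Int) = {
    val rows = groups.flatMap(_.rows).filter(_ > threshold)
    (rows, groups.size)
  }

  // With pushdown: row groups whose max cannot satisfy the predicate
  // are skipped before any of their rows are read.
  def scanWithPushdown(groups: Seq[RowGroupStats], threshold: Int): (Seq[Int], Int) = {
    val candidates = groups.filter(_.max > threshold)
    val rows = candidates.flatMap(_.rows).filter(_ > threshold)
    (rows, candidates.size)
  }
}
```

Both functions return the same rows. The difference is how many row groups have to be read to produce them.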

The conversion can be controlled by setting spark.sql.kyuubi.hive.connector.read.convertMetastoreParquet. When enabled, the data source PARQUET reader, rather than Hive SerDe, is used to process PARQUET tables created with HiveQL syntax.
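The decision described above can be sketched roughly as follows. This is not the code from this PR: `ReaderChoiceSketch`, `chooseReader`, and the `Reader` types are hypothetical, the SerDe-name check is an assumed heuristic, and treating an unset flag as disabled is an assumption made only for this example (the description does not state the default). Only the config key comes from the description.

```scala
object ReaderChoiceSketch {
  // Config key taken from the PR description.
  val ConvertParquetKey: String =
    "spark.sql.kyuubi.hive.connector.read.convertMetastoreParquet"

  sealed trait Reader
  case object DataSourceParquetReader extends Reader // ParquetScan path, supports filter pushdown
  case object HiveSerDeReader extends Reader         // HiveScan path

  // Hypothetical helper: `serde` is the table's SerDe class name, if any.
  // An unset flag is treated as disabled here (an assumption, see above).
  def chooseReader(serde: Option[String], conf: Map[String, String]): Reader = {
    val convertEnabled = conf.get(ConvertParquetKey).exists(_.equalsIgnoreCase("true"))
    val isParquet =
      serde.exists(_.toLowerCase(java.util.Locale.ROOT).contains("parquet"))
    if (convertEnabled && isParquet) DataSourceParquetReader else HiveSerDeReader
  }
}
```

In this sketch, a PARQUET table switches to the data source reader only when the flag is set to true. Non-PARQUET tables, and any table when the flag is off, stay on the Hive SerDe path.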

close #7129

How was this patch tested?

added unit test

Was this patch authored or co-authored using generative AI tooling?

No

flaming-archer · 2025-07-09T07:29:14Z

Similar to #7123.

codecov-commenter · 2025-07-09T08:34:14Z

Codecov Report

Attention: Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (60371b5) to head (d7059dc).
Report is 7 commits behind head on master.

Files with missing lines	Patch %	Lines
...spark/connector/hive/KyuubiHiveConnectorConf.scala	0.00%	5 Missing ⚠️
...apache/kyuubi/spark/connector/hive/HiveTable.scala	0.00%	3 Missing ⚠️

Additional details and impacted files

@@          Coverage Diff           @@
##           master   #7130   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         700     700           
  Lines       43435   43443    +8     
  Branches     5879    5881    +2     
======================================
- Misses      43435   43443    +8


flaming-archer · 2025-07-10T01:58:08Z

@cfmcgrady please take a look at it.

flaming-archer · 2025-07-17T02:40:43Z

In our testing, this change improved performance by 1.4 times.

pan3793 · 2025-07-17T06:42:59Z

Thanks, merged to master

### Why are the changes needed?

Previously, the `HiveScan` class was used to read data. If the table is determined to be of PARQUET type, the `ParquetScan` from Spark DataSource V2 can be used instead. `ParquetScan` supports filter pushdown, but `HiveScan` does not yet support it.

The conversion can be controlled by setting `spark.sql.kyuubi.hive.connector.read.convertMetastoreParquet`. When enabled, the data source PARQUET reader, rather than Hive SerDe, is used to process PARQUET tables created with HiveQL syntax.

close apache#7129

### How was this patch tested?

added unit test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#7130 from flaming-archer/master_parquet_filterdown.

Closes apache#7129

d7059dc [tian bao] Support PARQUET hive table pushdown filter

Authored-by: tian bao <2011xuesong@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>

Support PARQUET hive table pushdown filter

d7059dc

github-actions bot added module:spark module:extensions labels Jul 9, 2025

flaming-archer changed the title ~~Support PARQUET hive table pushdown filter~~ [KYUUBI #7129] Support PARQUET hive table pushdown filter Jul 9, 2025

pan3793 requested a review from cfmcgrady July 9, 2025 07:28

pan3793 approved these changes Jul 17, 2025

View reviewed changes

pan3793 assigned flaming-archer Jul 17, 2025

pan3793 added this to the v1.11.0 milestone Jul 17, 2025

pan3793 closed this in 47063d9 Jul 17, 2025

flaming-archer wants to merge 1 commit into apache:master from flaming-archer:master_parquet_filterdown
