[KYUUBI #7129] Support PARQUET hive table pushdown filter #7130

Closed
flaming-archer wants to merge 1 commit into apache:master from flaming-archer:master_parquet_filterdown

@flaming-archer
Contributor

Why are the changes needed?

Previously, the HiveScan class was always used to read data. When a table is determined to be in PARQUET format, the ParquetScan from Spark's DataSource V2 can be used instead. ParquetScan supports filter pushdown, whereas HiveScan does not yet support it.
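
As a rough illustration of the selection described above, the following self-contained Scala sketch models the decision without depending on Spark. It is not the connector's code: `ScanKind`, `HiveSerDeScan`, `ParquetV2Scan`, and `chooseScan` are hypothetical names invented for this example, and detecting PARQUET by checking whether the SerDe class name contains "parquet" is an assumption about how such a check might look.

```scala
// Hypothetical model of the scan selection; not the connector's actual code.
sealed trait ScanKind
case object HiveSerDeScan extends ScanKind // stands in for HiveScan (no filter pushdown)
case object ParquetV2Scan extends ScanKind // stands in for Spark's DataSource V2 ParquetScan

object ScanSelection {
  // `serde` is the table's SerDe class name as recorded in the metastore, if any.
  // `convertParquet` mirrors the convertMetastoreParquet option (assumed boolean).
  def chooseScan(serde: Option[String], convertParquet: Boolean): ScanKind = {
    // Assumption: a table counts as PARQUET when its SerDe name mentions "parquet".
    val isParquet = serde.exists(_.toLowerCase(java.util.Locale.ROOT).contains("parquet"))
    if (isParquet && convertParquet) ParquetV2Scan else HiveSerDeScan
  }

  def main(args: Array[String]): Unit = {
    val parquetSerde = Some("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe")
    val textSerde = Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe")
    println(chooseScan(parquetSerde, convertParquet = true))  // PARQUET table, conversion on
    println(chooseScan(parquetSerde, convertParquet = false)) // PARQUET table, conversion off
    println(chooseScan(textSerde, convertParquet = true))     // non-PARQUET table
  }
}
```

Only the first call selects the Parquet path; the other two fall back to the Hive SerDe path, which is the case where filters are not pushed down.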

The conversion can be controlled by setting spark.sql.kyuubi.hive.connector.read.convertMetastoreParquet. When enabled, the data source PARQUET reader, rather than the Hive SerDe, is used to process PARQUET tables created with HiveQL syntax.
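
For illustration, the option named above could be set next to the connector's catalog registration in `spark-defaults.conf`. This is a sketch under assumptions: the catalog name `hive_catalog` is chosen only for the example, and the value `true` simply enables the conversion explicitly.

```properties
# Register the Kyuubi Spark Hive connector under a catalog name (the name is illustrative)
spark.sql.catalog.hive_catalog=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
# Read Hive PARQUET tables with Spark's Parquet reader instead of the Hive SerDe
spark.sql.kyuubi.hive.connector.read.convertMetastoreParquet=true
```

Setting the option to `false` should keep PARQUET tables on the Hive SerDe path, which can be useful when comparing the two readers.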

close #7129

How was this patch tested?

Added a unit test.

Was this patch authored or co-authored using generative AI tooling?

No

@flaming-archer flaming-archer changed the title Support PARQUET hive table pushdown filter [KYUUBI #7129] Support PARQUET hive table pushdown filter Jul 9, 2025
@pan3793 pan3793 requested a review from cfmcgrady July 9, 2025 07:28
@flaming-archer
Contributor Author

Similar to #7123.

@codecov-commenter

codecov-commenter commented Jul 9, 2025

Codecov Report

Attention: Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (60371b5) to head (d7059dc).
Report is 7 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...spark/connector/hive/KyuubiHiveConnectorConf.scala | 0.00% | 5 Missing ⚠️ |
| ...apache/kyuubi/spark/connector/hive/HiveTable.scala | 0.00% | 3 Missing ⚠️ |

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #7130   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         700     700           
  Lines       43435   43443    +8     
  Branches     5879    5881    +2     
======================================
- Misses      43435   43443    +8     

@flaming-archer
Contributor Author

@cfmcgrady please take a look at it.

@flaming-archer
Contributor Author

In our testing, performance improved by a factor of 1.4.

@pan3793 pan3793 added this to the v1.11.0 milestone Jul 17, 2025
@pan3793 pan3793 closed this in 47063d9 Jul 17, 2025
@pan3793
Member

pan3793 commented Jul 17, 2025

Thanks, merged to master

yangyuxia pushed a commit to yangyuxia/kyuubi that referenced this pull request Sep 22, 2025
### Why are the changes needed?

Previously, the `HiveScan` class was always used to read data. When a table is determined to be in PARQUET format, the `ParquetScan` from Spark's DataSource V2 can be used instead. `ParquetScan` supports filter pushdown, whereas `HiveScan` does not yet support it.

The conversion can be controlled by setting `spark.sql.kyuubi.hive.connector.read.convertMetastoreParquet`. When enabled, the data source PARQUET reader, rather than the Hive SerDe, is used to process PARQUET tables created with HiveQL syntax.

close apache#7129

### How was this patch tested?

added unit test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#7130 from flaming-archer/master_parquet_filterdown.

Closes apache#7129

d7059dc [tian bao] Support PARQUET hive table pushdown filter

Authored-by: tian bao <2011xuesong@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
yangyuxia pushed a commit to yangyuxia/kyuubi that referenced this pull request Nov 12, 2025
yangyuxia pushed a commit to yangyuxia/kyuubi that referenced this pull request Nov 13, 2025

Development

Successfully merging this pull request may close these issues.

[FEATURE] Spark Hive connector supports Parquet hive table pushdown filter

3 participants