Skip to content

chore: update to DataFusion 48.0.0 #3520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jun 5, 2025

Description

The description of the main changes of your pull request

Related Issue(s)

This tests upgrading delta-rs to DataFusion 48. It seems to have gone fine, but
I hit a few rough edges that I will list and file fixes for in DataFusion:

Documentation

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Jun 5, 2025
Copy link

github-actions bot commented Jun 5, 2025

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

};
use datafusion::datasource::{listing::PartitionedFile, MemTable, TableProvider, TableType};
use datafusion::execution::context::{SessionConfig, SessionContext, SessionState, TaskContext};
use datafusion::execution::runtime_env::RuntimeEnv;
use datafusion::execution::FunctionRegistry;
use datafusion::optimizer::simplify_expressions::ExprSimplifier;
use datafusion::physical_optimizer::pruning::{PruningPredicate, PruningStatistics};
use datafusion::physical_optimizer::pruning::PruningPredicate;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PruningStatistics is no longer accessible via datafuson::physical_optimizer as it was moved. We should fix this with a pub use I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.with_limit(self.limit)
.with_table_partition_cols(table_partition_cols)
.build();
let file_source =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is all reformatting as now with_schema_adapter_factory returns an Arc directly

@@ -1114,8 +1113,11 @@ impl ExecutionPlan for DeltaScan {
Some(self.metrics.clone_inner())
}

fn statistics(&self) -> DataFusionResult<Statistics> {
self.parquet_scan.statistics()
fn partition_statistics(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

statistics was deprecated

@alamb alamb changed the title chore: Update to DataFusion 48.0.0 chore: update to DataFusion 48.0.0 Jun 5, 2025
@alamb alamb force-pushed the alamb/update_datafusion_48 branch from f79704e to ca487a2 Compare June 5, 2025 15:39
@alamb alamb marked this pull request as ready for review June 5, 2025 16:36
@alamb
Copy link
Contributor Author

alamb commented Jun 5, 2025

@ion-elgreco or @rtyler is there any chance you can kick off the CI tests for this PR (so I can verify that DataFusion 48.0.0 passes the delta.rs tests?)

Copy link

codecov bot commented Jun 5, 2025

Codecov Report

Attention: Patch coverage is 71.91011% with 25 lines in your changes missing coverage. Please review.

Project coverage is 74.21%. Comparing base (9453b2c) to head (ad59945).

Files with missing lines Patch % Lines
crates/sql/src/parser.rs 47.82% 8 Missing and 4 partials ⚠️
crates/core/src/delta_datafusion/schema_adapter.rs 0.00% 6 Missing ⚠️
crates/core/src/table/state_arrow.rs 80.00% 0 Missing and 3 partials ⚠️
crates/core/src/delta_datafusion/mod.rs 95.23% 0 Missing and 1 partial ⚠️
crates/core/src/kernel/snapshot/mod.rs 0.00% 0 Missing and 1 partial ⚠️
crates/core/src/operations/load_cdf.rs 0.00% 1 Missing ⚠️
...tes/core/src/operations/write/generated_columns.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3520      +/-   ##
==========================================
- Coverage   74.24%   74.21%   -0.04%     
==========================================
  Files         150      150              
  Lines       44733    44758      +25     
  Branches    44733    44758      +25     
==========================================
+ Hits        33210    33215       +5     
- Misses       9376     9389      +13     
- Partials     2147     2154       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@alamb
Copy link
Contributor Author

alamb commented Jun 5, 2025

CI failure is due to my unpinning arrow-rs
https://github.com/delta-io/delta-rs/actions/runs/15471319016/job/43562193454?pr=3520

I will try and unpin it in datafusion as well

@alamb alamb force-pushed the alamb/update_datafusion_48 branch from 390478d to 783acbc Compare June 5, 2025 17:59
@alamb
Copy link
Contributor Author

alamb commented Jun 5, 2025

Ok, I think this is ready for another round of tests

@alamb alamb force-pushed the alamb/update_datafusion_48 branch from 783acbc to fe51187 Compare June 6, 2025 16:58
@@ -311,7 +312,7 @@ fn parse_partitions(batch: RecordBatch, partition_schema: &StructType) -> DeltaR

insert_field(
batch,
StructArray::try_new(
StructArray::try_new_with_length(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated a few calls to use StructArray::try_new_with_length( as without them several of the unit tests were failing

/// Use the native parser to parse the statement
// TODO fix for multiple statememnts and keeping parsers in sync
fn parse_with_native(&mut self) -> Result<Statement, ParserError> {
// DataFusion's parser returns DataFusionError as it may also add
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unfortunate to have to translate back from DataFusion error to a ParserError, but I think it is unavoidable as now DataFusion will potentially augment ParserErrors with Span and context information

To retain the same API we need to strip off any additional information and return the underlying ParserError

@alamb
Copy link
Contributor Author

alamb commented Jun 6, 2025

Ok, I resolved some other issues and updated to DataFusion 48 rc2 -- things are looking good for me locally. Will wait for the CI results

@alamb
Copy link
Contributor Author

alamb commented Jun 6, 2025

The remaining failures don't look to me like they are related to this PR so I am claiming success

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant