feat: arrow convenience extensions #827
Codecov Report
Attention: Patch coverage is
@@ Coverage Diff @@
## main #827 +/- ##
==========================================
- Coverage 85.01% 84.99% -0.02%
==========================================
Files 84 86 +2
Lines 20656 20699 +43
Branches 20656 20699 +43
==========================================
+ Hits 17561 17594 +33
- Misses 2228 2229 +1
- Partials 867 876 +9
It would be nice if all the new extension methods had actual use sites, to give a better sense of how useful they are? Right now only `execute_arrow` has a real use site.
    fn evaluate_arrow(&self, batch: RecordBatch) -> DeltaResult<RecordBatch>;
}

impl<T: ExpressionEvaluator + ?Sized> ExpressionEvaluatorExt for T {
Why `?Sized`? Are there dyn impls somewhere?
Or do we need that in order to invoke the associated function `T::evaluate`?
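For context on the `?Sized` question, here is a minimal sketch of how the bound behaves on a blanket extension impl. The names `Evaluator`, `EvaluatorExt`, `evaluate_twice`, `AddOne` and `run_dyn` are hypothetical stand-ins, not the kernel's types. The sketch assumes the trait may be used as a trait object somewhere; whether that holds for this PR is the open question above.

```rust
// Hypothetical stand-in for the base trait.
trait Evaluator {
    fn evaluate(&self, x: i32) -> i32;
}

// Hypothetical extension trait, implemented for every `Evaluator`.
trait EvaluatorExt {
    fn evaluate_twice(&self, x: i32) -> i32;
}

// `?Sized` lets `T` be an unsized type such as `dyn Evaluator`.
// Without it, the blanket impl would only cover sized implementors.
impl<T: Evaluator + ?Sized> EvaluatorExt for T {
    fn evaluate_twice(&self, x: i32) -> i32 {
        self.evaluate(self.evaluate(x))
    }
}

struct AddOne;

impl Evaluator for AddOne {
    fn evaluate(&self, x: i32) -> i32 {
        x + 1
    }
}

// Calls the extension method through a trait object.
fn run_dyn(evaluator: &dyn Evaluator) -> i32 {
    evaluator.evaluate_twice(1)
}
```

In this sketch, calling `self.evaluate(...)` inside the impl does not itself need `?Sized`. The bound only matters for which receiver types the extension method is available on, for example `&dyn Evaluator` or `Arc<dyn Evaluator>`.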
let record_batch = ArrowEngineData::try_from_engine_data(data)?.into();
mask.map(|m| Ok(filter_record_batch(&record_batch, &m.into())?))
    .unwrap_or(Ok(record_batch))
Is this a good use for `Option::map_or_else`?
let record_batch = ArrowEngineData::try_from_engine_data(data)?.into();
mask.map_or_else(
    || Ok(record_batch),
    |m| Ok(filter_record_batch(&record_batch, &m.into())?),
)
Though simple imperative code probably wins on readability:
let record_batch = ArrowEngineData::try_from_engine_data(data)?.into();
Ok(match mask {
    Some(m) => filter_record_batch(&record_batch, &m.into())?,
    None => record_batch,
})
or even
let mut record_batch = ArrowEngineData::try_from_engine_data(data)?.into();
if let Some(m) = mask {
    record_batch = filter_record_batch(&record_batch, &m.into())?;
}
Ok(record_batch)
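To make the imperative variant concrete, here is a self-contained sketch using only the standard library. `Batch`, `filter_batch` and `apply_mask` are hypothetical stand-ins for `RecordBatch`, `filter_record_batch` and the surrounding function. The real arrow types and error handling will differ.

```rust
// Hypothetical stand-in for an arrow RecordBatch: one column of integers.
#[derive(Debug, Clone, PartialEq)]
struct Batch(Vec<i32>);

// Hypothetical stand-in for a filter kernel: keeps rows whose mask entry is true.
fn filter_batch(batch: &Batch, mask: &[bool]) -> Result<Batch, String> {
    if batch.0.len() != mask.len() {
        return Err(format!(
            "mask length {} does not match batch length {}",
            mask.len(),
            batch.0.len()
        ));
    }
    let kept = batch
        .0
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(value, _)| *value)
        .collect();
    Ok(Batch(kept))
}

// Applies the optional mask; with no mask the batch is returned unchanged.
fn apply_mask(batch: Batch, mask: Option<Vec<bool>>) -> Result<Batch, String> {
    Ok(match mask {
        Some(m) => filter_batch(&batch, &m)?,
        None => batch,
    })
}
```

The `match` form only borrows the batch in one arm and moves it in the other. With two closures, one would have to take ownership of the batch while the other borrows it, which I would expect the borrow checker to reject.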
.map_ok(TryFrom::try_from)
.flatten())
IIRC, `map_ok` and `flatten` are a bad combination -- `Err` cases are silently dropped because they are treated as empty iterators. Does this work?
.map(|result| Ok(result?.try_into()?))
.flatten_ok()
(depending on the error types, you might be able to drop the `Ok(...?)` wrapper)
(again below)
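The dropped-error behavior can be reproduced with the standard library alone. In the sketch below, `r.map(convert)` plays the role of itertools' `map_ok`. `Upstream`, `convert`, `lossy` and `lossless` are hypothetical names, and string parsing stands in for the `TryFrom` conversion. It is an illustration of the pitfall, not the PR's code.

```rust
// Hypothetical upstream item: each element may already be an error.
type Upstream = Result<&'static str, String>;

// Hypothetical fallible conversion, standing in for `TryFrom::try_from`.
fn convert(s: &str) -> Result<i32, String> {
    s.parse::<i32>().map_err(|e| e.to_string())
}

// Mirrors `.map_ok(..).flatten()`: the outer `Result` is iterated, and an
// `Err` iterates as zero items, so upstream errors vanish from the output.
fn lossy(items: Vec<Upstream>) -> Vec<Result<i32, String>> {
    items
        .into_iter()
        .map(|r| r.map(convert)) // Result<Result<i32, String>, String>
        .flatten()
        .collect()
}

// Chains the conversion with `and_then`, so every input item yields exactly
// one output item and upstream errors are preserved.
fn lossless(items: Vec<Upstream>) -> Vec<Result<i32, String>> {
    items.into_iter().map(|r| r.and_then(convert)).collect()
}
```

For an input with one upstream error and one conversion failure, `lossy` returns one item fewer than it was given, while `lossless` keeps both errors in place. This sketch assumes both error types are the same. With differing error types, the `Ok(result?.try_into()?)` shape suggested above does the conversion via `?`.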
What changes are proposed in this pull request?
The PR introduces some convenience APIs for engines working with arrow data. Specifically we define and implement `ScanExt` and `ExpressionEvaluatorExt`, which define variants of the main APIs for `Scan` and `ExpressionEvaluator` respectively in terms of arrow `RecordBatch`es.

PR #621 contains some similar work in defining a convenience function to handle `Scan::execute` results. In this PR a `TryFrom` impl is used - I was a bit unsure which approach would be better.

see: #826

also includes one `cargo clippy`.

This PR affects the following public APIs
new public methods when traits are in scope: `Scan::scan_metadata_arrow`, `Scan::evaluate_arrow` and `ExpressionEvaluator::evaluate_arrow`.

How was this change tested?
additional unit tests for new APIs.