-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: new create_one
ExpressionHandler API
#662
base: main
Are you sure you want to change the base?
Conversation
note I'll be cleaning up/adding more tests. wanted to get some eyes on this approach first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I want to consider/discuss about this approach is that we require that the expression struct heirarchy matches the schema one. So a schema Struct(Struct(Scalar(Int))
requires an expression Struct(Struct(Literal(int)))
. This code wouldn't allow a Literal(int)
expression.
Idk if we want to enforce that requirement in the long run? It's very common for kernel to flatten out the fields of a schema (ex: in a visitor), so I don't see why we shouldn't allow flattened expressions.
Perhaps this acts as a safety thing. Kernel is the only one calling create_one
, and it ensures that things are nested as we expected.
let actual_rb: RecordBatch = actual | ||
.into_any() | ||
.downcast::<ArrowEngineData>() | ||
.unwrap() | ||
.into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be worth having these lines, handler, and the create_one call in a helper function. Something like:
fn test_create_one(schema, expr) -> RecordBatches {
let handler = ArrowExpressionHandler;
let actual = handler.create_one(schema, &expr).unwrap();
actual
.into_any()
.downcast::<ArrowEngineData>()
.unwrap()
.into()
}
match expr { | ||
// simple case: for literals, just create a single-row array and ensure the data types match | ||
Expression::Literal(scalar) => { | ||
let array = scalar.to_array(1)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We lose the nullability info in field.is_nullable()
. Should we add a check:
if !field.is_nullable() && array.is_null(0) {
// return error
}
#[test] | ||
fn test_create_one_string() { | ||
let expr = Expression::struct_from([Expression::literal("a")]); | ||
let schema = Arc::new(crate::schema::StructType::new([StructField::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we use schema::StructType
several times, I think we should import it.
Expression::literal(3), | ||
]); | ||
let schema = Arc::new(crate::schema::StructType::new([ | ||
StructField::new("a", DeltaDataTypes::INTEGER, true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with a rebase, I think these can be moved over to StructField::nullable
/StructField::not_null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick pass, couple reactions
(overall approach looks good)
let scalar = &self.scalars[self.next_scalar_idx]; | ||
self.next_scalar_idx += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a type check?
let DataType::Primitive(scalar_type) = scalar.data_type() else { /* boom */ };
require!(scalar_type == ptype, /* boom */);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added!
} | ||
|
||
fn transform_struct_field(&mut self, field: &'a StructField) -> Option<Cow<'a, StructField>> { | ||
self.recurse_into_struct_field(field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Watch out -- struct field (or containing array/map) decides whether the child type is nullable
(any nullable ancestor makes the field nullable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still need to add this - will get to it soon
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #662 +/- ##
==========================================
+ Coverage 84.22% 84.28% +0.06%
==========================================
Files 77 77
Lines 17694 17959 +265
Branches 17694 17959 +265
==========================================
+ Hits 14902 15136 +234
- Misses 2080 2113 +33
+ Partials 712 710 -2 ☔ View full report in Codecov by Sentry. |
What changes are proposed in this pull request?
Adds a new
create_one
API for creating single-rowEngineData
by implementing aSchemaTransform
to transform the given schema + leaf values into a single-rowArrowEngineData
fn create_one
to ourExpressionHandler
trait (breaking)create_one
forArrowExpressionHandler
This PR affects the following public APIs
New
create_one
API required for ExpressionHandler. And added a newlen()
method toStructType
.How was this change tested?
Bunch of new unit tests.