Describe the bug
The Arrow expression evaluator for the default and sync engines does not handle non-nullable fields inside a nullable struct. If every field of a struct is null for a given row (including fields declared non-nullable) and the top-level struct is nullable, the expression evaluator must allow the output struct to be null for that row as well.
Consider the schema
let nested = StructType::new([StructField::new("path", DeltaDataTypes::STRING, false)]);
let schema = StructType::new([StructField::new("add", nested, true)]);
For engine data
[Some("fake_path"), None, Some("other_fake_path")]
I expect the following:
[ Add { path: "fake_path" }, null, Add { path: "other_fake_path" }]
To Reproduce
I created an MRE test that can be run in `arrow_expression.rs`. It has a top-level nullable struct `add` with a non-nullable field `path`.
#[test]
fn test_nested_nullability() {
    // Arrow Schema
    let field = Field::new("path", DataType::Utf8, false);
    let top = Arc::new(Field::new(
        "add",
        DataType::Struct(Fields::from(vec![field.clone()])),
        true,
    ));
    let schema = Schema::new([top]);

    // Arrow data
    let values = StringArray::from(vec![Some("fake_path"), None, Some("other_fake_path")]);
    let struct_values: ArrayRef = Arc::new(values);
    let struct_array = StructArray::from(vec![(Arc::new(field), struct_values.clone())]);
    let batch =
        RecordBatch::try_new(Arc::new(schema), vec![Arc::new(struct_array.clone())]).unwrap();

    // Delta Schema
    let nested = StructType::new([StructField::new("path", DeltaDataTypes::STRING, false)]);
    let schema = StructType::new([StructField::new("add", nested, true)]);

    let expression = Expression::struct_from([column_expr!("add.path")]);
    let evaluator = DefaultExpressionEvaluator {
        input_schema: schema.clone().into(),
        expression: Box::new(expression),
        output_type: schema.into(),
    };
    let data = ArrowEngineData::new(batch);
    evaluator.evaluate(&data).unwrap();
}
Expected behavior
I expect the test to successfully transform the data to something like this:
[ Add { path: "fake_path" }, null, Add { path: "other_fake_path" }]
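To make the rule I have in mind concrete, here is a minimal sketch in plain Rust. It is not the kernel's evaluator and does not use Arrow. `Row`, `FieldSpec`, and `resolve_row` are hypothetical names made up for this illustration, and string-only fields are an assumption to keep it short.

```rust
/// Hypothetical per-row result: either a null struct or a struct with field values.
#[derive(Debug, PartialEq)]
enum Row {
    Null,
    Struct(Vec<Option<String>>),
}

/// Hypothetical field description; only nullability matters for this sketch.
struct FieldSpec {
    nullable: bool,
}

/// Applies the rule from this issue to one row:
/// if all fields are null and the parent struct is nullable, the row is a null struct;
/// otherwise a null in a non-nullable field is an error.
fn resolve_row(
    values: &[Option<&str>],
    fields: &[FieldSpec],
    parent_nullable: bool,
) -> Result<Row, String> {
    if values.len() != fields.len() {
        return Err("value count does not match field count".to_string());
    }
    let all_null = values.iter().all(|v| v.is_none());
    if all_null && parent_nullable {
        return Ok(Row::Null);
    }
    for (i, (value, field)) in values.iter().zip(fields).enumerate() {
        if value.is_none() && !field.nullable {
            return Err(format!("field {} is non-nullable but null", i));
        }
    }
    Ok(Row::Struct(
        values.iter().map(|v| v.map(|s| s.to_string())).collect(),
    ))
}

fn main() {
    // Mirrors the issue: nullable `add` struct with one non-nullable `path` field.
    let fields = [FieldSpec { nullable: false }];
    for value in [Some("fake_path"), None, Some("other_fake_path")] {
        println!("{:?}", resolve_row(&[value], &fields, true));
    }
}
```

With a non-nullable parent, the same null `path` would be rejected, which is the distinction the evaluator currently misses.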
Additional context
This was caught when working on CDF scan file transformation and schema.