
Commit 25fa27f

fix: preserve null bitmap in nested transform expressions (#1645)
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1645/files) to review incremental changes.
- [**stack/null-propagation**](#1645) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1645/files)]
- [stack/coalesce](#1648) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1648/files/37e755009566511bf7c2f00e014c1647e77e4533..d64042f7908844ef2d8a1c68312dc3ff936d60dc)]
- [stack/checkpoint-transforms](#1646) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1646/files/d64042f7908844ef2d8a1c68312dc3ff936d60dc..4e66ca004f89b23431a96ac106a9c0d400718b10)]
- [stack/write-stats](#1643) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1643/files/4e66ca004f89b23431a96ac106a9c0d400718b10..cd64f79fd3b40ebfa811cb333369cb17aa1a2a74)]

---------

## What changes are proposed in this pull request?

Fixes a bug in nested transform expression evaluation where the source struct's null bitmap was dropped, so null struct rows incorrectly appeared as non-null structs whose fields were all null.

When evaluating a nested transform expression (a transform with an `input_path` that operates on a nested struct), the output `StructArray` was created with `None` for the null buffer:

`let data = StructArray::try_new(output_fields.into(), output_cols, None)?;`

As a result, if the source struct had null rows (e.g., an `add` action that is null in a checkpoint batch), the output lost that null information: the row appeared as a non-null struct with all-null fields, which is semantically different from a null struct. This change passes the source struct's null buffer to `StructArray::try_new` instead.

## How was this change tested?

Existing transform tests pass. The stats transform integration tests (in a follow-up PR) exercise this code path.
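The difference between a null struct and a non-null struct of null fields can be made concrete with a simplified, standard-library-only model. `StructColumn`, `rebuild`, and `transform_children` are hypothetical stand-ins for Arrow's `StructArray`, `StructArray::try_new`, and the kernel's transform evaluation; they are not the arrow-rs or delta-kernel-rs API. Here `validity` plays the role of the null buffer (`false` marks a null row, `None` means every row is valid).

```rust
/// Hypothetical, std-only model of a struct column (not the arrow-rs API).
/// `validity[i] == false` means row `i` of the struct itself is null;
/// `None` means every row is valid, like Arrow's optional null buffer.
#[derive(Debug, Clone)]
struct StructColumn {
    fields: Vec<Vec<Option<i64>>>,
    validity: Option<Vec<bool>>,
}

impl StructColumn {
    /// A row is null only if the struct's own validity marks it null;
    /// null child values alone do not make the struct row null.
    fn is_null(&self, row: usize) -> bool {
        self.validity.as_ref().map_or(false, |v| !v[row])
    }
}

/// Rebuilds a struct from child columns, loosely analogous to
/// `StructArray::try_new(fields, columns, nulls)`. Rejects length mismatches.
fn rebuild(
    fields: Vec<Vec<Option<i64>>>,
    validity: Option<Vec<bool>>,
) -> Result<StructColumn, String> {
    let len = fields.first().map_or(0, |f| f.len());
    if fields.iter().any(|f| f.len() != len) {
        return Err("child columns have different lengths".to_string());
    }
    if let Some(v) = &validity {
        if v.len() != len {
            return Err("null buffer length does not match row count".to_string());
        }
    }
    Ok(StructColumn { fields, validity })
}

/// Hypothetical transform: doubles every non-null value in each child column.
fn transform_children(source: &StructColumn) -> Vec<Vec<Option<i64>>> {
    source
        .fields
        .iter()
        .map(|col| col.iter().map(|&v| v.map(|x| x * 2)).collect())
        .collect()
}

fn main() {
    // Row 1 is a null struct; its child slot holds a null placeholder.
    let source = StructColumn {
        fields: vec![vec![Some(1), None, Some(3)]],
        validity: Some(vec![true, false, true]),
    };

    // Old behavior: no validity is passed, so row 1 becomes a non-null
    // struct whose only field is null.
    let buggy = rebuild(transform_children(&source), None).unwrap();
    // Fixed behavior: the source struct's validity is carried through.
    let fixed = rebuild(transform_children(&source), source.validity.clone()).unwrap();

    println!("source row 1 null: {}", source.is_null(1)); // true
    println!("buggy  row 1 null: {}", buggy.is_null(1)); // false
    println!("fixed  row 1 null: {}", fixed.is_null(1)); // true
}
```

The transformed child values are identical in both cases; only the struct-level validity differs, and that is exactly the information the `None` argument discarded.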
1 parent daa9a1e commit 25fa27f

1 file changed: +11 -4 lines changed

kernel/src/engine/arrow_expression/evaluate_expression.rs

Lines changed: 11 additions & 4 deletions
```diff
@@ -159,12 +159,19 @@ fn evaluate_transform_expression(
     }
 
     // Extract the input path, if any
-    let source_data = transform
+    let source_array = transform
         .input_path()
         .map(|path| extract_column(batch, path))
         .transpose()?;
 
-    let source_data: &dyn ProvidesColumnByName = match source_data {
+    // For nested transforms, get the source struct's null bitmap to preserve null rows
+    let source_null_buffer = source_array.as_ref().and_then(|arr| {
+        arr.as_any()
+            .downcast_ref::<StructArray>()
+            .and_then(|s| s.nulls().cloned())
+    });
+
+    let source_data: &dyn ProvidesColumnByName = match source_array {
         Some(ref array) => array
             .as_any()
             .downcast_ref::<StructArray>()
@@ -204,7 +211,7 @@ fn evaluate_transform_expression(
         return Err(Error::generic("Too many fields in output schema"));
     }
 
-    // Build the final struct
+    // Build the final struct, preserving null bitmap for nested transforms
     let output_fields: Vec<ArrowField> = output_cols
         .iter()
         .zip(output_schema.fields())
@@ -216,7 +223,7 @@ fn evaluate_transform_expression(
             )
         })
         .collect();
-    let data = StructArray::try_new(output_fields.into(), output_cols, None)?;
+    let data = StructArray::try_new(output_fields.into(), output_cols, source_null_buffer)?;
     Ok(Arc::new(data))
 }
 
```
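The null-buffer extraction added in this commit chains `Option` combinators so that only a nested transform whose source downcasts to a struct contributes a null buffer; a top-level transform (no `input_path`) still passes `None`, as before. The sketch below reproduces that chaining with the standard library only. `Column`, `StructCol`, and `Int64Col` are hypothetical stand-ins for Arrow's `Array` trait and array types, not the arrow-rs API.

```rust
use std::any::Any;

/// Hypothetical stand-in for Arrow's `Array` trait: just enough to downcast.
trait Column {
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical struct column with an optional validity bitmap.
struct StructCol {
    nulls: Option<Vec<bool>>,
}

impl StructCol {
    fn nulls(&self) -> Option<&Vec<bool>> {
        self.nulls.as_ref()
    }
}

/// Hypothetical primitive column (not a struct, so nothing to propagate).
struct Int64Col;

impl Column for StructCol {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Column for Int64Col {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Mirrors the pattern in the diff: returns the source struct's bitmap for a
/// nested transform over a struct, and `None` otherwise.
fn source_null_buffer(source_array: &Option<Box<dyn Column>>) -> Option<Vec<bool>> {
    source_array.as_ref().and_then(|arr| {
        arr.as_any()
            .downcast_ref::<StructCol>()
            .and_then(|s| s.nulls().cloned())
    })
}

fn main() {
    // Top-level transform: no input path, so there is no source array.
    let top_level: Option<Box<dyn Column>> = None;
    // Nested transform over a struct whose second row is null.
    let nested: Option<Box<dyn Column>> =
        Some(Box::new(StructCol { nulls: Some(vec![true, false]) }));
    // Source that is not a struct: the downcast fails, nothing is propagated.
    let not_struct: Option<Box<dyn Column>> = Some(Box::new(Int64Col));

    println!("{:?}", source_null_buffer(&top_level)); // None
    println!("{:?}", source_null_buffer(&nested)); // Some([true, false])
    println!("{:?}", source_null_buffer(&not_struct)); // None
}
```

In arrow-rs, `Array::nulls()` returns `Option<&NullBuffer>`, and `NullBuffer` is backed by a reference-counted buffer, so the `.cloned()` in the diff copies a handle rather than the bitmap itself.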