-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: add with_nulls to PrimitiveArray, BooleanArray, and GenericByte… #9141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a new with_nulls method to three array types (PrimitiveArray, BooleanArray, and GenericByteArray) that safely merges null buffers by computing the union of existing nulls with provided nulls. This addresses the common use case of applying validity masks without requiring unsafe APIs.
Changes:
- Implemented
with_nullsmethod forPrimitiveArraywith comprehensive documentation and examples - Implemented
with_nullsmethod forBooleanArraywith basic documentation - Implemented
with_nullsmethod forGenericByteArraywith basic documentation
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| arrow-array/src/array/primitive_array.rs | Added with_nulls method with comprehensive documentation, examples, and panic documentation |
| arrow-array/src/array/byte_array.rs | Added with_nulls method for GenericByteArray with basic documentation |
| arrow-array/src/array/boolean_array.rs | Added with_nulls method with basic documentation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub fn with_nulls(self, nulls: Option<NullBuffer>) -> Self { | ||
| if let Some(n) = &nulls { | ||
| assert_eq!(n.len(), self.len(), "Null buffer length mismatch"); | ||
| } | ||
|
|
||
| let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref()); | ||
|
|
||
| Self { | ||
| data_type: T::DATA_TYPE, | ||
| value_offsets: self.value_offsets, | ||
| value_data: self.value_data, | ||
| nulls: new_nulls, | ||
| } | ||
| } |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new with_nulls method lacks test coverage. Consider adding tests to verify behavior such as:
- Merging nulls with an array that has no existing nulls
- Merging nulls with an array that already has nulls
- Passing None as the nulls parameter
- Verifying the panic behavior when null buffer length mismatches
Similar methods in this file have comprehensive test coverage, and tests would help ensure this method works correctly and prevent regressions.
| pub fn with_nulls(self, nulls: Option<NullBuffer>) -> Self { | ||
| if let Some(n) = &nulls { | ||
| assert_eq!(n.len(), self.len(), "Null buffer length mismatch"); | ||
| } | ||
|
|
||
| let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref()); | ||
|
|
||
| Self { | ||
| values: self.values, | ||
| nulls: new_nulls, | ||
| } | ||
| } |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new with_nulls method lacks test coverage. Consider adding tests to verify behavior such as:
- Merging nulls with an array that has no existing nulls
- Merging nulls with an array that already has nulls
- Passing None as the nulls parameter
- Verifying the panic behavior when null buffer length mismatches
Similar methods in this file have comprehensive test coverage, and tests would help ensure this method works correctly and prevent regressions.
| let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref()); | ||
|
|
||
| Self { | ||
| data_type: T::DATA_TYPE, |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with PrimitiveArray::with_nulls and to future-proof the implementation, this should use self.data_type instead of T::DATA_TYPE. While they are currently always the same for GenericByteArray, PrimitiveArray supports overriding the data type via with_data_type, and using self.data_type would maintain the pattern of preserving the original instance's data type field.
| data_type: T::DATA_TYPE, | |
| data_type: self.data_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb Is this is a valid observation/suggestion? What do you think?
| }) | ||
| } | ||
|
|
||
| /// It returns a new array with the same data and a new null buffer. |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation starts with "It returns" which is grammatically awkward. The sentence should start with "Returns" to match the style of the PrimitiveArray implementation and follow Rust documentation conventions.
| Self { values, nulls } | ||
| } | ||
|
|
||
| /// It returns a new array with the same data and a new null buffer. |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation starts with "It returns" which is grammatically awkward. The sentence should start with "Returns" to match the style of the PrimitiveArray implementation and follow Rust documentation conventions.
| /// It returns a new array with the same data and a new null buffer. | |
| /// Returns a new array with the same data and a new null buffer. |
| /// The resulting null buffer is the union of the existing nulls and the provided nulls. | ||
| /// In other words, a slot is valid in the result only if it is valid in BOTH | ||
| /// the existing array AND the provided `nulls`. | ||
| /// |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation for GenericByteArray::with_nulls lacks a usage example, unlike PrimitiveArray::with_nulls which includes a comprehensive example. Adding an example would improve consistency across the API and help users understand how to use this method effectively.
| /// | |
| /// | |
| /// # Examples | |
| /// | |
| /// ``` | |
| /// use arrow_array::StringArray; | |
| /// use arrow_buffer::NullBuffer; | |
| /// | |
| /// // Create an array with an existing null in the second position | |
| /// let array = StringArray::from(vec![Some("a"), None, Some("c")]); | |
| /// | |
| /// // Create an additional null buffer to combine with the existing one | |
| /// // Here, the third position is marked as null | |
| /// let nulls = NullBuffer::from(vec![true, true, false]); | |
| /// | |
| /// let result = array.with_nulls(Some(nulls)); | |
| /// | |
| /// assert_eq!(result.len(), 3); | |
| /// // Still valid, as it is valid in both null buffers | |
| /// assert!(result.is_valid(0)); | |
| /// // Remains null, as it is null in the original array | |
| /// assert!(result.is_null(1)); | |
| /// // Now null, as it is null in the provided null buffer | |
| /// assert!(result.is_null(2)); | |
| /// ``` | |
| /// |
| /// The resulting null buffer is the union of the existing nulls and the provided nulls. | ||
| /// In other words, a slot is valid in the result only if it is valid in BOTH | ||
| /// the existing array AND the provided `nulls`. | ||
| /// |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation for BooleanArray::with_nulls lacks a usage example, unlike PrimitiveArray::with_nulls which includes a comprehensive example. Adding an example would improve consistency across the API and help users understand how to use this method effectively.
| /// | |
| /// | |
| /// # Example | |
| /// | |
| /// ``` | |
| /// # use arrow_array::{Array, BooleanArray}; | |
| /// # use arrow_buffer::NullBuffer; | |
| /// let array = BooleanArray::from(vec![true, false, true]); | |
| /// let nulls = NullBuffer::from(vec![true, false, true]); | |
| /// | |
| /// let array = array.with_nulls(Some(nulls)); | |
| /// | |
| /// assert_eq!(array.len(), 3); | |
| /// assert!(array.is_valid(0)); | |
| /// assert!(array.is_null(1)); | |
| /// ``` | |
| /// |
| pub fn with_nulls(self, nulls: Option<NullBuffer>) -> Self { | ||
| if let Some(n) = &nulls { | ||
| assert_eq!(n.len(), self.len(), "Null buffer length mismatch"); | ||
| } | ||
|
|
||
| let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref()); | ||
|
|
||
| Self { | ||
| data_type: self.data_type, | ||
| values: self.values, | ||
| nulls: new_nulls, | ||
| } | ||
| } |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new with_nulls method lacks test coverage. Consider adding tests to verify behavior such as:
- Merging nulls with an array that has no existing nulls
- Merging nulls with an array that already has nulls
- Passing None as the nulls parameter
- Verifying the panic behavior when null buffer length mismatches
Similar methods in this file have comprehensive test coverage, and tests would help ensure this method works correctly and prevent regressions.
From @alamb (#6528 (comment)):
I had the sense he was not excited about that extra allocation and would prefer checked vs. unchecked versions of the API? (we anyway still have the panic risk of length mismatch, so the On the other hand, all the use cases I have encountered would explicitly use |
| let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref()); | ||
|
|
||
| Self { | ||
| values: self.values, | ||
| nulls: new_nulls, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the let is making line lengths and line counts worse, not better?
| let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref()); | |
| Self { | |
| values: self.values, | |
| nulls: new_nulls, | |
| Self { | |
| values: self.values, | |
| nulls: NullBuffer::union(self.nulls.as_ref(), nulls.as_ref()), |
(more below)
…Array
Which issue does this PR close?
NullBuffersfor Arrays #6528.Rationale for this change
I kept running into situations where I needed to update the null buffer of an array, usually to apply a filter or mask on top of existing nulls. Right now that forces you to drop down to unsafe APIs to rebuild the array from raw parts, which isn't ideal for such a common operation.
I wanted a safe API that handles this without risking undefined behavior. By defining the operation as a union of nulls (intersecting validity), we ensure that we only ever mark valid slots as null and never accidentally unmask garbage data. This makes it safe for all array types while covering the main use case of applying validity masks.
What changes are included in this PR?
I implemented
with_nullsforPrimitiveArray,BooleanArray, andGenericByteArray. The implementation relies onNullBuffer::unionto safely merge the new validity mask with the existing one.I also added documentation with examples for each implementation and made sure to document that it panics if the buffer lengths don't match.
Are these changes tested?
I verified the changes locally by running the existing tests for arrow-array.
Are there any user-facing changes?
This adds the public
with_nullsmethod to the array types I mentioned above.