Conversation

@keirsalterego
…Array

Which issue does this PR close?

Rationale for this change

I kept running into situations where I needed to update the null buffer of an array, usually to apply a filter or mask on top of existing nulls. Right now that forces you to drop down to unsafe APIs to rebuild the array from raw parts, which isn't ideal for such a common operation.

I wanted a safe API that handles this without risking undefined behavior. By defining the operation as a union of nulls (intersecting validity), we ensure that we only ever mark valid slots as null and never accidentally unmask garbage data. This makes it safe for all array types while covering the main use case of applying validity masks.
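
To make the "union of nulls (intersecting validity)" semantics concrete, here is a minimal std-only sketch. It models a validity mask as a `Vec<bool>` (`true` = valid) instead of the real `NullBuffer`, and `union_validity` is a hypothetical helper written for illustration, not part of arrow-rs. It only mirrors the behaviour described above.

```rust
/// Toy model of a validity mask: `true` means valid, `false` means null.
/// `None` means "no nulls at all", like `Option<NullBuffer>` in this PR.
type Validity = Option<Vec<bool>>;

/// Hypothetical stand-in for the union of two null masks.
/// A slot stays valid only if it is valid in BOTH inputs, so the
/// operation can add nulls but can never turn a null slot back into a valid one.
fn union_validity(a: &Validity, b: &Validity) -> Validity {
    match (a, b) {
        // Neither side has nulls: the result has none either.
        (None, None) => None,
        // Only one side has a mask: it is the result as-is.
        (Some(m), None) | (None, Some(m)) => Some(m.clone()),
        // Both sides have a mask: AND the validity bits slot by slot.
        (Some(x), Some(y)) => {
            assert_eq!(x.len(), y.len(), "Null buffer length mismatch");
            Some(x.iter().zip(y).map(|(l, r)| *l && *r).collect())
        }
    }
}
```

In this model, merging `[true, false, true]` with `[true, true, false]` leaves only the first slot valid.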

What changes are included in this PR?

I implemented with_nulls for PrimitiveArray, BooleanArray, and GenericByteArray. The implementation relies on NullBuffer::union to safely merge the new validity mask with the existing one.

I also added documentation with examples for each implementation and made sure to document that it panics if the buffer lengths don't match.

Are these changes tested?

I verified the changes locally by running the existing tests for arrow-array.

Are there any user-facing changes?

This adds the public with_nulls method to the array types I mentioned above.

Copilot AI review requested due to automatic review settings January 11, 2026 15:48
@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 11, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new with_nulls method to three array types (PrimitiveArray, BooleanArray, and GenericByteArray) that safely merges null buffers by computing the union of existing nulls with provided nulls. This addresses the common use case of applying validity masks without requiring unsafe APIs.

Changes:

  • Implemented with_nulls method for PrimitiveArray with comprehensive documentation and examples
  • Implemented with_nulls method for BooleanArray with basic documentation
  • Implemented with_nulls method for GenericByteArray with basic documentation

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
arrow-array/src/array/primitive_array.rs Added with_nulls method with comprehensive documentation, examples, and panic documentation
arrow-array/src/array/byte_array.rs Added with_nulls method for GenericByteArray with basic documentation
arrow-array/src/array/boolean_array.rs Added with_nulls method with basic documentation

Comment on lines +166 to +179
pub fn with_nulls(self, nulls: Option<NullBuffer>) -> Self {
    if let Some(n) = &nulls {
        assert_eq!(n.len(), self.len(), "Null buffer length mismatch");
    }

    let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref());

    Self {
        data_type: T::DATA_TYPE,
        value_offsets: self.value_offsets,
        value_data: self.value_data,
        nulls: new_nulls,
    }
}

Copilot AI Jan 11, 2026

The new with_nulls method lacks test coverage. Consider adding tests to verify behavior such as:

  • Merging nulls with an array that has no existing nulls
  • Merging nulls with an array that already has nulls
  • Passing None as the nulls parameter
  • Verifying the panic behavior when null buffer length mismatches

Similar methods in this file have comprehensive test coverage, and tests would help ensure this method works correctly and prevent regressions.
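
As a rough illustration of the four cases listed above, here is a std-only sketch. `ToyArray` and `toy` are hypothetical stand-ins (values plus a `Vec<bool>` validity mask), not the real `GenericByteArray`, so the actual tests in arrow-array would look different. The sketch only shows the shape such tests could take.

```rust
/// Toy array: values plus an optional validity mask (`true` = valid).
/// A std-only stand-in for illustration, not an arrow-rs type.
#[derive(Debug, Clone, PartialEq)]
struct ToyArray {
    values: Vec<String>,
    nulls: Option<Vec<bool>>,
}

impl ToyArray {
    fn len(&self) -> usize {
        self.values.len()
    }

    /// Mirrors the shape of the PR's `with_nulls`: length check, then union.
    fn with_nulls(self, nulls: Option<Vec<bool>>) -> Self {
        if let Some(n) = &nulls {
            assert_eq!(n.len(), self.len(), "Null buffer length mismatch");
        }
        let merged: Option<Vec<bool>> = match (self.nulls, nulls) {
            // Both masks present: a slot is valid only if valid in both.
            (Some(a), Some(b)) => Some(a.iter().zip(&b).map(|(x, y)| *x && *y).collect()),
            // At most one mask present: keep whichever exists.
            (a, b) => a.or(b),
        };
        Self { values: self.values, nulls: merged }
    }
}

/// Hypothetical helper to build a `ToyArray` from string slices.
fn toy(values: &[&str], nulls: Option<Vec<bool>>) -> ToyArray {
    ToyArray { values: values.iter().map(|s| s.to_string()).collect(), nulls }
}

#[test]
fn merges_into_array_without_existing_nulls() {
    let out = toy(&["a", "b"], None).with_nulls(Some(vec![true, false]));
    assert_eq!(out.nulls, Some(vec![true, false]));
}

#[test]
fn merges_with_existing_nulls() {
    let out = toy(&["a", "b", "c"], Some(vec![true, false, true]))
        .with_nulls(Some(vec![true, true, false]));
    assert_eq!(out.nulls, Some(vec![true, false, false]));
}

#[test]
fn none_leaves_nulls_unchanged() {
    let out = toy(&["a", "b"], Some(vec![false, true])).with_nulls(None);
    assert_eq!(out.nulls, Some(vec![false, true]));
}

#[test]
#[should_panic(expected = "Null buffer length mismatch")]
fn panics_on_length_mismatch() {
    let _ = toy(&["a", "b"], None).with_nulls(Some(vec![true]));
}
```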

Comment on lines +105 to +116
pub fn with_nulls(self, nulls: Option<NullBuffer>) -> Self {
    if let Some(n) = &nulls {
        assert_eq!(n.len(), self.len(), "Null buffer length mismatch");
    }

    let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref());

    Self {
        values: self.values,
        nulls: new_nulls,
    }
}

Copilot AI Jan 11, 2026

The new with_nulls method lacks test coverage. Consider adding tests to verify behavior such as:

  • Merging nulls with an array that has no existing nulls
  • Merging nulls with an array that already has nulls
  • Passing None as the nulls parameter
  • Verifying the panic behavior when null buffer length mismatches

Similar methods in this file have comprehensive test coverage, and tests would help ensure this method works correctly and prevent regressions.

let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref());

Self {
data_type: T::DATA_TYPE,

Copilot AI Jan 11, 2026

For consistency with PrimitiveArray::with_nulls and to future-proof the implementation, this should use self.data_type instead of T::DATA_TYPE. While they are currently always the same for GenericByteArray, PrimitiveArray supports overriding the data type via with_data_type, and using self.data_type would maintain the pattern of preserving the original instance's data type field.

Suggested change
- data_type: T::DATA_TYPE,
+ data_type: self.data_type,

Contributor

@alamb Is this a valid observation/suggestion? What do you think?

})
}

/// It returns a new array with the same data and a new null buffer.

Copilot AI Jan 11, 2026

The documentation starts with "It returns" which is grammatically awkward. The sentence should start with "Returns" to match the style of the PrimitiveArray implementation and follow Rust documentation conventions.

Self { values, nulls }
}

/// It returns a new array with the same data and a new null buffer.

Copilot AI Jan 11, 2026

The documentation starts with "It returns" which is grammatically awkward. The sentence should start with "Returns" to match the style of the PrimitiveArray implementation and follow Rust documentation conventions.

Suggested change
- /// It returns a new array with the same data and a new null buffer.
+ /// Returns a new array with the same data and a new null buffer.

/// The resulting null buffer is the union of the existing nulls and the provided nulls.
/// In other words, a slot is valid in the result only if it is valid in BOTH
/// the existing array AND the provided `nulls`.
///

Copilot AI Jan 11, 2026

The documentation for GenericByteArray::with_nulls lacks a usage example, unlike PrimitiveArray::with_nulls which includes a comprehensive example. Adding an example would improve consistency across the API and help users understand how to use this method effectively.

Suggested change
///
///
/// # Examples
///
/// ```
/// use arrow_array::{Array, StringArray};
/// use arrow_buffer::NullBuffer;
///
/// // Create an array with an existing null in the second position
/// let array = StringArray::from(vec![Some("a"), None, Some("c")]);
///
/// // Create an additional null buffer to combine with the existing one
/// // Here, the third position is marked as null
/// let nulls = NullBuffer::from(vec![true, true, false]);
///
/// let result = array.with_nulls(Some(nulls));
///
/// assert_eq!(result.len(), 3);
/// // Still valid, as it is valid in both null buffers
/// assert!(result.is_valid(0));
/// // Remains null, as it is null in the original array
/// assert!(result.is_null(1));
/// // Now null, as it is null in the provided null buffer
/// assert!(result.is_null(2));
/// ```
///

/// The resulting null buffer is the union of the existing nulls and the provided nulls.
/// In other words, a slot is valid in the result only if it is valid in BOTH
/// the existing array AND the provided `nulls`.
///

Copilot AI Jan 11, 2026

The documentation for BooleanArray::with_nulls lacks a usage example, unlike PrimitiveArray::with_nulls which includes a comprehensive example. Adding an example would improve consistency across the API and help users understand how to use this method effectively.

Suggested change
///
///
/// # Example
///
/// ```
/// # use arrow_array::{Array, BooleanArray};
/// # use arrow_buffer::NullBuffer;
/// let array = BooleanArray::from(vec![true, false, true]);
/// let nulls = NullBuffer::from(vec![true, false, true]);
///
/// let array = array.with_nulls(Some(nulls));
///
/// assert_eq!(array.len(), 3);
/// assert!(array.is_valid(0));
/// assert!(array.is_null(1));
/// ```
///

Comment on lines +698 to +710
pub fn with_nulls(self, nulls: Option<NullBuffer>) -> Self {
    if let Some(n) = &nulls {
        assert_eq!(n.len(), self.len(), "Null buffer length mismatch");
    }

    let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref());

    Self {
        data_type: self.data_type,
        values: self.values,
        nulls: new_nulls,
    }
}

Copilot AI Jan 11, 2026

The new with_nulls method lacks test coverage. Consider adding tests to verify behavior such as:

  • Merging nulls with an array that has no existing nulls
  • Merging nulls with an array that already has nulls
  • Passing None as the nulls parameter
  • Verifying the panic behavior when null buffer length mismatches

Similar methods in this file have comprehensive test coverage, and tests would help ensure this method works correctly and prevent regressions.

@scovich
Contributor

scovich commented Jan 12, 2026

I wanted a safe API that handles this without risking undefined behavior. By defining the operation as a union of nulls (intersecting validity), we ensure that we only ever mark valid slots as null and never accidentally unmask garbage data. This makes it safe for all array types while covering the main use case of applying validity masks.

From @alamb (#6528 (comment)):

has the same cost as detecting an error but without needing an error.

At the moment, such an API will require creating (and allocating) a new output array

I would like to eventually implement a way to reuse existing buffers for boolean arrays when possible (e.g. similar to binary_mut)

I had the sense he was not excited about that extra allocation and would prefer checked vs. unchecked versions of the API? (We still have the panic risk of a length mismatch anyway, so the with_nulls method isn't completely safe to use in its current form.)

On the other hand, all the use cases I have encountered would explicitly use NullBuffer::union to combine the array's existing null mask with a source of additional nulls, so the "extra" allocation isn't necessarily extra. For example, if I were computing a nested null mask for a given struct field (in order to safely project it out of the parent struct), I might union all the parent null masks together, and then let field_array::with_nulls(parent_nulls) perform the final (necessary) union+alloc. But at that point, the method should potentially have a more precise name like with_additional_nulls?
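
A rough std-only sketch of that nested-null-mask flow, under loud assumptions: validity masks are modelled as `Option<Vec<bool>>` (`true` = valid), and `union`, `combined_parent_nulls`, and `project_field` are hypothetical names made up for illustration. The real code would use `NullBuffer::union` and the proposed `with_nulls`, and would also have to deal with offsets and slicing, which this model ignores.

```rust
/// Std-only model of a validity mask: `true` = valid, `None` = no nulls.
type Validity = Option<Vec<bool>>;

/// Hypothetical stand-in for the null-mask union: valid only if valid in both.
fn union(a: Validity, b: &Validity) -> Validity {
    match (a, b) {
        (Some(x), Some(y)) => {
            assert_eq!(x.len(), y.len(), "Null buffer length mismatch");
            Some(x.iter().zip(y).map(|(l, r)| *l && *r).collect())
        }
        // At most one side has a mask: keep whichever exists.
        (a, b) => a.or_else(|| b.clone()),
    }
}

/// Hypothetical helper: fold every ancestor struct's mask into a single mask.
fn combined_parent_nulls(parents: &[Validity]) -> Validity {
    parents.iter().fold(None, |acc, p| union(acc, p))
}

/// Hypothetical final step, playing the role of
/// `field_array.with_nulls(parent_nulls)`: one last union with the field's own mask.
fn project_field(field_nulls: Validity, parents: &[Validity]) -> Validity {
    union(field_nulls, &combined_parent_nulls(parents))
}
```

In this model the field ends up null wherever it or any ancestor is null, and the last union is the one that would happen inside `with_nulls`.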

Comment on lines +110 to +114
let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref());

Self {
values: self.values,
nulls: new_nulls,
Contributor

nit: the let is making line lengths and line counts worse, not better?

Suggested change
-     let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref());
-     Self {
-         values: self.values,
-         nulls: new_nulls,
+     Self {
+         values: self.values,
+         nulls: NullBuffer::union(self.nulls.as_ref(), nulls.as_ref()),

(more below)

Development

Successfully merging this pull request may close these issues.

Safe API to replace NullBuffers for Arrays

2 participants