samueleresca
Member

@samueleresca samueleresca commented Oct 12, 2025

Which issue does this PR close?

Rationale for this change

These changes add try_append_value, a safer version of append_value in ByteViewBuilder that returns a Result instead of panicking. DataFusion will consume the API and handle the Result returned by the function.
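
To illustrate the pattern only, here is a minimal sketch of a panicking method paired with a fallible `try_` counterpart. `SketchBuilder` and `SketchError` are hypothetical stand-ins, not the arrow-rs types, and having `append_value` delegate to `try_append_value` is an assumption made for the example rather than a description of this PR's diff.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch (not `ArrowError`).
#[derive(Debug, PartialEq)]
pub enum SketchError {
    /// The value length does not fit into a u32.
    ValueTooLong(usize),
}

/// Hypothetical stand-in for a byte builder; not the arrow-rs builder.
#[derive(Debug, Default)]
pub struct SketchBuilder {
    buffer: Vec<u8>,
    lengths: Vec<u32>,
}

impl SketchBuilder {
    /// Fallible variant: an oversized value is reported as an error.
    pub fn try_append_value(&mut self, value: &[u8]) -> Result<(), SketchError> {
        let len = u32::try_from(value.len())
            .map_err(|_| SketchError::ValueTooLong(value.len()))?;
        self.buffer.extend_from_slice(value);
        self.lengths.push(len);
        Ok(())
    }

    /// Panicking variant with an unchanged signature for existing callers.
    /// Delegating to the fallible variant is an assumption of this sketch.
    pub fn append_value(&mut self, value: &[u8]) {
        self.try_append_value(value)
            .expect("value length must fit into u32");
    }

    /// Number of values appended so far.
    pub fn len(&self) -> usize {
        self.lengths.len()
    }
}
```

A caller that cannot tolerate a panic would match on the `Result` from `try_append_value`, while existing callers of `append_value` keep compiling unchanged.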

What changes are included in this PR?

Are these changes tested?

The method is already covered by existing tests.

Are there any user-facing changes?

No breaking changes, as the original append_value method hasn't changed.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 12, 2025
@samueleresca samueleresca force-pushed the safer-appendvalue-bytes-view branch from 73faf99 to 8859ff7 Compare October 12, 2025 19:45
@samueleresca samueleresca marked this pull request as ready for review October 13, 2025 21:03
.map(u32::from_le_bytes)
.ok_or_else(|| {
ArrowError::InvalidArgumentError(
"String must be at least 4 bytes for non-inline view".to_string(),
Contributor

This error is unreachable as we checked that the value is longer than MAX_INLINE_VIEW_LEN (12 bytes) above.
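
A small sketch of why the prefix read cannot fail once the length check has passed. The function name `non_inline_prefix` is hypothetical, and the constant is set to 12 only because that is the value quoted in this comment; the real builder code may be structured differently.

```rust
/// Inline threshold as quoted in the review comment (assumed to be 12 bytes).
const MAX_INLINE_VIEW_LEN: usize = 12;

/// Returns the little-endian u32 prefix of a value that would be stored
/// out of line, or `None` when the value is short enough to be inlined.
fn non_inline_prefix(v: &[u8]) -> Option<u32> {
    if v.len() <= MAX_INLINE_VIEW_LEN {
        return None;
    }
    // v.len() > 12 at this point, so indices 0..4 are always in bounds
    // and no "shorter than 4 bytes" error path is needed.
    Some(u32::from_le_bytes([v[0], v[1], v[2], v[3]]))
}
```

Because the length guard runs first, the prefix read is infallible in this sketch, which matches the point that the error branch can never be taken.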

let offset = self.in_progress.len() as u32;
let offset: u32 = self.in_progress.len().try_into().map_err(|_| {
ArrowError::InvalidArgumentError(format!(
"In-progress buffer length {} exceeds u32::MAX",
Contributor

  1. I think the method can recover by starting a new in-progress buffer instead of returning an error here.

  2. I am unsure if this error is even reachable.

Contributor

I think a new buffer would be allocated in the line immediately above this. Maybe we should do a checked add in let required_cap = self.in_progress.len() + v.len(); 🤔

To error here, we would need a usize value that doesn't fit into a u32. I think all platforms we care about have a usize that is at least as wide as u32 (i.e. 32-bit architectures or wider).
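
A rough sketch of the two checks discussed in this thread: a checked add for the required capacity, and a fallible usize-to-u32 conversion for the offset. The helper names `required_capacity` and `offset_as_u32` and the `String` error type are hypothetical and used only to keep the example self-contained; the actual code would return `ArrowError`.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: capacity needed to append `value_len` bytes to a
/// buffer that already holds `in_progress_len` bytes, without wrapping.
fn required_capacity(in_progress_len: usize, value_len: usize) -> Result<usize, String> {
    in_progress_len.checked_add(value_len).ok_or_else(|| {
        format!(
            "Required capacity overflows usize: {} + {}",
            in_progress_len, value_len
        )
    })
}

/// Hypothetical helper: converts the buffer length to a u32 offset,
/// returning an error instead of silently truncating as `as u32` would.
fn offset_as_u32(in_progress_len: usize) -> Result<u32, String> {
    u32::try_from(in_progress_len).map_err(|_| {
        format!(
            "In-progress buffer length {} exceeds u32::MAX",
            in_progress_len
        )
    })
}
```

On a 32-bit target the second conversion can never fail, since every usize fits into a u32 there; on 64-bit targets it fails only if the buffer grows past u32::MAX bytes.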

Contributor

@alamb alamb left a comment

Thanks for this @samueleresca
