Adding try_append_value implementation to ByteViewBuilder #8594
alamb merged 7 commits into apache:main
Force-pushed from 73faf99 to 8859ff7
```rust
        .map(u32::from_le_bytes)
        .ok_or_else(|| {
            ArrowError::InvalidArgumentError(
                "String must be at least 4 bytes for non-inline view".to_string(),
```
This error is unreachable as we checked that the value is longer than MAX_INLINE_VIEW_LEN (12 bytes) above.
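For context, a simplified sketch of the view layout this refers to (illustrative only, not the arrow-rs internals): values of at most `MAX_INLINE_VIEW_LEN` (12) bytes are inlined into the 16-byte view, so any non-inline value is guaranteed to have at least 4 bytes available for its prefix.

```rust
// Simplified sketch of a 16-byte "view": values up to 12 bytes are inlined
// after the 4-byte length; longer values store a 4-byte prefix plus a
// buffer index and offset. Because non-inline values are always longer
// than 12 bytes, slicing the first 4 bytes for the prefix cannot fail.
const MAX_INLINE_VIEW_LEN: usize = 12;

fn make_view(value: &[u8], buffer_index: u32, offset: u32) -> [u8; 16] {
    let mut view = [0u8; 16];
    view[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    if value.len() <= MAX_INLINE_VIEW_LEN {
        // Inline case: the whole value fits after the length field.
        view[4..4 + value.len()].copy_from_slice(value);
    } else {
        // Non-inline case: 4-byte prefix, then buffer index and offset.
        view[4..8].copy_from_slice(&value[..4]); // always >= 4 bytes here
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}
```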
```diff
-        let offset = self.in_progress.len() as u32;
+        let offset: u32 = self.in_progress.len().try_into().map_err(|_| {
+            ArrowError::InvalidArgumentError(format!(
+                "In-progress buffer length {} exceeds u32::MAX",
```
- I think the method can recover by starting a new in-progress buffer instead of returning an error here.
- I am unsure if this error is even reachable.
I think a new buffer would be allocated in the line immediately above this. Maybe we should do a checked add in `let required_cap = self.in_progress.len() + v.len();` 🤔
To error here, we would need a usize that doesn't fit into a u32. I think all platforms we care about have a usize that is at least a u32 (aka 32-bit architectures)
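The checked-add suggestion could look roughly like this (a standalone sketch with assumed names, not the actual builder code):

```rust
// Sketch of the suggestion: compute the required capacity with checked_add
// so an overflow surfaces as an error at the call site instead of failing
// later at the usize-to-u32 conversion.
fn try_required_cap(in_progress_len: usize, value_len: usize) -> Result<usize, String> {
    in_progress_len.checked_add(value_len).ok_or_else(|| {
        format!(
            "required capacity {} + {} overflows usize",
            in_progress_len, value_len
        )
    })
}
```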
> To error here, we would need a usize that doesn't fit into a u32. I think all platforms we care about have a usize that is at least a u32 (aka 32-bit architectures)

I think it would be the opposite: a usize on a 64-bit arch wouldn't fit into a u32? Anyway, I will review and update these changes over the weekend.
Yes, you are right -- thank you
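The platform behavior under discussion can be demonstrated directly; this is a standalone illustration, not code from the PR:

```rust
// On a 64-bit target, usize is 64 bits, so lengths above u32::MAX fail the
// conversion; on a 32-bit target usize and u32 have the same range and this
// error path is unreachable.
fn to_u32(len: usize) -> Result<u32, String> {
    len.try_into()
        .map_err(|_| format!("length {} exceeds u32::MAX", len))
}
```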
> I am unsure if this error is even reachable.

I think this is right. I wasn't able to trigger it. For the error to occur, `append_value` would have to skip the `flush_in_progress()` call that happens when the buffer reaches the required capacity.
On top of that, the `push_completed` used by `flush_in_progress` asserts on `block.len()` (see below). Therefore, the conversion in `let offset: u32 = self.in_progress.len().try_into()` couldn't fail:
`arrow-rs/arrow-array/src/builder/generic_bytes_view_builder.rs` (lines 273 to 276 in d49f017)
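The assert being referenced could look roughly like this (a sketch with assumed names and shape, not the exact arrow-rs code):

```rust
// Sketch of the invariant: when a completed block is pushed, its length is
// asserted to fit in a u32, so a later `in_progress.len() as u32` cast on a
// freshly started buffer cannot overflow.
fn push_completed(completed: &mut Vec<Vec<u8>>, block: Vec<u8>) -> u32 {
    assert!(
        block.len() <= u32::MAX as usize,
        "completed block length must fit in u32"
    );
    let index = completed.len() as u32;
    completed.push(block);
    index
}
```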
I'm proceeding by reverting the `map_err` in the offset initialization.
alamb left a comment
Thanks for this @samueleresca
alamb left a comment
Thank you @samueleresca and @ctsk -- this change makes sense to me.
I have kicked off some benchmark runs just to make sure this doesn't affect performance somehow. Assuming they look good I think we can merge this one in
🤖: Benchmark completed
# Which issue does this PR close?
N/A
# Rationale for this change
Not testing the correct length.
# What changes are included in this PR?
Remove `* 8`, as the length of the buffer is in bytes already.
# Are these changes tested?
Created tests to make sure they are failing before, AND created tests that make sure that ceil is used for future changes.
# Are there any user-facing changes?
Nope.
…tArray` (apache#8627)
# Which issue does this PR close?
- Closes apache#8610
# Rationale for this change
Since the fields of `VariantArray` impl `PartialEq`, this PR simply derives `PartialEq` for `VariantArray`. Based off of apache#8625.
…trings for error messages (apache#8636)
# Which issue does this PR close?
This is a small performance improvement for the thrift remodeling.
- Part of apache#5853.
# Rationale for this change
Some of the often-called methods in the thrift protocol implementation created `ParquetError` instances with a string message that had to be allocated and formatted. This formatting code, and probably also some drop glue, bloats these otherwise small methods and prevented inlining.
# What changes are included in this PR?
Introduce a separate error type `ThriftProtocolError` that is smaller than `ParquetError` and does not contain any allocated data. The `ReadThrift` trait is not changed, since its custom implementations actually require the more expressive `ParquetError`.
# Are these changes tested?
The success path is covered by existing tests. Testing the error paths would require crafting some actually malformed files, or using a fuzzer.
# Are there any user-facing changes?
`ThriftProtocolError` is crate-internal so there should be no API changes. Some error messages might differ slightly.
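The pattern described in that commit message can be sketched as follows (illustrative only, with assumed variants; not the parquet-rs code): a small, allocation-free error enum keeps the hot decoding functions compact and inlinable, and is converted into the richer error type only at the boundary.

```rust
// Small, Copy-able error with no heap data: cheap to construct and return
// from hot paths. Variants here are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ThriftProtocolError {
    InvalidType(u8),
    UnexpectedEof,
}

// Richer, allocating error used at the API boundary (stand-in for the
// real ParquetError).
#[derive(Debug)]
enum ParquetError {
    General(String),
}

impl From<ThriftProtocolError> for ParquetError {
    fn from(e: ThriftProtocolError) -> Self {
        // Formatting and allocation happen only on the cold error path.
        ParquetError::General(format!("thrift protocol error: {:?}", e))
    }
}
```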
Force-pushed from df730fd to a0e79a4
I reviewed the benchmarks and conclude they don't show any meaningful performance change, as expected. Thanks again @samueleresca
Which issue does this PR close?
- `concat_elements_utf8view` panics with large buffer on 64bit machines datafusion#17857

Rationale for this change
These changes add a safer version of `append_value` in `ByteViewBuilder` that handles panics, called `try_append_value`. DataFusion will consume the API and handle the `Result` coming back from the function.

What changes are included in this PR?

Are these changes tested?
The method is already covered by existing tests.

Are there any user-facing changes?
No breaking changes, as the original `append_value` method hasn't changed.
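A minimal sketch of the fallible-append pattern this PR introduces, using a toy builder rather than the real `ByteViewBuilder` API (names and error type are assumptions for illustration):

```rust
// Toy builder demonstrating the pattern: the fallible method returns a
// Result the caller can handle, and the infallible method delegates to it.
struct ToyBuilder {
    in_progress: Vec<u8>,
}

impl ToyBuilder {
    fn try_append_value(&mut self, v: &[u8]) -> Result<(), String> {
        // Checked capacity computation: overflow becomes an error, not a panic.
        let required = self
            .in_progress
            .len()
            .checked_add(v.len())
            .ok_or_else(|| "required capacity overflows usize".to_string())?;
        self.in_progress.reserve(required - self.in_progress.len());
        self.in_progress.extend_from_slice(v);
        Ok(())
    }

    fn append_value(&mut self, v: &[u8]) {
        // The original infallible API is preserved by panicking on error.
        self.try_append_value(v).expect("append failed")
    }
}
```

A caller such as DataFusion would match on the `Result` from `try_append_value` and propagate the error instead of aborting the process.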