Skip to content

GH-32276: [C++][FlightRPC] Add option to align RecordBatch buffers given to IPC reader (#44279)#6

Merged
EnricoMi merged 2 commits intoG-Research:20.0.0-grfrom
EnricoMi:20.0.0-gr-patch-2
May 23, 2025
Merged

GH-32276: [C++][FlightRPC] Add option to align RecordBatch buffers given to IPC reader (#44279)#6
EnricoMi merged 2 commits intoG-Research:20.0.0-grfrom
EnricoMi:20.0.0-gr-patch-2

Conversation

@EnricoMi
Copy link

Backports apache#44279 to 20.0.0-gr.

Rationale for this change

Data retrieved via IPC is expected to provide memory-aligned arrays, but data retrieved via C++ Flight client is mis-aligned. Datafusion (Rust), which requires data type-specific alignment, cannot handle such data: apache#43552. https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding

What changes are included in this PR?

This adds option arrow::ipc::IpcReadOptions.ensure_alignment of type arrow::ipc::Alignment to configure how RecordBatch array buffers decoded by IPC are realigned. It supports no realignment (default), data type-specific alignment and 64-byte alignment.

Implementation mirrors that of align_buffers in arrow-rs (apache/arrow-rs#4681).

Are these changes tested?

Configuration flag tested in unit test. Integration test with Flight server. Manually end-to-end tested that memory alignment fixes issue with reproduction code provided in apache#43552.

Are there any user-facing changes?

Adds option IpcReadOptions.ensure_alignment and enum type Alignment.

Lead-authored-by: Enrico Minack github@enrico.minack.dev

Thanks for opening a pull request!

If this is your first pull request you can find detailed information on how to contribute here:

Please remove this line and the above text before creating your pull request.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)

This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

…ers given to IPC reader (apache#44279)

### Rationale for this change
Data retrieved via IPC is expected to provide memory-aligned arrays, but data retrieved via C++ Flight client is mis-aligned. Datafusion (Rust), which requires data type-specific alignment, cannot handle such data: apache#43552.
https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding

### What changes are included in this PR?
This adds option `arrow::ipc::IpcReadOptions.ensure_alignment` of type `arrow::ipc::Alignment` to configure how RecordBatch array buffers decoded by IPC are realigned. It supports no realignment (default), data type-specific alignment and 64-byte alignment.

Implementation mirrors that of [`align_buffers` in arrow-rs](https://github.com/apache/arrow-rs/blob/3293a8c2f9062fca93bee2210d540a1d25155bf5/arrow-data/src/data.rs#L698-L711) (apache/arrow-rs#4681).

### Are these changes tested?
Configuration flag tested in unit test. Integration test with Flight server.
Manually end-to-end tested that memory alignment fixes issue with reproduction code provided in apache#43552.

### Are there any user-facing changes?
Adds option `IpcReadOptions.ensure_alignment` and enum type `Alignment`.

* GitHub Issue: apache#32276

Lead-authored-by: Enrico Minack <github@enrico.minack.dev>
Co-authored-by: David Li <li.davidm96@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: David Li <li.davidm96@gmail.com>
@github-actions
Copy link

❌ GitHub issue apache#32276 could not be retrieved.

@EnricoMi EnricoMi merged commit 601dc3d into G-Research:20.0.0-gr May 23, 2025
30 of 42 checks passed
@EnricoMi EnricoMi deleted the 20.0.0-gr-patch-2 branch May 23, 2025 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants