
Add BinaryFormatSupport and Row Encoder to arrow-avro Writer #9171

Merged
alamb merged 9 commits into apache:main from jecsand838:avro-row-encoder
Jan 27, 2026
Conversation

@jecsand838
Contributor

Which issue does this PR close?

Rationale for this change

arrow-avro already supports writing Avro Object Container Files (OCF) and framed streaming encodings (e.g. Single-Object Encoding / registry wire formats). However, many systems exchange raw Avro binary datum payloads (i.e. only the Avro record body bytes) while supplying the schema out-of-band (configuration, RPC contract, topic metadata, etc.).

Without first-class support for unframed datum output, users must either:

  • accept framing overhead that downstream systems don’t expect, or
  • re-implement datum encoding themselves.

This PR adds the missing unframed write path and exposes a row-by-row encoding API to make it easy to embed Avro datums into other transport protocols.

What changes are included in this PR?

  • Added AvroBinaryFormat (unframed) as an AvroFormat implementation to emit raw Avro record body bytes (no SOE prefix and no OCF header) and to explicitly reject container-level compression for this format.
  • Added RecordEncoder::encode_rows to encode a RecordBatch into a single contiguous buffer while tracking per-row boundaries via appended offsets.
  • Introduced a higher-level Encoder + EncodedRows API for row-by-row streaming use cases, providing zero-copy access to individual row slices (via Bytes).
  • Updated the writer API to provide build_encoder for stream formats (e.g. SOE) and added row-capacity configuration to better support incremental/streaming workflows.
  • Added the bytes crate as a dependency to support efficient buffering and slicing in the row encoder, and adjusted dev-dependencies to support the new tests/docs.

Are these changes tested?

Yes.

This PR adds unit tests that cover:

  • single- and multi-column row encoding
  • nullable columns
  • prefix-based vs. unprefixed row encoding behavior
  • empty batch encoding
  • appending to existing output buffers and validating offset invariants

Are there any user-facing changes?

Yes, these changes are additive (no breaking public API changes expected).

  • New writer format support for unframed Avro binary datum output (AvroBinaryFormat).
  • New row-by-row encoding APIs (RecordEncoder::encode_rows, Encoder, EncodedRows) to support zero-copy access to per-row encoded bytes.
  • New WriterBuilder functionality (build_encoder + row-capacity configuration) to enable encoder construction without committing to a specific Write sink.

@github-actions bot added the arrow (Changes to the arrow crate) and arrow-avro (arrow-avro crate) labels on Jan 14, 2026
- Introduced `RecordEncoder::encode_rows` to buffer encoded rows as contiguous slices with per-row offsets using `BytesMut`.
- Added `Encoder` for row-by-row Avro encoding, including zero-copy `Bytes` row access via `EncodedRows`.
- Integrated `bytes` crate for efficient encoding operations.
- Updated writer API to offer `build_encoder` for stream formats (e.g., SOE) alongside row-capacity configuration support.
- Adjusted docs to highlight new encoder capabilities.
- Comprehensive tests added to validate single/multi-column, nullable, prefix-based, and empty batch encoding scenarios.
@jecsand838 changed the title from "Add BinaryFormatSupport to arrow-avro Writer" to "Add BinaryFormatSupport and Row Encoder to arrow-avro Writer" on Jan 14, 2026
@jecsand838
Contributor Author

@mbrobbel @alamb @scovich @nathaniel-d-ef

Would any of you have bandwidth to review this PR? Much of the diff is comments and tests. I was hoping to get this out in the v58.0.0 release. This is also rather pivotal for the future direction of the arrow-avro Writer, so I'd absolutely love feedback regarding the row-wise Encoder architecture.

@jecsand838 force-pushed the avro-row-encoder branch 2 times, most recently from 14bc1ae to 5ded4c0 on January 16, 2026 00:29
Contributor

@nathaniel-d-ef nathaniel-d-ef left a comment


Conceptually this looks solid to me, though I'll defer to those with more advanced Rust knowledge to get into the weeds on performance. This should be quite valuable to systems that just need the bytes - good work 👍

Contributor

@alamb alamb left a comment


Looks good to me @jecsand838

I have some small additional test suggestions

And some API suggestions / questions, but nothing I think is necessary before merge

Let me know how you would like to proceed

// self.len() is defined as self.offsets.len().saturating_sub(1).
// The check `i >= self.len()` above ensures that `i < self.offsets.len() - 1`.
// Therefore, both `i` and `i + 1` are strictly within the bounds of `self.offsets`.
let (start_u64, end_u64) = unsafe {
Contributor


did you see this use of unsafe make a difference in benchmarks?

Contributor Author


I did see a difference surprisingly.

In the screenshot below I ran the benchmarks first with the unsafe code, then changed the production code to be safe and re-ran. There seemed to be a significant performance impact.

NOTE: For the safe test I used let (start_u64, end_u64) = (self.offsets[i], self.offsets[i + 1]);.

I made sure to push up the benches I used for this in a new benches/encoder.rs file, which can be expanded on in future PRs.

[Screenshot: benchmark comparison, 2026-01-23]
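The two variants being benchmarked above can be sketched side by side (standalone illustration with hypothetical helper names, not the PR's actual code): plain indexing carries a bounds check per access, while `get_unchecked` skips it and is only sound because the caller has already validated the index.

```rust
/// Safe variant: each index is bounds-checked at runtime.
fn bounds_safe(offsets: &[usize], i: usize) -> (usize, usize) {
    (offsets[i], offsets[i + 1])
}

/// Unchecked variant, mirroring the safety comment in the PR.
///
/// # Safety
/// The caller must guarantee `i + 1 < offsets.len()`, so both
/// `i` and `i + 1` are strictly within bounds.
unsafe fn bounds_unchecked(offsets: &[usize], i: usize) -> (usize, usize) {
    (*offsets.get_unchecked(i), *offsets.get_unchecked(i + 1))
}

fn main() {
    let offsets = [0usize, 4, 9, 15];
    assert_eq!(bounds_safe(&offsets, 1), (4, 9));
    // Sound: 2 + 1 < offsets.len() was checked by the caller.
    let b = unsafe { bounds_unchecked(&offsets, 2) };
    assert_eq!(b, (9, 15));
}
```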

/// # }
/// ```
pub fn rows(&self) -> impl Iterator<Item = Result<Bytes, ArrowError>> + '_ {
(0..self.len()).map(|i| self.row(i))
Contributor


This is likely more efficient if you returned the sliced Bytes directly -- calling row will continually recheck len, for example

You could do something like this to get known good iffsets

self.offsets.iter().windows(2).map(...)

Contributor Author


This was a great call out. I went ahead and implemented those changes and renamed the method from rows to iter which seemed more idiomatic.

Comment on lines 338 to 344
pub fn to_vecs(&self) -> Result<Vec<Vec<u8>>, ArrowError> {
let mut out = Vec::with_capacity(self.len());
for i in 0..self.len() {
out.push(self.row(i)?.to_vec());
}
Ok(out)
}
Contributor


This seems like an unnecessary API to me -- you could do it the same with

let vecs: Vec<_> = rows.iter().map(|v| v.to_vec()).collect()

Contributor Author


100% great catch. I was overthinking this. Ended up removing to_vecs in my latest push and updated the documentation / examples to better showcase this.

}
}

/// A row-by-row streaming encoder for Avro **Single Object Encoding** (SOE) streams.
Contributor


I wonder why a user couldn't just use Writer with a mut Vec as the sink - you would get the same effect

Is the difference that you get the output offsets as well?

Contributor Author


Great question! At the byte level Writer<_, AvroSoeFormat> writing into a Vec<u8> does produce the same concatenated output stream.

The reason for Encoder, however, is that neither SOE nor the Confluent/Apicurio wire formats include a length field (SOE is just 0xC3 0x01 + 8-byte hashed fingerprint + body, while Confluent is magic byte + 4-byte schema id + body). So once multiple rows are written into a single Vec, there's no cheap or fully reliable way (especially for wire formats) to split it back into per-row payloads without either decoding or getting hacky. Support for the binary format was essentially blocked since those payloads aren't framed at all and therefore have no makeshift delimiter to scan for or split by.

Additionally, I hit performance bottlenecks when developing message-oriented sinks (Kafka/Pulsar/etc.) downstream of arrow-avro. These were incurred from having to use the Writer to encode 1-row batches and tracking Vec lengths, which is much less efficient due to repeated per-call setups and per-row allocations + copies.

The new Encoder solves this while enabling binary format by recording row-end offsets during encoding and returning zero-copy Bytes slices per row (via EncodedRows).
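To make the framing point above concrete, here is a standalone sketch (hypothetical helper functions, not arrow-avro code) of the two frame layouts: neither carries a body length, so once frames are concatenated, the only reliable row boundaries are the offsets recorded at encode time.

```rust
/// SOE frame per the Avro spec: 0xC3 0x01 marker, 8-byte schema
/// fingerprint, then the record body. No length field anywhere.
fn soe_frame(fingerprint: [u8; 8], body: &[u8]) -> Vec<u8> {
    let mut out = vec![0xC3, 0x01];
    out.extend_from_slice(&fingerprint);
    out.extend_from_slice(body);
    out
}

/// Confluent wire format: magic byte 0x00, big-endian 4-byte
/// schema id, then the record body. Also no length field.
fn confluent_frame(schema_id: u32, body: &[u8]) -> Vec<u8> {
    let mut out = vec![0x00];
    out.extend_from_slice(&schema_id.to_be_bytes());
    out.extend_from_slice(body);
    out
}

fn main() {
    let fp = [1, 2, 3, 4, 5, 6, 7, 8];
    let mut stream = Vec::new();
    let mut offsets = vec![0usize];
    for body in [&b"\x02a"[..], &b"\x02b"[..]] {
        stream.extend_from_slice(&soe_frame(fp, body));
        // Recording the end offset at write time is the only
        // reliable way to find this row again later.
        offsets.push(stream.len());
    }
    // Each frame is a 10-byte header plus a 2-byte body.
    assert_eq!(offsets, vec![0, 12, 24]);
    assert_eq!(&stream[offsets[0]..offsets[1]][..2], &[0xC3, 0x01]);
    assert_eq!(confluent_frame(7, b"\x02a").len(), 1 + 4 + 2);
}
```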

Add additional test coverage
@jecsand838
Contributor Author

Looks good to me @jecsand838

I have some small additional test suggestions

And some API suggestions / questions, but nothing I think is necessary before merge

Let me know how you would like to proceed

@alamb Thank you so much for the review and for the tests!

I ended up merging your PR in and pushing up some changes to address the comments you left. I think your recommendations were solid and worth getting in now. Also I left some answers to your questions over the design.

Let me know what you think when you get a chance.

Contributor

@alamb alamb left a comment


Looks good to me -- thanks @jecsand838

let (start_u64, end_u64) = unsafe {
// The check `n >= self.len()` above ensures that `n < self.offsets.len() - 1`.
// Therefore, both `n` and `n + 1` are strictly within the bounds of `self.offsets`.
let (start, end) = unsafe {
Contributor


using usize rather than u64 seems like a nice cleanup

Contributor Author


100%, that became apparent to me rather quickly lol.

pub fn iter(&self) -> impl ExactSizeIterator<Item = Bytes> + '_ {
self.offsets.windows(2).map(|w| {
debug_assert!(w[0] <= w[1] && w[1] <= self.data.len());
self.data.slice(w[0]..w[1])
Contributor


given you are using slice here I suspect the extra debug assert is not necessary as the slice also does the same check

Contributor Author


Ah yes, you are correct. I went ahead and removed the extra debug assert.

@alamb
Contributor

alamb commented Jan 26, 2026

Sorry -- this now has a conflict (likely due to the new AvroError)

@jecsand838
Contributor Author

Sorry -- this now has a conflict (likely due to the new AvroError)

@alamb No worries! I just pushed up the changes to resolve the conflicts and use AvroError.

@alamb alamb merged commit fab8e75 into apache:main Jan 27, 2026
24 checks passed
@alamb
Contributor

alamb commented Jan 27, 2026

🚀

@jecsand838 jecsand838 deleted the avro-row-encoder branch January 27, 2026 01:53

Successfully merging this pull request may close these issues.

[arrow-avro] Add Avro BinaryFormat (Unframed) to writer module

3 participants