

@rluvaton commented Oct 15, 2025

Which issue does this PR close?

Rationale for this change

Allows combining BooleanBuffers without many copies, among other things (see the issue).

What changes are included in this PR?

Added in-place versions of most of the Buffer ops in arrow-buffer/src/buffer/ops.rs for MutableBuffer and BooleanBufferBuilder.
Because a BitChunksMut cannot be created (for the reasons described below), I had to reimplement these ops in the mutable ops code rather than reuse the Buffer implementation.
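As a rough illustration of what an in-place op over a bit-packed buffer involves, here is a minimal sketch. It assumes byte-aligned inputs of equal length and uses plain byte slices; `bin_op_in_place` and the `*_in_place` wrappers are hypothetical names, not the functions added in this PR. Whole 64-bit words are read as little-endian, combined, and written back, and the trailing bytes that do not fill a u64 are handled separately.

```rust
/// Hypothetical helper (not the PR's API): applies `op` word by word to two
/// byte-aligned, LSB-first bitmaps of equal length, writing into `lhs`.
fn bin_op_in_place(lhs: &mut [u8], rhs: &[u8], op: impl Fn(u64, u64) -> u64) {
    assert_eq!(lhs.len(), rhs.len());
    let mut l_chunks = lhs.chunks_exact_mut(8);
    let mut r_chunks = rhs.chunks_exact(8);
    for (l, r) in (&mut l_chunks).zip(&mut r_chunks) {
        // Copy each 8-byte chunk into a local word; from_le_bytes keeps the
        // LSB-first bit numbering correct on big-endian hosts as well.
        let mut a = [0u8; 8];
        a.copy_from_slice(l);
        let mut b = [0u8; 8];
        b.copy_from_slice(r);
        let res = op(u64::from_le_bytes(a), u64::from_le_bytes(b));
        l.copy_from_slice(&res.to_le_bytes());
    }
    // Trailing bytes that do not fill a whole u64: apply the op byte by byte
    // and keep only the low 8 bits of the result.
    for (l, r) in l_chunks.into_remainder().iter_mut().zip(r_chunks.remainder()) {
        *l = op(*l as u64, *r as u64) as u8;
    }
}

// Hypothetical wrappers mirroring a few of the ops in ops.rs.
fn and_in_place(lhs: &mut [u8], rhs: &[u8]) {
    bin_op_in_place(lhs, rhs, |a, b| a & b)
}

fn or_in_place(lhs: &mut [u8], rhs: &[u8]) {
    bin_op_in_place(lhs, rhs, |a, b| a | b)
}

fn xor_in_place(lhs: &mut [u8], rhs: &[u8]) {
    bin_op_in_place(lhs, rhs, |a, b| a ^ b)
}
```

This sketch deliberately ignores bit offsets; supporting an arbitrary start bit is exactly where the real implementation gets harder.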

Implementation notes

Why is there a trait MutableOpsBufferSupportedLhs instead of taking a MutableBuffer directly, the way the Buffer ops take a Buffer?

Taking a MutableBuffer directly would make it impossible to apply an operation (e.g. AND) to a subset (e.g. bits 10 to 100) of a BooleanBufferBuilder. BooleanBufferBuilder does not expose its MutableBuffer, and I don't want to expose it: a user could then append values that change the underlying buffer without updating the builder's length.
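The design argument can be sketched with a toy model. `MutableBitsLhs` is a hypothetical stand-in for MutableOpsBufferSupportedLhs and `ToyBoolBuilder` is a hypothetical stand-in for BooleanBufferBuilder; neither mirrors the real signatures. The point is that the trait hands out a mutable byte slice, which lets an op flip bits in a sub-range but gives no way to push or truncate, so the builder's private length cannot get out of sync.

```rust
/// Hypothetical stand-in for MutableOpsBufferSupportedLhs: mutable access to
/// the packed bits without handing out the owning buffer.
trait MutableBitsLhs {
    fn bits_mut(&mut self) -> &mut [u8];
    fn bit_len(&self) -> usize;
}

/// Toy builder whose `len` stays private, similar in spirit to BooleanBufferBuilder.
struct ToyBoolBuilder {
    bytes: Vec<u8>,
    len: usize,
}

impl ToyBoolBuilder {
    fn new() -> Self {
        Self { bytes: Vec::new(), len: 0 }
    }

    fn append(&mut self, v: bool) {
        if self.len % 8 == 0 {
            self.bytes.push(0);
        }
        if v {
            self.bytes[self.len / 8] |= 1u8 << (self.len % 8);
        }
        self.len += 1;
    }

    fn get(&self, i: usize) -> bool {
        ((self.bytes[i / 8] >> (i % 8)) & 1) == 1
    }
}

impl MutableBitsLhs for ToyBoolBuilder {
    // A slice, not the Vec: callers can change bits but not the length.
    fn bits_mut(&mut self) -> &mut [u8] {
        &mut self.bytes
    }
    fn bit_len(&self) -> usize {
        self.len
    }
}

/// ANDs `len` bits of `rhs` (starting at bit 0) into bits
/// [start, start + len) of any supported lhs. Bit-by-bit for clarity.
fn and_range<L: MutableBitsLhs>(lhs: &mut L, start: usize, len: usize, rhs: &[u8]) {
    assert!(start + len <= lhs.bit_len());
    let bytes = lhs.bits_mut();
    for i in 0..len {
        let rhs_bit = (rhs[i / 8] >> (i % 8)) & 1;
        if rhs_bit == 0 {
            let j = start + i;
            bytes[j / 8] &= !(1u8 << (j % 8));
        }
    }
}
```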

Why is there a trait BufferSupportedRhs instead of taking a Buffer directly, as the Buffer ops do?

Because we want to support both MutableBuffer & Buffer and MutableBuffer & MutableBuffer.
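A minimal sketch of why a trait helps on the right-hand side: one generic op accepts either an immutable or a growable buffer. `BitsRhs`, `FrozenBits` and `GrowableBits` are hypothetical stand-ins for BufferSupportedRhs, Buffer and MutableBuffer; the real trait's methods may differ.

```rust
/// Hypothetical stand-in for BufferSupportedRhs: read-only access to packed bits.
trait BitsRhs {
    fn bits(&self) -> &[u8];
}

/// Toy immutable buffer (stands in for Buffer).
struct FrozenBits(Box<[u8]>);

/// Toy growable buffer (stands in for MutableBuffer).
struct GrowableBits(Vec<u8>);

impl BitsRhs for FrozenBits {
    fn bits(&self) -> &[u8] {
        &self.0
    }
}

impl BitsRhs for GrowableBits {
    fn bits(&self) -> &[u8] {
        &self.0
    }
}

/// Byte-wise in-place AND; a single generic function serves both rhs kinds.
fn and_assign<R: BitsRhs>(lhs: &mut GrowableBits, rhs: &R) {
    for (l, r) in lhs.0.iter_mut().zip(rhs.bits()) {
        *l &= *r;
    }
}
```

Implementing the trait only for known buffer types, rather than accepting any `&[u8]`, keeps the guarantees of those types on the input.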

Why not create a BitChunksMut for MutableBuffer and keep the code as simple as the Buffer ops?

At first I considered implementing BitChunksMut for MutableBuffer and implementing the ops the same way as for Buffer, but that turned out to be impossible because:

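  1. The bit offset at which the op starts may fall in the middle of a u64, so each 64-bit window spans two u64 words, and there is no way to get a reference to such an unaligned u64.
  2. Each u64 is read and converted from little-endian, because bit-packed buffers store the least-significant byte first; a &mut u64 into the buffer would therefore not hold the logical value on big-endian machines.
  3. There is no way to get a mutable value for the remaining bits that do not fill a whole u64 (len % 64).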
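The three obstacles above can be worked around by never handing out a reference at all: copy each (possibly straddling) chunk into a local word, operate on it, and write it back under a mask. The sketch below is a minimal illustration of that read-modify-write idea on plain byte slices; `read_bits`, `write_bits` and `and_bits_at` are hypothetical helpers, not the code in this PR. The masked write also covers the final partial chunk, and the little-endian conversion makes it host-endian independent.

```rust
/// Reads `n` (<= 64) bits starting at `bit_offset`, LSB-first.
/// Up to 9 bytes are needed when the window straddles a byte boundary.
fn read_bits(bytes: &[u8], bit_offset: usize, n: usize) -> u64 {
    let start = bit_offset / 8;
    let shift = bit_offset % 8;
    let needed = (shift + n + 7) / 8;
    let mut buf = [0u8; 16];
    buf[..needed].copy_from_slice(&bytes[start..start + needed]);
    let word = u128::from_le_bytes(buf);
    ((word >> shift) & ((1u128 << n) - 1)) as u64
}

/// Writes the low `n` (<= 64) bits of `value` at `bit_offset`,
/// leaving all other bits in the touched bytes unchanged.
fn write_bits(bytes: &mut [u8], bit_offset: usize, n: usize, value: u64) {
    let start = bit_offset / 8;
    let shift = bit_offset % 8;
    let needed = (shift + n + 7) / 8;
    let mut buf = [0u8; 16];
    buf[..needed].copy_from_slice(&bytes[start..start + needed]);
    let mut word = u128::from_le_bytes(buf);
    let mask = (1u128 << n) - 1;
    word &= !(mask << shift); // clear the target bits
    word |= ((value as u128) & mask) << shift; // set them from `value`
    bytes[start..start + needed].copy_from_slice(&word.to_le_bytes()[..needed]);
}

/// ANDs `len` bits of `rhs` (from `rhs_offset`) into `lhs` (from `lhs_offset`),
/// 64 bits at a time; the last chunk may be shorter (the len % 64 remainder).
fn and_bits_at(lhs: &mut [u8], lhs_offset: usize, rhs: &[u8], rhs_offset: usize, len: usize) {
    let mut done = 0;
    while done < len {
        let n = (len - done).min(64);
        let a = read_bits(lhs, lhs_offset + done, n);
        let b = read_bits(rhs, rhs_offset + done, n);
        write_bits(lhs, lhs_offset + done, n, a & b);
        done += n;
    }
}
```

A production version would avoid the per-chunk u128 round trip for aligned middle words, but the masking logic stays the same.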

Are these changes tested?

Yes, although I did not run them on a big-endian machine.

Are there any user-facing changes?

Yes: new public functions, all of which are documented.


I will later change BooleanBufferBuilder::append_packed_range to use mutable_bitwise_bin_op_helper, since doing so reduced the boolean_append_packed benchmark time by about 57.7%:

boolean_append_packed   time:   [2.0079 µs 2.0139 µs 2.0202 µs]
                        change: [−57.808% −57.653% −57.494%] (p = 0.00 < 0.05)
                        Performance has improved.

…table.

but I don't want to accept a plain slice of bytes, because then the source is unknown and users would have to make sure the slice upholds the same guarantees as Buffer/MutableBuffer.
github-actions bot added the arrow (Changes to the arrow crate) label Oct 15, 2025



Development

Successfully merging this pull request may close these issues.

Add bitwise ops on BooleanBufferBuilder and MutableBuffer that mutate directly the buffer
