GH-46677: [C++] Expose an BinaryViewBuilder interface for append a binary and multiple subslice #46730

Open
wants to merge 9 commits into
base: main

@IndifferentArea IndifferentArea commented Jun 6, 2025

Rationale for this change

see #46677

What changes are included in this PR?

see #46677

Are these changes tested?

Yes

Are there any user-facing changes?

No

@IndifferentArea
Contributor Author

@mapleFU is the currently implemented interface what you expected?

return AppendBlock(value.data(), static_cast<int64_t>(value.size()));
}

Status AppendViewFromBuffer(int32_t buffer_id, int32_t buffer_offset, int32_t start,
Member

naming: from buffer or from block?

Member

Personally, both are fine with me. I prefer Buffer since the variable is named buffer_index.

@@ -645,6 +657,28 @@ class ARROW_EXPORT BinaryViewBuilder : public ArrayBuilder {
UnsafeAppend(value.data(), static_cast<int64_t>(value.size()));
}

Result<std::pair<int32_t, int32_t>> AppendBlock(const uint8_t* value,
Member

Can we use a more specific name rather than pair<i32, i32>?
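One way to read the request above is to return a small named struct in place of `std::pair<int32_t, int32_t>`. The sketch below is only an illustration under stated assumptions: `BlockLocation` and `ToyHeap` are hypothetical names that do not exist in Arrow, and the toy heap stands in for the builder's data heap by appending to the last block while it has room.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical named result type (not part of the Arrow API).
struct BlockLocation {
  int32_t buffer_index;  // which block holds the bytes
  int32_t offset;        // byte offset of the value inside that block
};

// Toy stand-in for a data heap made of fixed-capacity blocks.
class ToyHeap {
 public:
  explicit ToyHeap(std::size_t block_capacity) : block_capacity_(block_capacity) {
    assert(block_capacity > 0);
  }

  // Appends to the last block if the value fits, otherwise starts a new block.
  // An oversized value simply gets a block of its own.
  BlockLocation AppendBlock(std::string_view value) {
    if (blocks_.empty() || blocks_.back().size() + value.size() > block_capacity_) {
      blocks_.emplace_back();
    }
    std::string& block = blocks_.back();
    BlockLocation loc{static_cast<int32_t>(blocks_.size() - 1),
                      static_cast<int32_t>(block.size())};
    block.append(value);
    return loc;
  }

 private:
  std::size_t block_capacity_;
  std::vector<std::string> blocks_;
};
```

With named fields, a call site reads `loc.buffer_index` and `loc.offset` and cannot silently swap `first` and `second`.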

Contributor Author

@IndifferentArea IndifferentArea Jun 7, 2025

Can I directly use BinaryViewType::c_type, since it already contains the two pieces of information we need?

Member

The syntax is a bit weird here. Do we append a BinaryView and then append a sub-slice of that view?

@@ -100,6 +100,13 @@ void BinaryViewBuilder::Reset() {
data_heap_builder_.Reset();
}

Result<std::pair<int32_t, int32_t>> BinaryViewBuilder::AppendBlock(const uint8_t* value,
const int64_t length) {
DCHECK_GT(length, TypeClass::kInlineSize);
Member

If length <= kInlineSize, should this return false or ok? Why just DCHECK here?
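To make the question above concrete: a debug-only check such as DCHECK is compiled out of release builds, so a too-short value would pass through unnoticed there. The sketch below contrasts that with a check that reports the problem to the caller. It is a self-contained illustration, not Arrow code: `ValidateBlockLength` and `AppendBlockDebugChecked` are hypothetical helpers, `assert` stands in for DCHECK, `std::optional<std::string>` stands in for a status type, and the constant 12 is assumed to mirror `kInlineSize`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

constexpr int64_t kInlineSize = 12;  // assumed inline capacity of a view

// Debug-only precondition: disappears when NDEBUG is defined, so a release
// build would accept a short value without complaint.
inline void AppendBlockDebugChecked(int64_t length) {
  assert(length > kInlineSize);
  (void)length;  // real work would go here
}

// Checked variant: returns an error message when the length is too small,
// or std::nullopt when the length is acceptable.
inline std::optional<std::string> ValidateBlockLength(int64_t length) {
  if (length <= kInlineSize) {
    return "block length " + std::to_string(length) +
           " must be greater than inline size " + std::to_string(kInlineSize);
  }
  return std::nullopt;
}
```

Which variant fits depends on whether a short value is a caller bug (a debug check may suffice) or a reachable input (an error return is safer).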

c_type GetViewFromBlock(int32_t block_id, int32_t block_offset, int32_t offset,
int32_t length) const {
const auto* value = blocks_.at(block_id)->data_as<uint8_t>() + block_offset + offset;
if (length <= BinaryViewType::kInlineSize) {
Member

Could this use ToBinaryView?
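For context, the branch on `kInlineSize` in the snippet above reflects the two forms a 16-byte view can take: short values are stored inline, while longer ones keep a 4-byte prefix plus a buffer index and offset. A minimal sketch of that decision follows. `ToyView` and `MakeView` are hypothetical names written for this illustration and are not Arrow's `c_type` or `ToBinaryView`; the sizes 12 and 4 are assumptions matching the view layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr int32_t kInlineSize = 12;  // assumed inline capacity
constexpr int32_t kPrefixSize = 4;   // assumed prefix length for long values

// Simplified 16-byte view: both forms start with the size field.
union ToyView {
  struct Inlined {
    int32_t size;
    std::array<uint8_t, kInlineSize> data;
  } inlined;
  struct Ref {
    int32_t size;
    std::array<uint8_t, kPrefixSize> prefix;
    int32_t buffer_index;
    int32_t offset;
  } ref;

  // Both structs share `size` as their first member, so reading it through
  // `inlined` is valid whichever form is stored.
  bool is_inline() const { return inlined.size <= kInlineSize; }
};
static_assert(sizeof(ToyView) == 16, "view must stay 16 bytes");

// Builds a view over `size` bytes at `data`, which are assumed to live at
// `offset` inside buffer number `buffer_index`.
inline ToyView MakeView(const uint8_t* data, int32_t size, int32_t buffer_index,
                        int32_t offset) {
  ToyView v{};
  if (size <= kInlineSize) {
    v.inlined = {size, {}};
    if (size > 0) {
      std::memcpy(v.inlined.data.data(), data, static_cast<std::size_t>(size));
    }
  } else {
    v.ref = {size, {}, buffer_index, offset};
    std::memcpy(v.ref.prefix.data(), data, kPrefixSize);
  }
  return v;
}
```

Centralizing this choice in one helper is the point of the suggestion: every call site then agrees on when a value is inlined.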

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Jun 7, 2025
@IndifferentArea
Contributor Author

IndifferentArea commented Jun 7, 2025

Should we rename AppendBuffer or redesign the interface's semantics? Currently we don't really append a buffer/block; we append directly to the last one if its remaining size is enough. I think it may introduce confusion.

Maybe aligning with the arrow-rs implementation is fine.

@mapleFU
Member

mapleFU commented Jun 7, 2025

Some personal thoughts:

  1. AppendBuffer (etc.) returning a StringView is a bit weird; a Block is not a StringView.
  2. Aligning with arrow-rs now is also a good way to go.

@IndifferentArea
Contributor Author

Not sure why these 2 CI jobs always fail.

@IndifferentArea IndifferentArea marked this pull request as ready for review June 8, 2025 12:44
/// let array = builder.finish();
ASSERT_OK_AND_ASSIGN(const auto buffer,
src_builder.AppendBuffer("helloworldbingobongo"));
ASSERT_OK(src_builder.AppendViewFromBuffer(buffer, 0, 5));
Member

Can we try appending with a nonexistent buffer_index?


// Verify the content of the resulting array
ASSERT_EQ(src->length(), 6);
const auto& binary_view_array = static_cast<const BinaryViewArray&>(*src);
Member

Can we test both StringView and BinaryView?

ASSERT_OK(src_builder.AppendViewFromBuffer(buffer, 10, 5));
ASSERT_OK(src_builder.AppendViewFromBuffer(buffer, 15, 5));
ASSERT_OK(src_builder.AppendViewFromBuffer(buffer, 0, 15));

Member

Can we test appending multiple buffers?
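The cases requested in this review (a nonexistent buffer index, an out-of-range slice, and more than one buffer) can be sketched against a toy model of the interface. This is not the PR's implementation: `ToyViewBuilder` is a hypothetical class that only borrows the `AppendBuffer` and `AppendViewFromBuffer` names from the test above, always stores each buffer separately, and reports failure with a `bool` instead of a `Status`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

class ToyViewBuilder {
 public:
  // Stores the bytes as a new buffer and returns that buffer's index.
  int32_t AppendBuffer(std::string_view bytes) {
    buffers_.emplace_back(bytes);
    return static_cast<int32_t>(buffers_.size() - 1);
  }

  // Appends a view over bytes [offset, offset + length) of the given buffer.
  // Returns false and appends nothing if the index or range is invalid.
  bool AppendViewFromBuffer(int32_t buffer_index, int32_t offset, int32_t length) {
    if (buffer_index < 0 ||
        static_cast<std::size_t>(buffer_index) >= buffers_.size()) {
      return false;
    }
    if (offset < 0 || length < 0) return false;
    const std::string& buffer = buffers_[static_cast<std::size_t>(buffer_index)];
    if (static_cast<int64_t>(offset) + length > static_cast<int64_t>(buffer.size())) {
      return false;
    }
    views_.push_back({buffer_index, offset, length});
    return true;
  }

  // Returns the bytes of the i-th view. The result points into the builder's
  // storage and should be used before further buffers are appended.
  std::string_view GetView(std::size_t i) const {
    const View& v = views_.at(i);
    return std::string_view(buffers_[static_cast<std::size_t>(v.buffer_index)])
        .substr(static_cast<std::size_t>(v.offset), static_cast<std::size_t>(v.length));
  }

  std::size_t length() const { return views_.size(); }

 private:
  struct View {
    int32_t buffer_index;
    int32_t offset;
    int32_t length;
  };
  std::vector<std::string> buffers_;
  std::vector<View> views_;
};
```

A test in this style would append two buffers, take slices from each, and confirm that a bad index or range is rejected without changing the builder's length.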


2 participants