Commit ea47172

apacheGH-45254: [C++][Acero] Fix the row offset truncation in row table merge (apache#45255)
### Rationale for this change

See apache#45254

### What changes are included in this PR?

First, modify the test case to expose the suspected bug. Then apply the fix in the source.

### Are these changes tested?

By existing tests.

### Are there any user-facing changes?

None.

* GitHub Issue: apache#45254

Authored-by: Rossi Sun <[email protected]>
Signed-off-by: Rossi Sun <[email protected]>
1 parent ef00568 commit ea47172

2 files changed: +5 -3 lines changed

cpp/src/arrow/acero/hash_join_node_test.cc

Lines changed: 3 additions & 1 deletion
@@ -3370,8 +3370,10 @@ TEST(HashJoin, LARGE_MEMORY_TEST(BuildSideOver4GBVarLength)) {
   constexpr int value_no_match_length_min = 128;
   constexpr int value_no_match_length_max = 129;
   constexpr int value_match_length = 130;
+  // The value "DDD..." will be hashed to the partition over 4GB of the hash table.
+  // Matching at this area gives us more coverage.
   const auto value_match =
-      std::make_shared<StringScalar>(std::string(value_match_length, 'X'));
+      std::make_shared<StringScalar>(std::string(value_match_length, 'D'));
   constexpr int16_t num_rows_per_batch_left = 128;
   constexpr int16_t num_rows_per_batch_right = 4096;
   const int64_t num_batches_left = 8;

cpp/src/arrow/acero/swiss_join.cc

Lines changed: 2 additions & 2 deletions
@@ -439,11 +439,11 @@ Status RowArrayMerge::PrepareForMerge(RowArray* target,
     num_rows = 0;
     num_bytes = 0;
     for (size_t i = 0; i < sources.size(); ++i) {
-      target->rows_.mutable_offsets()[num_rows] = static_cast<uint32_t>(num_bytes);
+      target->rows_.mutable_offsets()[num_rows] = num_bytes;
       num_rows += sources[i]->rows_.length();
       num_bytes += sources[i]->rows_.offsets()[sources[i]->rows_.length()];
     }
-    target->rows_.mutable_offsets()[num_rows] = static_cast<uint32_t>(num_bytes);
+    target->rows_.mutable_offsets()[num_rows] = num_bytes;
   }
 
   return Status::OK();
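
For readers who want to see the effect of the removed cast in isolation, here is a minimal standalone sketch. It is not Arrow code. `FirstOffsets` is a hypothetical helper that mirrors the shape of the loop above, and it assumes the target offsets are stored as 64-bit values, which is what dropping the `static_cast<uint32_t>` suggests. The `truncate` flag reproduces the old behavior, where the running byte count is reduced modulo 2^32 before being stored.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow): for each source, record the byte
// offset at which its rows start in the merged buffer, then append the total.
// `source_bytes[i]` stands in for sources[i]->rows_.offsets()[length].
std::vector<int64_t> FirstOffsets(const std::vector<int64_t>& source_bytes,
                                  bool truncate) {
  std::vector<int64_t> offsets;
  int64_t num_bytes = 0;
  // Store either the full 64-bit value or the value after a uint32_t cast,
  // which wraps modulo 2^32 (the pre-fix behavior, as an assumption).
  auto store = [&](int64_t value) {
    offsets.push_back(truncate
                          ? static_cast<int64_t>(static_cast<uint32_t>(value))
                          : value);
  };
  for (int64_t bytes : source_bytes) {
    store(num_bytes);
    num_bytes += bytes;
  }
  store(num_bytes);  // End offset of the last source.
  return offsets;
}
```

With three sources of 3 GiB each, the untruncated offsets are 0, 3 GiB, 6 GiB, and 9 GiB. With truncation, the third and fourth entries wrap to 2 GiB and 1 GiB, so any source whose rows start past the 4 GiB mark would be given a wrong starting offset in the merged table.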
