Add `merge` and `merge_n` algorithms #8753

pepijnve · 2025-10-31T09:22:57Z

Which issue does this PR close?

Closes #8752 (Provide algorithm that allows zipping arrays whose values are not prealigned).

Rationale for this change

The algorithms suggested in this PR originate from the case logic in DataFusion (see datafusion#18152 and datafusion#18444). I think it might be useful to move them to arrow-rs instead of being tucked away in a corner of the DataFusion codebase.

What changes are included in this PR?

Adds two-way and n-way merge algorithms that sit halfway between `zip` and `interleave`. In contrast to `zip`, the truthy and falsy arrays do not need to be prealigned. In contrast to `interleave`, the relative order of elements in each input array is retained in the final result.
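To make the difference concrete, here is a minimal sketch in plain Rust over slices rather than Arrow arrays. The names `merge_sketch` and `merge_n_sketch` are hypothetical and are not the signatures added by this PR. The sketch assumes that each input holds exactly one value per output row that selects it, and that values are consumed in their original order.

```rust
/// Two-way merge sketch: for every mask entry, take the *next unused* value
/// from `truthy` (mask is true) or from `falsy` (mask is false).
/// Unlike a prealigned zip, neither input needs to be as long as the mask.
fn merge_sketch<T: Clone>(mask: &[bool], truthy: &[T], falsy: &[T]) -> Vec<T> {
    let mut t = truthy.iter();
    let mut f = falsy.iter();
    mask.iter()
        .map(|&m| {
            let next = if m { t.next() } else { f.next() };
            next.expect("each input must hold one value per matching mask entry")
                .clone()
        })
        .collect()
}

/// N-way merge sketch: `indices[i]` names the input that supplies output row `i`.
/// A per-input cursor preserves the relative order of each input's values.
fn merge_n_sketch<T: Clone>(values: &[&[T]], indices: &[usize]) -> Vec<T> {
    let mut cursors = vec![0usize; values.len()];
    indices
        .iter()
        .map(|&ix| {
            let v = values[ix][cursors[ix]].clone();
            cursors[ix] += 1;
            v
        })
        .collect()
}
```

With the mask `[true, false, false, true]`, `truthy = ["A", "B"]` and `falsy = ["x", "y"]`, the two-way sketch yields `["A", "x", "y", "B"]`. A prealigned zip would instead need both inputs to have four entries, and an interleave-style API would need an explicit row position for every output row.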

Are these changes tested?

I've added two minimal unit tests; more should probably be added.

Are there any user-facing changes?

No breaking API changes

pepijnve · 2025-10-31T09:27:23Z

The optimisation work that was done in #8653 would make sense here as well. That has not been done yet.

alamb

Thanks @pepijnve -- what do you think about also adding benchmarks to this kernel (so that future optimizations work better)?

pepijnve · 2025-10-31T13:42:35Z

> what do you think about also adding benchmarks to this kernel

Good idea. I’m happy to continue working on this one. I created the PR already to get the ball rolling and solicit input from other devs.

pepijnve · 2025-11-02T09:36:34Z

> The optimisation work that was done in #8653 would make sense here as well. That has not been done yet.

While looking into this, I realised that merge on scalars is effectively identical to zip, so I resolved this by delegating to zip in the case of scalar input.
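The reasoning can be illustrated with a small plain-Rust sketch. The `Operand` type and both functions are hypothetical stand-ins, not the arrow-rs API. A scalar supplies the same value for every row, so there is no position to track, and the "prealigned" and "consume in order" strategies give the same answer.

```rust
/// Hypothetical stand-in for a scalar-or-array input.
#[derive(Clone, Debug, PartialEq)]
enum Operand<T> {
    Scalar(T),
    Array(Vec<T>),
}

/// zip sketch: array operands are prealigned, so output row `i` reads position `i`.
fn zip_operands<T: Clone>(mask: &[bool], truthy: &Operand<T>, falsy: &Operand<T>) -> Vec<T> {
    mask.iter()
        .enumerate()
        .map(|(i, &m)| {
            let operand = if m { truthy } else { falsy };
            match operand {
                Operand::Scalar(v) => v.clone(),
                Operand::Array(vs) => vs[i].clone(),
            }
        })
        .collect()
}

/// merge sketch: array operands are consumed in order via a cursor.
/// When both operands are scalars no cursor is needed, so delegate to zip.
fn merge_operands<T: Clone>(mask: &[bool], truthy: &Operand<T>, falsy: &Operand<T>) -> Vec<T> {
    if let (Operand::Scalar(_), Operand::Scalar(_)) = (truthy, falsy) {
        return zip_operands(mask, truthy, falsy);
    }
    let (mut t, mut f) = (0usize, 0usize);
    mask.iter()
        .map(|&m| {
            let (operand, cursor) = if m { (truthy, &mut t) } else { (falsy, &mut f) };
            match operand {
                Operand::Scalar(v) => v.clone(),
                Operand::Array(vs) => {
                    let v = vs[*cursor].clone();
                    *cursor += 1;
                    v
                }
            }
        })
        .collect()
}
```

In this sketch the two functions only diverge once at least one operand is an array, which is the case the delegation leaves to the merge path.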

pepijnve · 2025-11-02T09:38:45Z

> what do you think about also adding benchmarks to this kernel

@alamb I duplicated the microbenchmark for zip as a quick fix. Is it worth trying to actually share the sets of input data and masks? If so, where should I move that code?

Add merge and merge_n algorithms

eab6202

github-actions bot added the arrow (Changes to the arrow crate) label Oct 31, 2025

Add license header

462cd3e

pepijnve added 3 commits October 31, 2025 10:31

Formatting and clippy

eefc171

Remove unused import

dc7602a

Fix doc links

8068238

alamb reviewed Oct 31, 2025

pepijnve added 3 commits November 2, 2025 10:26

Delegate to zip when both truthy and falsy are scalar

fd3105c

Add merge to compute kernels list

66c8fa0

Duplicate zip benchmark for merge

4286c72

pepijnve added 4 commits November 2, 2025 10:39

Formatting

1d947df

Documentation link fixes

59a733a

Documentation link fixes

10af559

Documentation link fixes

ac68821

pepijnve mentioned this pull request Nov 3, 2025

Avoid scatter operation in ExpressionOrExpression case evaluation method apache/datafusion#18444

Merged

Update example diagram for merge to make difference with zip more obvious

347e3df

alamb mentioned this pull request Nov 4, 2025

Andrew Lamb Weekly-ish Open Source plan - 2025-11-03 apache/datafusion#18486

Open

34 tasks

Fix clippy warning

9bb40cc
