-
Notifications
You must be signed in to change notification settings - Fork 906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve concat
performance, and add append_array
for some array builder implementations
#7309
Conversation
append_array
for PrimitiveBuilder
append_array
for some array builder implementations
What is the motivation for adding these here, as opposed to just using the array constructors to implement concat? IMO if you aren't constructing by value, there's no point using the builders? |
The motivation is:
|
I've pushed the concat updated implementation so you can run the benchmarks locally, for me:
|
This seems like quite a fair bit of code and codegen to shave off literal nanoseconds, no? Do we see similar returns for the byte array types? Also as this is really just reducing the dispatch overheads, I would expect as the array sizes increase the return would largely disappear. I dunno I am always pretty wary of adding new unsafe APIs... |
# Conflicts: # arrow-buffer/src/builder/null.rs
I am curious what you mean by this / what you have in mind Specifically, there is another PR that seems to be related to this type of operation Also I have a writeup of a more specialized concat / take type operation that might be related |
If I want to implement
|
append_array
for some array builder implementationsconcat
performance, and add append_array
for some array builder implementations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rluvaton -- I think this is looking quite good so far (I didn't look at previous versions of this PR)
I have a few test suggestions, but otherwise I think all that is needed are a few more benchmarks and a run showing the performance benefits. I am happy to run such benchmarks if you could make a new PR with just the benchmarks
I think this is going to be pretty sweet
// Creating intermediate offsets instead of pushing each offset is faster | ||
// (even if we make MutableBuffer to avoid updating length on each push | ||
// and reserve the necessary capacity, it's still slower) | ||
let mut intermediate = Vec::with_capacity(offsets.len() - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 maybe we need something like extend_from_iter
in offsets builder (not for this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think many of these instances could be changed to use Vec
directly (and use optimized extend, etc. from them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed follow on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rluvaton -- I think this PR looks really nice now -- well tested and benchmarked 🏆 🦾
@Dandandan's suggestion here is probably worth trying: https://github.com/apache/arrow-rs/pull/7309/files#r2023700374 (I will file a follow on issue)
I also ran the benchmarks and this PR shows significant improvements (as expected):
++ critcmp main add_append_array_builder
group add_append_array_builder main
----- ------------------------ ----
concat 1024 arrays i32 4 1.00 14.9±0.04µs ? ?/sec 13.85 206.4±0.25µs ? ?/sec
concat fixed size lists 1.00 706.7±10.18µs ? ?/sec 1.09 767.3±12.22µs ? ?/sec
concat i32 1024 1.00 440.9±3.06ns ? ?/sec 1.79 788.1±9.00ns ? ?/sec
concat i32 nulls 1024 1.00 785.8±4.06ns ? ?/sec 1.69 1328.5±2.27ns ? ?/sec
concat str 1024 1.00 13.8±1.41µs ? ?/sec 1.15 15.9±0.90µs ? ?/sec
concat str nulls 1024 1.00 6.9±0.57µs ? ?/sec 1.36 9.4±0.35µs ? ?/sec
concat str_dict 1024 1.02 2.9±0.01µs ? ?/sec 1.00 2.9±0.00µs ? ?/sec
concat str_dict_sparse 1024 1.00 7.0±0.01µs ? ?/sec 1.00 7.0±0.02µs ? ?/sec
Thanks again @rluvaton |
concat
#7357This is on top of
append_buffer
forNullBufferBuilder
#7308Which issue does this PR close?
N/A
Rationale for this change
This is a building block for implementing specialized concat
What changes are included in this PR?
added
append_array
function for:PrimitiveBuilder
,GenericByteBuilder
andBooleanBuilder
with testsconcat
to use this specialized implementationAre there any user-facing changes?
Yes, new method.
Local benchmark results: