GH-38558: [C++] Add support for null sort option per sort key #46926
1. Restructure the SortKey struct to include a NullPlacement.
2. Remove NullPlacement from SortOptions.
3. Fix select_k not returning null results when nulls are placed AtEnd. When the limit k is larger than the number of non-null rows and the table contains null/NaN values, those rows could not be retrieved and only the non-null results were returned. select_k now returns these rows as well, and the null placement can be set for each SortKey.
4. Add relevant unit tests and update the interfaces in the other implementations (bindings) accordingly.
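Item 3 can be illustrated with a minimal sketch of the intended select-k behavior when the null placement lives on each sort key. SortKey, SortOrder, NullPlacement and SelectK below are hypothetical stand-ins, not Arrow's actual definitions. The sketch handles a single int column with nulls only (no NaN) and uses a full stable sort for simplicity, whereas a real select-k kernel would use a heap or partial sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the compute types; not the real Arrow API.
enum class SortOrder { Ascending, Descending };
enum class NullPlacement { AtStart, AtEnd };

struct SortKey {
  SortOrder order = SortOrder::Ascending;
  NullPlacement null_placement = NullPlacement::AtEnd;  // per-key setting
};

// Returns the row indices of the first k rows of `values` ordered by `key`.
// Null rows are kept rather than dropped, so when k exceeds the number of
// non-null values the remaining slots are filled with null rows.
std::vector<std::size_t> SelectK(const std::vector<std::optional<int>>& values,
                                 const SortKey& key, std::size_t k) {
  std::vector<std::size_t> indices(values.size());
  for (std::size_t i = 0; i < indices.size(); ++i) indices[i] = i;

  auto less = [&](std::size_t a, std::size_t b) {
    const auto& va = values[a];
    const auto& vb = values[b];
    if (!va || !vb) {
      if (!va && !vb) return false;  // two nulls are equivalent
      const bool a_is_null = !va;
      // A null row sorts before a non-null row only if nulls go AtStart.
      return key.null_placement == NullPlacement::AtStart ? a_is_null
                                                          : !a_is_null;
    }
    return key.order == SortOrder::Ascending ? *va < *vb : *va > *vb;
  };

  // Stable sort keeps the original relative order among equal rows (and nulls).
  std::stable_sort(indices.begin(), indices.end(), less);
  indices.resize(std::min(k, indices.size()));
  return indices;
}
```

For example, with values {3, null, 1, null, 2}, ascending order, nulls AtEnd and k = 4, the sketch returns the three non-null rows followed by the first null row instead of stopping at three results.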
Note that I fixed the failing CI runs on my fork.
Hi @Taepper, thanks for submitting this. Design-wise, I think there are two possible APIs here:
Option 1 has the advantage that it's conceptually simpler once the deprecation period is over, but it comes with a minor API change and a slightly complicated deprecation period. Option 2 is conceptually a bit more complicated (per-key + global fallback) but avoids breaking the current API and doesn't introduce any deprecation. I'm not sure which one is better. @zanmato1984 @felipecrv Thoughts?
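The "per-key + global fallback" of Option 2 amounts to an extra resolution step: each key may carry an optional null placement, and an unset value falls back to the options-level default. A minimal sketch, assuming hypothetical SortKey/SortOptions structs and a hypothetical EffectiveNullPlacement helper (not Arrow's actual API):

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types illustrating Option 2; not the real Arrow definitions.
enum class NullPlacement { AtStart, AtEnd };

struct SortKey {
  std::string name;
  std::optional<NullPlacement> null_placement;  // unset => use global value
};

struct SortOptions {
  std::vector<SortKey> sort_keys;
  NullPlacement null_placement = NullPlacement::AtEnd;  // global fallback
};

// Effective null placement for the i-th sort key under Option 2:
// the key's own setting if present, otherwise the global one.
NullPlacement EffectiveNullPlacement(const SortOptions& options, std::size_t i) {
  return options.sort_keys[i].null_placement.value_or(options.null_placement);
}
```

Under Option 1 the per-key field would be a plain NullPlacement and the global field would eventually disappear, so this fallback step would not be needed once the deprecation period ends.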
Thank you for your comments! I had the same considerations. I opted for option 1 because it provides a clear path forward for the post-deprecation API. Maybe option 2 deserves more consideration, since it avoids the API breakage. Should the …
Well, the API breakage isn't critical either, IMHO. I'd just like to hear opinions from other core developers. @lidavidm, perhaps?
As long as we're still doing major releases, I think the slight breakage is acceptable in return for a nicer API in the end.
Alright, also note that this tries to be analogous to the deprecation of …
That's a good comparison point, thank you.
I prefer the conceptually simpler model, even at the cost of API compatibility.
OK, it seems everyone agrees with this approach, so let's go for it.
See #38584 for the original PR, whose description is quoted for this PR.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
This PR includes breaking changes to public APIs.
I amended the original PR to be less breaking for public APIs. Still, Ordering, SortOptions, RankOptions, and RankQuantileOptions now accept a std::optional<NullPlacement> instead of a NullPlacement, which did require some changes in downstream APIs and bindings. I also need some help with fixing the c_glib bindings.
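Turning the options-level field into a std::optional<NullPlacement> lets the code distinguish "not specified" from an explicit value during the deprecation period. A minimal sketch of how that could resolve against a per-key setting, using hypothetical SortKey/SortOptions structs and a hypothetical EffectiveNullPlacement helper. The precedence shown (an explicitly set global value overrides every key, so callers of the old API keep their behavior) is an assumption, not confirmed by this PR text.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types; not the real Arrow definitions.
enum class NullPlacement { AtStart, AtEnd };

struct SortKey {
  std::string name;
  NullPlacement null_placement = NullPlacement::AtEnd;  // per-key setting
};

struct SortOptions {
  std::vector<SortKey> sort_keys;
  // Deprecated global setting; std::nullopt means "not specified".
  std::optional<NullPlacement> null_placement;
};

// ASSUMPTION: if the deprecated global value is set, it applies to every key;
// otherwise each key's own null placement is used.
NullPlacement EffectiveNullPlacement(const SortOptions& options, std::size_t i) {
  if (options.null_placement.has_value()) return *options.null_placement;
  return options.sort_keys[i].null_placement;
}
```

Once the deprecated field is removed, only the per-key branch would remain.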