-
Notifications
You must be signed in to change notification settings - Fork 1k
Improve comments and change PrimitiveArray::from_trusted_len_iter
to take an ExactSizeIterator
#8564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🤖 |
PrimitiveArray::from_trusted_len_iter
to take an ExactSizeIterator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tobixdev -- I also kicked off some benchmarks
where | ||
P: std::borrow::Borrow<Option<<T as ArrowPrimitiveType>::Native>>, | ||
I: IntoIterator<Item = P>, | ||
I: ExactSizeIterator<Item = P>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the rationale for this change? I am thinking that maybe someone who had an iterator that knew its length but did not implement ExactSizeIterator
would have to change their code
Maybe that is ok, but the implementation doesn't even seem to use any of the methods on ExactSizeIterator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we just mentioned that as a side note in the last PR. I think it has two advantages:
- Users have at least some help from the type system to avoid safety issues.
- It's consistent with BooleanArray
However, I am also happy with the old API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think that some of the functions could make use of len instead of size_hint. If we choose to keep ExactSizeIterator
I'll go through them.
I think another cause of incompatibility is caused by users needing to call .into_iter() manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbrobbel Do you have any more thoughts on which iterator bound we should use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use TrustedLen
, but we can't because it's nightly-only. So the next best things is ExactSizeIterator
for iterators that know their exact length. However there is a note in the docs:
Note that this trait is a safe trait and as such does not and cannot guarantee that the returned length is correct. This means that unsafe code must not rely on the correctness of Iterator::size_hint. The unstable and unsafe TrustedLen trait gives this additional guarantee.
Since this method is marked unsafe
we require users to make sure their iterators are TrustedLen
. This means:
The iterator reports a size hint where it is either exact (lower bound is equal to upper bound), or the upper bound is None. The upper bound must only be None if the actual iterator length is larger than usize::MAX. In that case, the lower bound must be usize::MAX, resulting in an Iterator::size_hint() of (usize::MAX, None).
The motivation to use ExactSizeIterator
here (I think) is to use ExactSizeIterator::len
instead of Iterator::size_hint
. This has the same "can't be None
" requirement on the upper bound reported by the size hint.
For the bound we can use I: IntoIterator<Item = P, IntoIter: ExactSizeIterator>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤖: Benchmark completed Details
|
Which issue does this PR close?
This is a follow-up to #8543 . There we discussed two outstanding issues that this PR tries to address.
Rationale for this change
Address the points from the PR.
What changes are included in this PR?
Vec
-specific optimizations)ExactSizeIterator
for thePrimitiveArray
Are these changes tested?
Existing test for
PrimitiveArray
Are there any user-facing changes?
Yes,
PrimitiveArray::from_trusted_len_iter
is more restrictive.