-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-43605: [Go][Parquet] unpack32Avx2 returns 0 if its buffer is empty #43616
Conversation
if buffer.Len() == 0 { | ||
return 0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to shift this above line 42 and have it be:
if n == 0 {
return 0
}
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the dataset I have it makes sense.
I don't have enough experience with arrow to know if n!=0 && buffer.Len()==0
is possible. In this case, the program would crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the way the code is written, buffer.Len()
should always end up being equal to n
because we call buffer.Reset()
and buffer.Grow(n)
so I think we should make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right. I moved the condition directly on nbits
. Since batch
is already checked, it covers all cases where n
is equal to 0. Is it OK for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we're dealing with integer division, that doesn't cover all cases. To cover all cases you'd need something like if batch * nbits < 8
since integer division would result in n == 0
for any scenario where batch * nbits
is less than 8 (since that would result in 0 when we divide by 8). So it might just be better to just check if n == 0
rather than try to find all the cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, I moved the condition on n
.
4d97086
to
7ba9479
Compare
7ba9479
to
42c5e45
Compare
42c5e45
to
1af4ee9
Compare
1af4ee9
to
7ef5083
Compare
7ef5083
to
13bcd8d
Compare
13bcd8d
to
07de915
Compare
Rationale for this change
In case a uint16 column of a parquet file is fully filled with 0, it may happen in some cases that arrow fails to open the file.
Issue is described in GH-43605 and related PR: #43607
I created a new PR to be able to differenciate the 2 changes:
What changes are included in this PR?
In case the buffer of
unpack32Avx2
is empty, this function returns 0.Are these changes tested?
Yes, I need this branch to be merged to be able to run the test in your CI/CD:
apache/parquet-testing#57
Are there any user-facing changes?
I would say the API is not broken and no changes are expected from users, plus unit tests are passing on my setup with
./ci/scripts/go_test.sh
. I would like to see the results of the jobs in your GH workflows.This PR contains a "Critical Fix".