-
Notifications
You must be signed in to change notification settings - Fork 795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support bitwise and operation in the kernel #2703
Conversation
T: ArrowNumericType, | ||
F: Fn(T::Native, T::Native) -> T::Native, | ||
{ | ||
if left.len() != right.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also check the length in binary
, so I am afraid the checking here is somewhat redundant.
Maybe we could let binary
to return Err when the 2 arrays have different length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine, the compiler should eliminate it, and even if it doesn't the cost will be amortised over the much more expensive code below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also check the length in
binary
, so I am afraid the checking here is somewhat redundant.Maybe we could let
binary
to return Err when the 2 arrays have different length.
If the binary return the ERR
with checking the len, I can remove this.
"Cannot perform bitwise operation on arrays of different length".to_string(), | ||
)); | ||
} | ||
Ok(binary(left, right, op)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we directly operate on the value buffer,
as the bitwise operation is irrelevant to the data type ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You would need to take into account both endianess, and the size of the types, but theoretically yes you could do this. I suspect it would perform worse, however, as you'll have a variable length type now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we directly operate on the value buffer, as the bitwise operation is irrelevant to the data type ?
@HaoYang670 we can track a issue for this optimization, and can try it after finish the bitwise kernel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed @liukun4515. And we should add some benchmarks to see whether we ccoul get performance improvement by doing this.
This is great -- thank you @liukun4515 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor test nit
arrow/src/compute/kernels/bitwise.rs
Outdated
// unsigned value | ||
let left = UInt64Array::from(vec![Some(1), Some(2), None, Some(4)]); | ||
let scalar = 7; | ||
let expected = UInt64Array::from(vec![Some(1), Some(2), None, Some(4)]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get a test where the result is changed 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
use std::ops::BitAnd; | ||
|
||
// The helper function for bitwise operation with two array | ||
fn bitwise_op<T, F>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between this function and math_op
(https://github.com/apache/arrow-rs/blob/master/arrow/src/compute/kernels/arithmetic.rs#L60)?
I find the 2 have same types and function body. Could we directly reuse math_op
, such as
// The helper function for bitwise operation with two array
fn bitwise_op<T, F>(
left: &PrimitiveArray<T>,
right: &PrimitiveArray<T>,
op: F,
) -> Result<PrimitiveArray<T>>
where
T: ArrowNumericType,
F: Fn(T::Native, T::Native) -> T::Native,
{
math_op(left, right, op)
}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two difference about the math_op
- in the bit wise, the left and right has the same data type of
T
, and the output data type is alsoT
- Minor diff is the Error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use the math_op
to replace the code body, but I concern the error message which can point out what operation is error
Benchmark runs are scheduled for baseline = 0ba5c5b and contender = 259a302. 259a302 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
part of #2702
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?