GH-46708: [C++][Gandiva] Added zero return values for castDECIMAL_utf8#46709
GH-46708: [C++][Gandiva] Added zero return values for castDECIMAL_utf8#46709kou merged 4 commits intoapache:mainfrom
Conversation
|
|
…AL_utf8 if in value can not be parsed
|
|
||
| int num_records = 1; | ||
| auto invalid_in = MakeArrowArrayUtf8({"1.345"}, {true}); | ||
| auto in_batch_1 = arrow::RecordBatch::Make(schema, num_records, {invalid_in}); |
There was a problem hiding this comment.
| auto in_batch_1 = arrow::RecordBatch::Make(schema, num_records, {invalid_in}); | |
| ASSERT_OK_AND_ASSIGN(auto in_batch_1, arrow::RecordBatch::Make(schema, num_records, {invalid_in})); |
BTW, why do we need _1 suffix here? It seems that there is only one record batch.
There was a problem hiding this comment.
_1 was just copy paste, fixed.
arrow::RecordBatch::Make does not return Result<> so ASSERT_OK_AND_ASSIGN should not work, right?
| auto int_literal = TreeExprBuilder::MakeLiteral((int32_t)100); | ||
| auto int_literal_multiply = TreeExprBuilder::MakeLiteral((int32_t)10); | ||
| auto string_literal = TreeExprBuilder::MakeStringLiteral("foo"); | ||
| auto cast_multiply_literal = | ||
| TreeExprBuilder::MakeFunction("castDECIMAL", {int_literal_multiply}, decimal_type_10_0); | ||
| auto cast_int_literal = | ||
| TreeExprBuilder::MakeFunction("castDECIMAL", {int_literal}, decimal_type_38_30); | ||
| auto cast_string_func = | ||
| TreeExprBuilder::MakeFunction("castDECIMAL", {string_literal}, decimal_type_38_30); | ||
| auto multiply_func = | ||
| TreeExprBuilder::MakeFunction("multiply", {cast_multiply_literal, cast_int_literal}, decimal_type_38_27); | ||
| auto equal_func = TreeExprBuilder::MakeFunction("equal", {multiply_func, cast_string_func}, arrow::boolean()); | ||
| auto expr = TreeExprBuilder::MakeExpression(equal_func, res_bool); |
There was a problem hiding this comment.
Does not really repro sigsegv without that complex expression
There was a problem hiding this comment.
I think that we don't need to reproduce SIGSEGV here.
Can we check only that out_high/out_low are set on error instead?
There was a problem hiding this comment.
Hmm. Let me try that
There was a problem hiding this comment.
It seems like if function failed to parse data, corresponding element will not be set in output array.
@kou can you suggest how do I test that castDecimal returned 0?
There was a problem hiding this comment.
Strange. Fix works on my machine. Also test was green in this PR and also tried in docker x86 env.
[ RUN ] TestDecimal.TestCastDecimalVarCharInvalidInputInvalidOutput
/arrow2/cpp/src/gandiva/cache.cc:58: Creating gandiva cache with capacity of 5000
/arrow2/cpp/src/gandiva/engine.cc:282: Detected CPU Name : znver2
/arrow2/cpp/src/gandiva/engine.cc:283: Detected CPU Features: [ +prfchw -cldemote +avx +aes +sahf +pclmul -xop +crc32 +xsaves -avx512fp16 -sm4 +sse4.1 -avx512ifma +xsave -avx512pf +sse4.2 -tsxldtrk -ptwrite -widekl -sm3 -invpcid +64bit +xsavec -avx512vpopcntdq +cmov -avx512vp2intersect -avx512cd +movbe -avxvnniint8 -avx512er -amx-int8 -kl -sha512 -avxvnni -rtm +adx +avx2 -hreset -movdiri -serialize -vpclmulqdq -avx512vl -uintr +clflushopt -raoint -cmpccxadd +bmi -amx-tile +sse -gfni -avxvnniint16 -amx-fp16 +xsaveopt +rdrnd -avx512f -amx-bf16 -avx512bf16 -avx512vnni +cx8 -avx512bw +sse3 -pku +fsgsbase +clzero -mwaitx -lwp +lzcnt +sha -movdir64b -wbnoinvd -enqcmd -prefetchwt1 -avxneconvert -tbm -pconfig -amx-complex +ssse3 +cx16 +bmi2 +fma +popcnt -avxifma +f16c -avx512bitalg -rdpru +clwb +mmx +sse2 +rdseed -avx512vbmi2 -prefetchi +rdpid -fma4 -avx512vbmi -shstk -vaes -waitpkg -sgx +fxsr -avx512dq +sse4a]
Thread 1 "gandiva-project" received signal SIGSEGV, Segmentation fault.
0x00007ffff786fdfc in ?? ()
(gdb) bt
#0 0x00007ffff786fdfc in ?? ()
#1 0x00007fffffffd700 in ?? ()
#2 0x0000000108b76b98 in ?? ()
#3 0x0000000c9f2c9cd0 in ?? ()
#4 0x4674edea40000000 in ?? ()
#5 0x4674edea40000000 in ?? ()
#6 0x0000000c9f2c9cd0 in ?? ()
#7 0x85acef8100000000 in ?? ()
#8 0x000004ee2d6d415b in ?? ()
#9 0x00007fffffffd6b0 in ?? ()
#10 0x0000000002064191 in std::_Tuple_impl<0ul, unsigned char**, std::default_delete<unsigned char* []> >::_M_head (__t=...)
at /opt/rh/devtoolset-10/root/usr/include/c++/10/tuple:204
#11 0x000000000203d925 in gandiva::Projector::Evaluate (this=0x8d66a50, batch=..., selection_vector=0x0, pool=0x8768f40 <arrow::global_state+576>, output=0x7fffffffdaa0)
at /arrow2/cpp/src/gandiva/projector.cc:176
#12 0x000000000203d5e5 in gandiva::Projector::Evaluate (this=0x8d66a50, batch=..., pool=0x8768f40 <arrow::global_state+576>, output=0x7fffffffdaa0)
at /arrow2/cpp/src/gandiva/projector.cc:154
#13 0x0000000001f9ea88 in gandiva::TestDecimal_TestCastDecimalVarCharInvalidInputInvalidOutput_Test::TestBody (this=0x8aedb20)
at /arrow2/cpp/src/gandiva/tests/decimal_test.cc:1207
If you have repro even with my fix. Can you please share commands how do I repro it. What is the env?
There was a problem hiding this comment.
I have updated Issue with some additional information. #46708
There was a problem hiding this comment.
I could reproduce this crash and confirm that this fix this crash by #46765.
There was a problem hiding this comment.
Sorry your comment is not clear. Do you say this fix is fine? Will you approve it then?
There was a problem hiding this comment.
Sorry. I still don't understand why this case is crashed. But I'm OK with this change.
Fixed PR comments
|
@github-actions autotune |
fixed formatting
|
|
||
| auto status = Projector::Make(schema, {expr}, TestConfiguration(), &projector); | ||
| EXPECT_TRUE(status.ok()) << status.message(); |
There was a problem hiding this comment.
| auto status = Projector::Make(schema, {expr}, TestConfiguration(), &projector); | |
| EXPECT_TRUE(status.ok()) << status.message(); | |
| ASSERT_OK(Projector::Make(schema, {expr}, TestConfiguration(), &projector)); |
| auto int_literal = TreeExprBuilder::MakeLiteral((int32_t)100); | ||
| auto int_literal_multiply = TreeExprBuilder::MakeLiteral((int32_t)10); | ||
| auto string_literal = TreeExprBuilder::MakeStringLiteral("foo"); | ||
| auto cast_multiply_literal = | ||
| TreeExprBuilder::MakeFunction("castDECIMAL", {int_literal_multiply}, decimal_type_10_0); | ||
| auto cast_int_literal = | ||
| TreeExprBuilder::MakeFunction("castDECIMAL", {int_literal}, decimal_type_38_30); | ||
| auto cast_string_func = | ||
| TreeExprBuilder::MakeFunction("castDECIMAL", {string_literal}, decimal_type_38_30); | ||
| auto multiply_func = | ||
| TreeExprBuilder::MakeFunction("multiply", {cast_multiply_literal, cast_int_literal}, decimal_type_38_27); | ||
| auto equal_func = TreeExprBuilder::MakeFunction("equal", {multiply_func, cast_string_func}, arrow::boolean()); | ||
| auto expr = TreeExprBuilder::MakeExpression(equal_func, res_bool); |
There was a problem hiding this comment.
Sorry. I still don't understand why this case is crashed. But I'm OK with this change.
| auto int_literal = TreeExprBuilder::MakeLiteral(static_cast<int32_t>(100)); | ||
| auto int_literal_multiply = TreeExprBuilder::MakeLiteral(static_cast<int32_t>(10)); | ||
| auto string_literal = TreeExprBuilder::MakeStringLiteral("foo"); | ||
| auto cast_multiply_literal = TreeExprBuilder::MakeFunction( | ||
| "castDECIMAL", {int_literal_multiply}, decimal_type_10_0); | ||
| auto cast_int_literal = | ||
| TreeExprBuilder::MakeFunction("castDECIMAL", {int_literal}, decimal_type_38_30); | ||
| auto cast_string_func = | ||
| TreeExprBuilder::MakeFunction("castDECIMAL", {string_literal}, decimal_type_38_30); | ||
| auto multiply_func = TreeExprBuilder::MakeFunction( | ||
| "multiply", {cast_multiply_literal, cast_int_literal}, decimal_type_38_27); | ||
| auto equal_func = TreeExprBuilder::MakeFunction( | ||
| "equal", {multiply_func, cast_string_func}, arrow::boolean()); | ||
| auto expr = TreeExprBuilder::MakeExpression(equal_func, res_bool); |
There was a problem hiding this comment.
Could you add a comment why we need to use this expression?
There was a problem hiding this comment.
Added comment in code.
equal(multiply(castDecimal(10), castDecimal(100)), castDECIMAL("foo"))
This is minimal expression I have found to be able to reproduce SIGSEGV. If I remove elements from it crash won't happen.
I would like to make short test with just castDECIMAL but in case of parse error I can't extract returned 0 values in test.
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit c97e21a. There were 68 benchmark results with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 10 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
* Fix Jar build * apacheGH-46708: [C++][Gandiva] Added zero return values for castDECIMAL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]> --------- Signed-off-by: Sutou Kouhei <[email protected]> Co-authored-by: Tim Hurski <[email protected]> Co-authored-by: DenisTarasyuk <[email protected]>
* Fix Jar build * apacheGH-46708: [C++][Gandiva] Added zero return values for castDECIMAL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]> --------- Signed-off-by: Sutou Kouhei <[email protected]> Co-authored-by: Tim Hurski <[email protected]> Co-authored-by: DenisTarasyuk <[email protected]>
Rationale for this change
castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector.
What changes are included in this PR?
Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed.
Added corresponding test that reproduces SIGSEGV in projector.
Are these changes tested?
Yes
Are there any user-facing changes?
No