I sometimes (actually often) get signal 11 segmentation faults while using DataFrames in multithreaded Julia sessions. I managed to create an MWE:
Consider this script:
using DataFrames
using PyCall
import Pkg
Pkg.status()
println("threads: ", Threads.nthreads())
df = DataFrame(col1 = rand(1000), col2 = rand(1:2, 1000))
# col1: random numbers
# col2: random 1 or 2

println("\nv1:")
o = combine(df, :col1 => sum => :sum_col1)
println(o)
println("\nv2:")
o = combine(groupby(df, :col2), :col1 => sum => :sum_col1)
println(o)

# same thing using PyCall
println("\nv3:")
o = combine(df, :col1 => (r -> py"sum"(r)) => :sum_col1)
println(o)
println("\nv4:")
o = combine(groupby(df, :col2), :col1 => (r -> py"sum"(r)) => :sum_col1)
println(o)
The thing is that this script runs smoothly for non-grouped DataFrames (see "v3") and for low thread counts, i.e. <= 4 on my MacBook and <= 13 on my Linux box. However, once I use more threads, I consistently get segmentation faults in the grouped PyCall case ("v4"), as shown here:
> julia --proj -t 16 script.jl
(pwd(), Base.active_project(), gethostname()) = ("/home/arndt/tmp/GDF-bug", "/home/arndt/tmp/GDF-bug/Project.toml", "tardis")
Status `~/tmp/GDF-bug/Project.toml`
[a93c6f00] DataFrames v1.7.0
[438e738f] PyCall v1.96.4
threads: 16
v1:
1×1 DataFrame
 Row │ sum_col1
     │ Float64
─────┼──────────
   1 │  498.327
v2:
2×2 DataFrame
 Row │ col2   sum_col1
     │ Int64  Float64
─────┼─────────────────
   1 │     1   259.221
   2 │     2   239.106
v3:
1×1 DataFrame
 Row │ sum_col1
     │ Float64
─────┼──────────
   1 │  498.327
v4:
[16274] signal 11 (1): Segmentation fault
in expression starting at /home/arndt/tmp/GDF-bug/script.jl:35
_PyErr_GetRaisedException at /usr/local/src/conda/python-3.12.2/Python/errors.c:490 [inlined]
PyErr_GetRaisedException at /usr/local/src/conda/python-3.12.2/Python/errors.c:499
PyObject_ClearWeakRefs at /usr/local/src/conda/python-3.12.2/Objects/weakrefobject.c:959
array_dealloc at /home/arndt/.julia/conda/3/lib/python3.12/site-packages/numpy/core/_multiarray_umath.cpython-312-x86_64-linux-gnu.so (unknown line)
pydecref_ at /home/arndt/.julia/packages/PyCall/1gn3u/src/PyCall.jl:118
pydecref at /home/arndt/.julia/packages/PyCall/1gn3u/src/PyCall.jl:123
jfptr_pydecref_4569 at /home/arndt/.julia/compiled/v1.11/PyCall/GkzkC_jDbxf.so (unknown line)
run_finalizer at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/gc.c:303
jl_gc_run_finalizers_in_list at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/gc.c:393
run_finalizers at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/gc.c:439
jl_mutex_unlock at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia_locks.h:80 [inlined]
jl_generate_fptr_impl at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/jitlayers.cpp:545
jl_compile_method_internal at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/gf.c:2536 [inlined]
jl_compile_method_internal at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/gf.c:2423
_jl_invoke at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/gf.c:2940 [inlined]
ijl_apply_generic at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/gf.c:3125
_combine_process_pair at /home/arndt/.julia/packages/DataFrames/kcA9R/src/groupeddataframe/splitapplycombine.jl:630
#781 at /home/arndt/.julia/packages/DataFrames/kcA9R/src/groupeddataframe/splitapplycombine.jl:742
unknown function(ip: 0x7f6592b3c1af)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
start_task at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/task.c:1202
Allocations: 8373914 (Pool: 8372179; Big: 1735); GC: 9
Segmentation fault (core dumped)
It seems to me to be a bug in PyCall.jl, not in DataFrames.jl.
If I understand your example correctly, the Julia sum works OK, but py"sum" errors. Right?
To be sure could you also please check the following code:
I agree that the involvement of PyCall looks strange. But I still believe that it has something to do with GroupedDataFrame, since the un-grouped command runs.
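One way to probe whether the grouped path matters only because it runs the transformation in tasks on other threads is to repeat "v4" with multithreading switched off for that one call. The sketch below assumes the `threads` keyword argument of `combine` (available in recent DataFrames.jl releases), and it assumes that keeping the PyCall calls out of spawned tasks is enough to avoid the crash, which the backtrace above suggests but does not prove. It is a diagnostic sketch, not a confirmed fix.

```julia
using DataFrames
using PyCall

# Same data as in the script above.
df = DataFrame(col1 = rand(1000), col2 = rand(1:2, 1000))

# Assumption: threads = false asks DataFrames to run the transformation
# without spawning tasks on other threads, so py"sum" (and the PyCall
# objects it creates) stay on the calling thread.
o = combine(groupby(df, :col2), :col1 => (r -> py"sum"(r)) => :sum_col1;
            threads = false)
println(o)
```

If this variant runs cleanly with `-t 16` while the original "v4" still crashes, that would point at PyCall being used from several threads rather than at the grouping logic itself.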