threaded segfault in setindex! with arrays of compound-typed elements #1179

@lxvm

Hello,

@phibeck reported this bug to me, and I have reduced it to a minimal working example (MWE) of a segfault. Calling setindex! on a dataset with an array of a compound datatype appears to be thread-unsafe.

This is the MWE:

using HDF5

T = ComplexF64
# T = Float64 # OK
sizes = (100, 100, 100)
Threads.@threads for i in 1:Threads.nthreads()
    h5open("file$i.hdf5", "w") do h5
        d = create_dataset(h5, "rand", T, sizes)
        for n in 1:sizes[3]
            d[:,:,n] = rand(T, sizes[1:2])
        end
    end
end

This is the crash output (a bus error in this run); the message is different every time:

$ julia --project --threads 6 mwe.jl

[400950] signal 7 (128): Bus error
in expression starting at /local/home/lxvm/projects/autobz/hdf5/mwe.jl:6
H5FL_reg_malloc at /home/lxvm/.julia/artifacts/3f844d84068534dcd6606936ab5f28e1120a9bb0/lib/libhdf5.so (unknown line)
H5FL_reg_calloc at /home/lxvm/.julia/artifacts/3f844d84068534dcd6606936ab5f28e1120a9bb0/lib/libhdf5.so (unknown line)
H5CX_push at /home/lxvm/.julia/artifacts/3f844d84068534dcd6606936ab5f28e1120a9bb0/lib/libhdf5.so (unknown line)
H5Sget_simple_extent_type at /home/lxvm/.julia/artifacts/3f844d84068534dcd6606936ab5f28e1120a9bb0/lib/libhdf5.so (unknown line)
h5s_get_simple_extent_type at /home/lxvm/.julia/packages/HDF5/Z859u/src/api/functions.jl:6716
setindex! at /home/lxvm/.julia/packages/HDF5/Z859u/src/datasets.jl:368
unknown function (ip: 0x7f6496319060)
#2 at /local/home/lxvm/projects/autobz/hdf5/mwe.jl:10
#17 at /home/lxvm/.julia/packages/HDF5/Z859u/src/file.jl:101
task_local_storage at ./task.jl:315
#h5open#16 at /home/lxvm/.julia/packages/HDF5/Z859u/src/file.jl:96 [inlined]
h5open at /home/lxvm/.julia/packages/HDF5/Z859u/src/file.jl:94 [inlined]
macro expansion at /local/home/lxvm/projects/autobz/hdf5/mwe.jl:7 [inlined]
#26#threadsfor_fun#1 at ./threadingconstructs.jl:252
#26#threadsfor_fun at ./threadingconstructs.jl:219 [inlined]
#1 at ./threadingconstructs.jl:154
unknown function (ip: 0x7f649630ba9f)
jl_apply at /cache/build/builder-demeter6-6/julialang/julia-master/src/julia.h:2157 [inlined]
start_task at /cache/build/builder-demeter6-6/julialang/julia-master/src/task.c:1202
Allocations: 4069986 (Pool: 4069810; Big: 176); GC: 6
Bus error

In @phibeck's bug report, the segfault showed a trace through setindex! -> get_jl_type -> get_mem_compatible_jl_type -> h5t_get_member_name, and I noticed that the library call in this function is not protected by a lock:

HDF5.jl/src/api/helpers.jl

Lines 982 to 989 in 1a24872

# Note: The following two functions implement direct ccalls because the binding generator
# cannot (yet) do the string wrapping and memory freeing.
"""
h5t_get_member_name(type_id::hid_t, index::Cuint) -> String
See `libhdf5` documentation for [`H5Oopen`](https://portal.hdfgroup.org/display/HDF5/H5T_GET_MEMBER_NAME).
"""
function h5t_get_member_name(type_id, index)

It seems that calling setindex! with an array whose element type maps to a compound datatype, such as ComplexF64, hits this code path and consistently crashes when run from multiple threads.
Although our code could be rewritten to avoid multi-threaded HDF5 calls, I am hoping to identify and fix the underlying issue.
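
For anyone who needs a stopgap in the meantime, here is a minimal sketch of the kind of rewrite I mean. It assumes the crash comes from concurrent, unlocked calls into libhdf5, and it simply serializes every HDF5 call in user code behind one lock while leaving the random-number generation parallel. HDF5_LOCK is a name I made up for this sketch, not something HDF5.jl provides, and I have not verified that this avoids the crash in all cases (for example, it does not cover finalizers that HDF5.jl may run on its own).

```julia
using HDF5

# Hypothetical user-level lock (not part of HDF5.jl): every HDF5 call below
# is made while holding it, so no two threads are inside libhdf5 at once.
const HDF5_LOCK = ReentrantLock()

T = ComplexF64
sizes = (100, 100, 100)
Threads.@threads for i in 1:Threads.nthreads()
    # Open the file under the lock instead of using the do-block form,
    # so that the matching close can also be done under the lock.
    h5 = lock(() -> h5open("file$i.hdf5", "w"), HDF5_LOCK)
    try
        d = lock(() -> create_dataset(h5, "rand", T, sizes), HDF5_LOCK)
        for n in 1:sizes[3]
            slice = rand(T, sizes[1:2])  # computed outside the lock, in parallel
            lock(HDF5_LOCK) do
                d[:, :, n] = slice       # the write that crashes in the MWE
            end
        end
        lock(() -> close(d), HDF5_LOCK)
    finally
        lock(() -> close(h5), HDF5_LOCK)
    end
end
```

This gives up parallel I/O entirely, so it is only a workaround and not a fix for the missing lock inside the package.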

For reference, I ran this on Linux, and this is the environment:

julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.1 (2024-10-16)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(hdf5) pkg> st -m
Status `/local/home/lxvm/projects/autobz/hdf5/Manifest.toml`
  [34da2185] Compat v4.16.0
  [f67ccb44] HDF5 v0.17.2
  [692b3bcd] JLLWrappers v1.6.1
  [3da0fdf6] MPIPreferences v0.1.11
  [21216c6a] Preferences v1.4.3
  [ae029012] Requires v1.3.0
  [0234f1f7] HDF5_jll v1.14.3+3
  [e33a78d0] Hwloc_jll v2.11.2+1
  [7cb0a576] MPICH_jll v4.2.3+0
  [f1f71cc9] MPItrampoline_jll v5.5.1+0
  [9237b28f] MicrosoftMPI_jll v10.1.4+2
⌅ [fe0851c0] OpenMPI_jll v4.1.6+0
  [458c3c95] OpenSSL_jll v3.0.15+1
  [477f73a3] libaec_jll v1.1.2+0
  [0dad84c5] ArgTools v1.1.2
  [56f22d72] Artifacts v1.11.0
  [2a0f44e3] Base64 v1.11.0
  [ade2ca70] Dates v1.11.0
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching v1.11.0
  [4af54fe1] LazyArtifacts v1.11.0
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2 v1.11.0
  [8f399da3] Libdl v1.11.0
  [56ddb016] Logging v1.11.0
  [d6f4376e] Markdown v1.11.0
  [a63ad114] Mmap v1.11.0
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.11.0
  [de0858da] Printf v1.11.0
  [9a3f8284] Random v1.11.0
  [ea8e919c] SHA v0.7.0
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [cf7118a7] UUIDs v1.11.0
  [4ec0a83e] Unicode v1.11.0
  [e66e0078] CompilerSupportLibraries_jll v1.1.1+0
  [deac9b47] LibCURL_jll v8.6.0+0
  [e37daf67] LibGit2_jll v1.7.2+0
  [29816b5a] LibSSH2_jll v1.11.0+1
  [c8ffd9c3] MbedTLS_jll v2.28.6+0
  [14a3606d] MozillaCACerts_jll v2023.12.12
  [83775a58] Zlib_jll v1.2.13+1
  [8e850ede] nghttp2_jll v1.59.0+0
  [3f19e933] p7zip_jll v17.4.0+2
