DataLoaders(..., parallel=true) hanging #132

Closed
@ablaom

In the following MWE I successfully create an out-of-memory data source of 20 MNIST images using FileDataset. I can then wrap the source as an MLUtils.DataLoader with the default parallel=false option and collect the result. However, if I specify parallel=true, the collect hangs.

using Pkg
Pkg.activate("data", shared=true)
import MLDatasets: MNIST
using MLDatasets
using ScientificTypes
using MLUtils
using FileIO

ENV["DATADEPS_ALWAYS_ACCEPT"] = true
images, labels = MNIST(split=:train)[:];

N = 20
images = coerce(images, GrayImage)[1:N];

# save some MNIST images as tiff files:
const dir = tempname()
for i in eachindex(images)
    filename = joinpath(dir, "$i.tiff")
    FileIO.save(filename, images[i])
end

# create out-of-memory image source:
X = MLDatasets.FileDataset(dir)

sequential = DataLoader(X, batchsize=2, collate=true)
collect(sequential) # executes as expected

parallel = DataLoader(X, batchsize=2, collate=true, parallel=true);
collect(parallel); # hangs
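
To check for the hang without blocking the REPL, something like the following sketch may help. `finishes_within` is a hypothetical helper written for this report, not part of MLUtils; it only uses `Threads.@spawn`, `istaskdone` and `timedwait` from Base, and it assumes a polling check is good enough for diagnosis.

```julia
# Hypothetical helper (not part of MLUtils): run `f` on a separate task and
# report whether it completed within `seconds`.
function finishes_within(f, seconds)
    task = Threads.@spawn f()
    # `timedwait` polls the callback and returns :ok or :timed_out.
    status = timedwait(() -> istaskdone(task), Float64(seconds))
    return status === :ok
end
```

With the loaders above, `finishes_within(() -> collect(parallel), 30)` would be expected to return `false` if the collect is still stuck after 30 seconds. The spawned task is not cancelled on timeout; it keeps running in the background.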

Here's my setup:

(@data) pkg> status
Status `~/.julia/environments/data/Project.toml`
  [5789e2e9] FileIO v1.16.0
  [82e4d734] ImageIO v0.6.6
  [eb30cadb] MLDatasets v0.7.6
  [f1d291b0] MLUtils v0.3.1
  [321657f4] ScientificTypes v3.0.2

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  CPU: 12 × Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 5 on 12 virtual cores
Environment:
  JULIA_LTS_PATH = /Applications/Julia-1.6.app/Contents/Resources/julia/bin/julia
  JULIA_PATH = /Applications/Julia-1.8.app/Contents/Resources/julia/bin/julia
  JULIA_EGLOT_PATH = /Applications/Julia-1.6.app/Contents/Resources/julia/bin/julia
  JULIA_NUM_THREADS = 5
  JULIA_NIGHTLY_PATH = /Applications/Julia-1.8.app/Contents/Resources/julia/bin/julia
