WISH: Less aggressive parallelization by default (please don't use *all* CPU cores) #333

@HenrikBengtsson

Hi, I noticed text2vec runs on all CPU cores by default on Unix. This is from:

text2vec/R/zzz.R

Lines 6 to 9 in 9ddf836

n_cores = 1L
if(.Platform$OS.type == "unix")
  n_cores = parallel::detectCores(logical = FALSE)
options("text2vec.mc.cores" = n_cores)

text2vec/R/mc_queue.R

Lines 1 to 4 in 9ddf836

mc_queue = function(x,
                    FUN = identity,
                    mc.cores = getOption("text2vec.mc.cores", parallel::detectCores(logical = FALSE)),
                    poll_sleep = 0.01) {

Defaulting to all cores causes major problems on machines shared by multiple users, and also when other software is running at the same time. I spotted this on a machine with 128 CPU cores. Imagine running another 10-20 processes like that at the same time on this machine: it will quickly grind to a halt, which is a real problem.

Although the behavior can be changed by setting an R option, many users are not aware of the problem ... until the sysadmins yell at them. Also, text2vec might be running deep down as a dependency that other package maintainers are not aware of, so this behavior might also be inherited by other packages without them knowing.
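As a stop-gap on the user side, the option can be set before any text2vec work starts. This is only a sketch: it assumes text2vec reads `text2vec.mc.cores` at call time, as the `mc_queue()` default quoted above suggests, and that nothing resets the option afterwards.

```r
# Cap text2vec at a single worker for this R session.
# Assumption: set this after text2vec has been loaded, because the
# load-time code quoted above overwrites the option.
options(text2vec.mc.cores = 1L)

# Inspect the value that a later getOption() call would pick up.
n_workers = getOption("text2vec.mc.cores")
print(n_workers)
```

Putting the same `options()` call in a script that runs after `library(text2vec)` would make it apply consistently, but users still have to know about the option in the first place.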

Could you please consider switching to a more conservative default? Personally, I'm in the camp that everything should run sequentially (single-core) unless the user configures it otherwise. CRAN has a limit of two CPU cores.
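A sequential default could look roughly like the sketch below. `set_default_cores()` is a hypothetical helper, not something that exists in text2vec; the idea is only that the package sets the option when the user has not already chosen a value.

```r
# Hypothetical helper (not part of text2vec): default to one core,
# but never override a value the user has already configured.
set_default_cores = function() {
  if (is.null(getOption("text2vec.mc.cores"))) {
    options(text2vec.mc.cores = 1L)
  }
  invisible(getOption("text2vec.mc.cores"))
}

# Example: with no prior setting, the default becomes 1.
options(text2vec.mc.cores = NULL)
print(set_default_cores())
```

With this shape, anyone who wants more parallelism opts in explicitly by setting the option, and everyone else gets single-core behavior.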

(Disclaimer: I'm the author of parallelly.) If you don't want to do this, could you please consider changing from:

parallel::detectCores(logical = FALSE)

to

parallelly::availableCores(logical = FALSE)

because the latter gives sysadmins a chance to limit it on their end, and it also respects cgroups settings, job scheduler allocations, etc. Please see https://parallelly.futureverse.org/#availablecores-vs-paralleldetectcores for more details.
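For illustration, the swap could be wrapped as in the sketch below. `detect_worker_count()` is a hypothetical name, the fallback to `parallel::detectCores()` is an assumption for the case where parallelly is not installed, and the call assumes a parallelly version whose `availableCores()` accepts the `logical` argument.

```r
# Hypothetical helper (not part of text2vec): prefer parallelly's
# availableCores(), which honors limits set by sysadmins and schedulers,
# and fall back to parallel::detectCores() if parallelly is unavailable.
detect_worker_count = function() {
  if (requireNamespace("parallelly", quietly = TRUE)) {
    n = parallelly::availableCores(logical = FALSE)
  } else {
    n = parallel::detectCores(logical = FALSE)
  }
  # detectCores() may return NA; treat that as a single core.
  n = as.integer(n)
  if (is.na(n) || n < 1L) n = 1L
  n
}

print(detect_worker_count())
```

The result could then feed `options("text2vec.mc.cores" = ...)` in place of the direct `detectCores()` call.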

Thank you
