Hi, I noticed text2vec runs on all CPU cores by default on Unix. This is from:
Lines 6 to 9 in 9ddf836
n_cores = 1L
if(.Platform$OS.type == "unix")
  n_cores = parallel::detectCores(logical = FALSE)
options("text2vec.mc.cores" = n_cores)
Lines 1 to 4 in 9ddf836
mc_queue = function(x,
                    FUN = identity,
                    mc.cores = getOption("text2vec.mc.cores", parallel::detectCores(logical = FALSE)),
                    poll_sleep = 0.01) {
Defaulting to all cores causes major problems on machines shared by multiple users, and also when other software is running at the same time. I spotted this on a 128-core machine. Imagine another 10-20 processes like this running at the same time on that machine: it will quickly grind to a halt, which is a real problem.
Although the behavior can be changed by setting an R option, many users are not aware of the problem ... until the sysadmins yell at them. Also, text2vec might be running deep down as a dependency that other package maintainers are not aware of, so this behavior might also be inherited by other packages without them knowing.
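For anyone hitting this in the meantime, here is a minimal sketch of the workaround, using only the option name from the snippets above. It assumes the first snippet runs when text2vec is loaded, in which case a value set before loading would be overwritten, so the option is set afterwards.

```r
# Load text2vec first (if it is installed) so that its startup code has
# already run; a value set before loading might be overwritten by it.
if (requireNamespace("text2vec", quietly = TRUE)) {
  message("text2vec default: ", getOption("text2vec.mc.cores"))
}

# Restrict text2vec to a single core for the rest of this R session.
options(text2vec.mc.cores = 1L)

# Read the value back to confirm what text2vec will now see.
cores_in_use <- getOption("text2vec.mc.cores")
```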
Could you please consider switching to a more conservative default? Personally, I'm in the camp that everything should run sequentially (single-core) unless the user configures it otherwise. CRAN has a limit of two CPU cores.
(Disclaimer: I'm the author of parallelly.) If you don't want to change the default, could you please consider changing from:
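To illustrate what a sequential-by-default setup could look like, here is a rough sketch. The helper `default_mc_cores()` is hypothetical and not part of text2vec; it only reads the existing `text2vec.mc.cores` option and falls back to one core when nothing valid is configured.

```r
# Hypothetical helper (not part of text2vec): resolve the number of cores,
# defaulting to 1 (sequential) unless the user has opted in to more.
default_mc_cores <- function() {
  n <- getOption("text2vec.mc.cores", 1L)
  n <- suppressWarnings(as.integer(n))
  # Fall back to a single core on missing, malformed, or non-positive values.
  if (length(n) != 1L || is.na(n) || n < 1L) {
    n <- 1L
  }
  n
}
```

With this approach, parallelism becomes opt-in: a user who wants four workers sets `options(text2vec.mc.cores = 4L)`, and everyone else runs on a single core.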
parallel::detectCores(logical = FALSE)
to
parallelly::availableCores(logical = FALSE)
because the latter gives sysadmins a chance to limit it on their end, and it also respects cgroups settings, job scheduler allocations, etc. Please see https://parallelly.futureverse.org/#availablecores-vs-paralleldetectcores for more details.
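As a rough sketch of how that swap could be made without a hard dependency, the hypothetical helper below (`detect_usable_cores()` is my name for illustration, not an existing function) uses `parallelly::availableCores()` when parallelly is installed and otherwise falls back to the current `parallel::detectCores()` call.

```r
# Hypothetical helper: prefer parallelly::availableCores(), which can be
# limited by sysadmins, cgroups, and job schedulers; otherwise fall back to
# parallel::detectCores(), which reports the hardware only.
detect_usable_cores <- function() {
  if (requireNamespace("parallelly", quietly = TRUE)) {
    n <- parallelly::availableCores(logical = FALSE)
  } else {
    n <- parallel::detectCores(logical = FALSE)
  }
  # detectCores() may return NA on some systems; treat that as one core.
  if (is.na(n)) {
    n <- 1L
  }
  # availableCores() returns a named integer; drop the name.
  as.integer(n)
}

n_cores <- detect_usable_cores()
```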
Thank you