On startup, vsearch reports the total amount of physical RAM and CPU cores available on a system. This information can be misleading in certain environments, particularly Linux HPC environments and other shared systems, where a user often does not have access to the full amount of physical resources.
For example, on our local HPC systems, we use Linux cgroups to enforce resource limits on user jobs, including CPU and RAM. Regardless of how much memory or how many CPU cores I request for my job, vsearch still reports the full amount of physical resources on the box.
As an extreme case, working in a shell session with a cgroup-enforced limit of 50 MB of memory and 1 CPU core, vsearch --fastq_mergepairs still reports that I have 500+ GB of RAM and 128 CPU cores to work with, before being promptly killed by the OS for exceeding the 50 MB memory limit.
I'm not sure what the best way to address this would be. For CPU cores, if running on Linux, you could do a quick check for the OMP_NUM_THREADS environment variable, which many programs consult to determine how many CPU cores they may use. But that variable is not guaranteed to be set. I do not know of a corresponding variable for memory, so presumably you would have to query the kernel's cgroup interface to see what limits are being enforced, and then adjust the output accordingly. I have no idea how difficult that would be to implement in arch.cc.
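To make the idea above concrete, here is a minimal sketch of what such a check could look like on Linux. It is not taken from vsearch: every function name below is hypothetical, and it assumes C++17, a cgroup filesystem mounted at /sys/fs/cgroup, and that only the process's own cgroup needs checking (limits set on ancestor groups, and cgroup v1 groups in subdirectories, are ignored). The parsers handle the cgroup v2 files memory.max and cpu.max, the cgroup v1 file memory.limit_in_bytes, and OMP_NUM_THREADS.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <exception>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Read a whole file; empty optional if it cannot be opened.
std::optional<std::string> read_file(const std::string& path) {
    std::ifstream in(path);
    if (!in) return std::nullopt;
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Memory limit in bytes, or nothing when the file says "max" (cgroup v2: no limit).
// cgroup v1 reports a very large number instead of "max" when unlimited.
std::optional<std::uint64_t> parse_memory_limit(const std::string& text) {
    std::istringstream in(text);
    std::string token;
    if (!(in >> token) || !std::isdigit(static_cast<unsigned char>(token[0]))) {
        return std::nullopt;
    }
    try {
        return static_cast<std::uint64_t>(std::stoull(token));
    } catch (const std::exception&) {
        return std::nullopt;
    }
}

// CPU allowance in cores from cgroup v2 cpu.max ("<quota> <period>" or "max <period>").
std::optional<double> parse_cpu_max(const std::string& text) {
    std::istringstream in(text);
    std::string quota;
    double period = 0.0;
    if (!(in >> quota >> period) || period <= 0.0) return std::nullopt;
    if (!std::isdigit(static_cast<unsigned char>(quota[0]))) return std::nullopt;
    try {
        return std::stod(quota) / period;
    } catch (const std::exception&) {
        return std::nullopt;
    }
}

// First value of OMP_NUM_THREADS (it may be a comma-separated list such as "4,2").
std::optional<unsigned long> parse_omp_num_threads(const char* value) {
    if (value == nullptr || !std::isdigit(static_cast<unsigned char>(*value))) {
        return std::nullopt;
    }
    const unsigned long n = std::strtoul(value, nullptr, 10);
    if (n == 0) return std::nullopt;
    return n;
}

// Directory of this process's cgroup v2 group, given the text of /proc/self/cgroup.
// The v2 entry has the form "0::<path>".
std::optional<std::string> cgroup_v2_dir(const std::string& proc_self_cgroup) {
    std::istringstream in(proc_self_cgroup);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("0::", 0) == 0) return "/sys/fs/cgroup" + line.substr(3);
    }
    return std::nullopt;
}

// Smaller of the physical RAM and the limit on the process's own cgroup.
// HYPOTHETICAL helper: ancestor cgroups are not inspected.
std::uint64_t effective_memory(std::uint64_t physical_bytes) {
    std::optional<std::string> text;
    if (auto proc_file = read_file("/proc/self/cgroup")) {
        if (auto dir = cgroup_v2_dir(*proc_file)) {
            text = read_file(*dir + "/memory.max");
        }
    }
    if (!text) {
        // cgroup v1 fallback; assumes the group is visible at the hierarchy root
        text = read_file("/sys/fs/cgroup/memory/memory.limit_in_bytes");
    }
    if (text) {
        if (auto limit = parse_memory_limit(*text)) {
            return std::min(physical_bytes, *limit);
        }
    }
    return physical_bytes;
}
```

The reported figure would then be something like effective_memory(physical_bytes), with parse_omp_num_threads(std::getenv("OMP_NUM_THREADS")) and parse_cpu_max applied in the same way for cores. Taking the minimum with the physical value keeps the output unchanged on systems without cgroup limits.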
In any case, my main point is that it is misleading to assume (and tell the user) that vsearch has access to the entire machine when that may not be the case.
You are right, the CPU/memory information provided by vsearch can be misleading. Until we find a way to collect and report more accurate information, do you think a short note under the resource line would help? Maybe one of the following?
(note: resources actually available to vsearch can be much lower)
or
(note: resources allotted to vsearch by the operating system can be much lower)
Thank you for the quick response. A disclaimer might be good to add as a temporary step. Of the two options you listed, I think the second one is clearer.
Related to my comments about cgroups, I see that you have a Dockerfile in the repo. People frequently use cgroups to limit the amount of resources available to a Docker container, so if there is appetite for making the "resources available" line more accurate, I think the cgroups infrastructure would be a good place to target.
I would be curious though to see what vsearch detects for resources inside a resource-limited Docker container as it's written now. Maybe that's something I can test...
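For anyone who wants to run that test, a small diagnostic along these lines could be compiled and run inside the container next to vsearch. It is only a sketch, with hypothetical function names: it assumes Linux with glibc, and it assumes that the startup line comes from system-wide queries such as sysconf, which I have not confirmed against arch.cc. It prints the system-wide figures alongside the number of CPUs in the process's affinity mask.

```cpp
#include <sched.h>
#include <unistd.h>

#include <cstdint>
#include <sstream>
#include <string>

// Total physical RAM as seen through sysconf (system-wide, ignores cgroup limits).
std::uint64_t physical_memory_bytes() {
    const long pages = sysconf(_SC_PHYS_PAGES);
    const long page_size = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0) return 0;
    return static_cast<std::uint64_t>(pages) * static_cast<std::uint64_t>(page_size);
}

// Number of online CPUs on the machine (system-wide).
long online_cpus() {
    return sysconf(_SC_NPROCESSORS_ONLN);
}

// Number of CPUs this process is allowed to run on, or -1 if the query fails
// (for example on machines with more CPUs than a fixed-size cpu_set_t can hold).
int allowed_cpus() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    return CPU_COUNT(&set);
}

// One-line summary suitable for printing at startup.
std::string describe_resources() {
    std::ostringstream out;
    out << "physical RAM: " << physical_memory_bytes() / (1024 * 1024) << " MiB, "
        << "online CPUs: " << online_cpus() << ", "
        << "CPUs in affinity mask: " << allowed_cpus();
    return out.str();
}
```

Printing describe_resources() from main would show the three values. Note that the affinity mask only reflects cpuset-style restrictions (for example Docker's --cpuset-cpus); a quota such as Docker's --cpus or a memory limit such as --memory is enforced through cgroup files and would not show up in any of these three numbers.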