PBS job manager #151
base: master
Changes from all commits
2b5efb9
a31fcc7
b4cf0cc
835e732
3b30b1b
f641a1d
c59feab
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| #!/usr/bin/env python | ||
| # | ||
| # Copyright (c) 2017 10X Genomics, Inc. All rights reserved. | ||
| # | ||
|
|
||
| """Queries qstat about a list of jobs and parses the output, returning the list | ||
| of jobs which are queued, running, or on hold.""" | ||
|
|
||
| import subprocess | ||
| import sys | ||
| import json | ||
|
|
||
| # PBS Pro "job states" to be regarded as "alive" | ||
| ALIVE = "QHWSRE" | ||
|
|
||
|
|
||
| def get_ids(): | ||
|     """Returns the set of jobids to query from standard input.""" | ||
|     ids = [] | ||
|     for jobid in sys.stdin: | ||
|         ids.extend(jobid.split()) | ||
|     return ids | ||
|
|
||
|
|
||
| def mkopts(ids): | ||
|     """Gets the command line for qstat.""" | ||
|     if not ids: | ||
|         sys.exit(0) | ||
|     return ["qstat", "-x", "-F", "json", "-f"] + ids | ||
|
|
||
|
|
||
| def execute(cmd): | ||
|     """Executes qstat and captures its output.""" | ||
|     with subprocess.Popen( | ||
|         cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE | ||
|     ) as proc: | ||
|         out, err = proc.communicate() | ||
|         if proc.returncode: | ||
|             raise OSError(err) | ||
|         if not isinstance(out, str): | ||
|             out = out.decode() | ||
|         if len(out) < 500: | ||
|             sys.stderr.write(out) | ||
|         else: | ||
|             sys.stderr.write(out[:496] + "...") | ||
|         return out | ||
|
|
||
|
|
||
| def parse_output(out): | ||
|     """Parses the JSON-format output of qstat and yields the ids of pending | ||
|     jobs.""" | ||
|     data = json.loads(out) | ||
|     for jid, info in data.get("Jobs", {}).items(): | ||
|         if info.get("job_state") in ALIVE: | ||
|             yield jid | ||
|
|
||
|
|
||
| def main(): | ||
|     """Reads a set of ids from standard input, queries qstat, and outputs the | ||
|     jobids to standard output for jobs which are in the pending state.""" | ||
|     for jobid in parse_output(execute(mkopts(get_ids()))): | ||
|         sys.stdout.write(f"{jobid}\n") | ||
|     return 0 | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
|     sys.exit(main()) |
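
The `parse_output` function in the file above tests `info.get("job_state") in ALIVE`, which would raise `TypeError` if a job entry had no `job_state` key. The following is a minimal sketch of a more defensive variant, not the code in this PR. The function name `parse_output_defensive`, the sample job ids, and the shape of the sample JSON are assumptions made for illustration; only the top-level `"Jobs"` key and the `"job_state"` field come from the script itself.

```python
import json

# PBS Pro job states treated as "alive", copied from the script in this PR.
ALIVE = "QHWSRE"


def parse_output_defensive(out):
    """Yield ids of jobs whose state is a single character found in ALIVE.

    Hypothetical variant: jobs with a missing, empty, or non-string
    "job_state" are skipped instead of raising TypeError or matching "".
    """
    data = json.loads(out)
    for jid, info in data.get("Jobs", {}).items():
        state = info.get("job_state")
        if isinstance(state, str) and len(state) == 1 and state in ALIVE:
            yield jid


# Invented sample shaped like the JSON this script expects from qstat.
SAMPLE = json.dumps(
    {
        "Jobs": {
            "101.pbs": {"job_state": "R"},  # running: alive
            "102.pbs": {"job_state": "F"},  # state not in ALIVE
            "103.pbs": {"job_state": "Q"},  # queued: alive
            "104.pbs": {},                  # no job_state at all
        }
    }
)

alive = list(parse_output_defensive(SAMPLE))
print(alive)
```

Whether real `qstat` output can ever omit `job_state` is not established in this PR, so the extra checks may be unnecessary in practice.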
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,9 +6,12 @@ | |
| # Setup Instructions | ||
| # ============================================================================= | ||
| # | ||
| -# 1. Add any other necessary PBSpro arguments such as queue (-q) or account | ||
| -# (-A). If your system requires a walltime (-l walltime), 24 hours (24:00:00) | ||
| -# is sufficient. We recommend you do not remove any arguments below or | ||
| +# 1. Add any other necessary PBSpro arguments such as queue (-q), account | ||
| +# (-A), project (-P) or volumes (-l storage=). | ||
| +# If your system requires a walltime (-l walltime), 24 hours (24:00:00) | ||
| +# is sufficient, but this can often be reduced to 4 hours or less if | ||
| +# '--maxjobs' is at least 10. | ||
|
Comment on lines +12 to +13
Member
While it's probably fine to set 4 hours regardless, this comment is misleading. The wall time being set in this template is for each individual job, not for the overall pipeline run.
Author
I haven't seen any jobs run for more than an hour in cluster mode, but happy to be educated here. My naive assumption was that In our system, PBS Pro dumps actual usage information to the
Member
I have not previously encountered a cluster that charged based on reserved wall time as opposed to actually-used wall time; if you have that then more careful accounting is certainly in order! You can get the actual wall time (as measured by the job itself, so might be a slight underestimate as far as the cluster is concerned) from the In our internal infrastructure, we usually run on spot instances in AWS, which are prone to involuntary preemption, so we aim to keep individual jobs under 1 hour (for most datasets anyway) to avoid losing too much work when one of them fails this way and needs to be restarted. I think in most cases there will be very few jobs longer than 15 minutes, even, depending on the performance of your hardware (the slow stages tend to be mostly I/O limited).
Author
Thanks for the clarification. We only get billed for actual usage, but potential usage gets "reserved" until the jobs finish. 1000 jobs (from multiple parallel runs) all claiming 24 hours walltime can result in all our quota getting reserved, which prevents new jobs from queuing, breaking the pipeline orchestrating the parallel runs. |
||
| +# We recommend you do not remove any arguments below or | ||
| # Martian may not run properly. | ||
| # | ||
| # 2. Change filename of pbspro.template.example to pbspro.template. | ||
|
|
@@ -18,6 +21,7 @@ | |
| # ============================================================================= | ||
| # | ||
| #PBS -N __MRO_JOB_NAME__ | ||
| #PBS -S /bin/bash | ||
| #PBS -V | ||
| #PBS -l select=1:ncpus=__MRO_THREADS__ | ||
| #PBS -l mem=__MRO_MEM_GB__gb | ||
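
The `#PBS` lines above contain `__MRO_*__` placeholders that are filled in per job. How Martian performs that substitution is not shown in this diff; the sketch below only illustrates the general idea with plain string replacement. The `render` helper, the shortened `TEMPLATE` string, and the example values are all hypothetical.

```python
# Shortened copy of the directives shown in the template diff above.
TEMPLATE = """\
#PBS -N __MRO_JOB_NAME__
#PBS -l select=1:ncpus=__MRO_THREADS__
#PBS -l mem=__MRO_MEM_GB__gb
"""


def render(template, values):
    """Replace each __MRO_<KEY>__ token with its value (hypothetical helper)."""
    for key, value in values.items():
        template = template.replace(f"__MRO_{key}__", str(value))
    return template


# Example values are invented; real values would come from the pipeline stage.
script = render(TEMPLATE, {"JOB_NAME": "demo_stage", "THREADS": 4, "MEM_GB": 16})
print(script)
```

Because each job gets its own rendered script, any `-l walltime` directive added to the template applies per job, which matches the reviewer's point in the thread above.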
|
|
||
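
The review thread on lines +12 to +13 describes quota being "reserved" according to requested walltime until jobs finish. As a rough worked example, the sketch below compares the reservation for 1000 jobs at 24 hours and at 4 hours. It assumes the scheduler reserves requested walltime times CPU count per job, which is an assumption about the accounting model and not something stated in the thread; the `reserved_hours` helper is hypothetical.

```python
def reserved_hours(num_jobs, walltime_hours, ncpus=1):
    """CPU-hours held in reserve while num_jobs jobs are queued or running.

    Assumes a simple linear model: jobs * requested walltime * CPUs per job.
    """
    return num_jobs * walltime_hours * ncpus


jobs = 1000  # figure quoted in the thread
at_24h = reserved_hours(jobs, 24)  # walltime suggested by the old comment
at_4h = reserved_hours(jobs, 4)    # reduced walltime suggested by this PR

print(at_24h, at_4h, at_24h / at_4h)
```

Under this model the reduced walltime reserves one sixth of the quota for the same number of jobs, while billing for actual usage is unchanged.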