Ensure job_dispatch exits cleanly on OSError
If the reporter is for some reason unable to write its JSON files,
an OSError is raised and the job_dispatch.py process is stopped.
When job_dispatch.py is interrupted this way, it never calls wait() on its child,
which subsequently becomes a zombie process.

In this commit, we mark any OSError as a hard error from which we should
exit (as before) and ensure we bring down the child.
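
The cleanup idea described above can be sketched in isolation. This is a minimal, POSIX-only illustration and not the code from this commit: `failing_report` and `run_with_cleanup` are hypothetical names, and the child is started in its own session so that only the child's process group is killed. The commit itself kills the process group of job_dispatch, which also takes down job_dispatch.

```python
import os
import signal
import subprocess
import sys


def failing_report(status):
    # Hypothetical stand-in for reporter.report(); it fails the way a
    # reporter might if it cannot write its files.
    raise OSError("No space left on device")


def run_with_cleanup(report=failing_report):
    # Assumption for this sketch: the child gets its own session, so its
    # process group can be signalled without killing this process too.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        start_new_session=True,
    )
    try:
        report("running")
    except OSError as oserror:
        print(f"report failed due to {oserror}. Stopping and cleaning up.")
        # Treat the OSError as a hard error and bring down the child's group.
        os.killpg(os.getpgid(child.pid), signal.SIGKILL)
    else:
        # Nothing went wrong; stop the demo child normally.
        child.terminate()
    # Always wait on the child so that it is reaped.
    return child.wait()


if __name__ == "__main__":
    print("child exit status:", run_with_cleanup())
```

On POSIX, `Popen.wait()` returns the negated signal number when the child was terminated by a signal, so the return value shows whether the child was brought down by SIGKILL.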
berland committed Aug 25, 2023
1 parent e7bc23e commit a0e1259
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion src/_ert_job_runner/cli.py
@@ -119,7 +119,14 @@ def main(args):
     for job_status in job_runner.run(parsed_args.job):
         logger.info(f"Job status: {job_status}")
         for reporter in reporters:
-            reporter.report(job_status)
+            try:
+                reporter.report(job_status)
+            except OSError as oserror:
+                print(
+                    f"job_dispatch failed due to {oserror}. Stopping and cleaning up."
+                )
+                pgid = os.getpgid(os.getpid())
+                os.killpg(pgid, signal.SIGKILL)
 
         if isinstance(job_status, Finish) and not job_status.success():
             pgid = os.getpgid(os.getpid())